Detecting Multiword Expressions and Named Entities in Natural Language Texts

Nagy István
Detecting Multiword Expressions and Named Entities in Natural Language Texts.
[Thesis] (Unpublished)

main.pdf: PDF (dissertation), 1MB
thesis_en.pdf: PDF (theses), 144kB
thesis_hu.pdf: PDF (theses), 112kB
thesis_hu_szerzotars.pdf: PDF (appendix), 3MB

Abstract (English)

Multiword expressions (MWEs) are lexical items that can be decomposed into single words and that display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as "rock ’n’ roll" and "make a decision" is essential for many natural language processing (NLP) applications, including information extraction and retrieval, terminology extraction and machine translation, and these expressions need to be identified in context. In machine translation, for example, the system must know that an MWE forms one semantic unit, so its parts should not be translated separately. This requires that multiword expressions first be identified in the text to be translated.

The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions, such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts.

The thesis demonstrates that nominal compounds and multiword named entities may require a similar approach for their automatic detection, as they behave in the same way from a linguistic point of view. Furthermore, it shows that light verb constructions can be detected automatically using two effective machine learning-based approaches.

Item Type: Thesis (Doctoral dissertation)
Creators: Nagy István
Hungarian title: Összetett kifejezések automatikus azonosítása természetes nyelvű szövegekben
Divisions: Doctoral School of Computer Science
Scientific field / discipline: Engineering > Information Technology
Language: English
Date: 2015. November 27.
Item ID: 2434
MTMT identifier of the work: 2758956
Date Deposited: 2014. Oct. 20. 11:00
Last Modified: 2020. Apr. 16. 09:44
Depository no.: B 5935
Defence/Citable status: Defended.
