Detecting Multiword Expressions and Named Entities in Natural Language Texts

Nagy István
Detecting Multiword Expressions and Named Entities in Natural Language Texts.
PhD, University of Szeged.
(2015)

PDF (dissertation), 1MB
PDF (theses), 144kB
PDF (theses), 112kB
PDF (appendix), 3MB

Abstract (English)

Multiword expressions (MWEs) are lexical items that can be decomposed into single words and that display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock ’n’ roll and make a decision is essential for many natural language processing (NLP) applications, including information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, a machine translation system must know that an MWE forms a single semantic unit, so its parts should not be translated separately. To achieve this, the multiword expressions in the text to be translated must first be identified.

The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions, such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts.

The thesis demonstrates that nominal compounds and multiword named entities may require a similar approach for their automatic detection, as they behave in the same way from a linguistic point of view. Furthermore, it shows that light verb constructions can be detected automatically using two effective machine learning-based approaches.
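The machine translation example in the abstract rests on one preprocessing step: find the MWE spans first, then pass each span on as a single unit. The sketch below illustrates only that step, using a small hand-written lexicon and greedy longest-match lookup. The lexicon, the `group_mwes` function and the underscore-joining convention are hypothetical choices made for this illustration; they are not the machine learning-based detection method developed in the thesis.

```python
from typing import List, Sequence, Set, Tuple

# Hypothetical toy lexicon for illustration only; the thesis uses
# machine learning-based detectors rather than a fixed list like this.
MWE_LEXICON: Set[Tuple[str, ...]] = {
    ("rock", "'n'", "roll"),
    ("make", "a", "decision"),
}


def group_mwes(tokens: Sequence[str], lexicon: Set[Tuple[str, ...]]) -> List[str]:
    """Greedy longest-match grouping (hypothetical helper).

    Any lexicon entry found in `tokens` (case-insensitively) is merged into
    a single unit joined by underscores, so later steps can treat it as one
    semantic unit. Single-token entries are ignored, since MWEs span at
    least two words.
    """
    max_len = max((len(entry) for entry in lexicon), default=1)
    lowered = [t.lower() for t in tokens]
    out: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first so longer MWEs win.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(lowered[i:i + n]) in lexicon:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No lexicon entry starts here: keep the token as it is.
            out.append(tokens[i])
            i += 1
    return out


print(group_mwes(["They", "make", "a", "decision", "about", "rock", "'n'", "roll"], MWE_LEXICON))
# → ['They', 'make_a_decision', 'about', "rock_'n'_roll"]
```

A fixed lexicon like this misses inflected or modified variants such as "made a quick decision", which is the kind of variation that context-sensitive, learned detectors are meant to handle.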

Item Type: Thesis (PhD)
Creators: Nagy István
Hungarian title: Összetett kifejezések automatikus azonosítása természetes nyelvű szövegekben
Title of the thesis in English: Detecting Multiword Expressions and Named Entities in Natural Language Texts
Divisions: Doctoral School of Informatics
Discipline: engineering sciences > computer sciences
Defence date: 27 November 2015
Item ID: 2434
Identification Number: 2758956
doi: https://doi.org/10.14232/phd.2434
Date Deposited: 20 Oct 2014, 11:00
Last Modified: 19 Feb 2016, 15:46
Depository no.: B 5935
Supervisors:
Dr. Csirik János, DSc, professor, SZTE TTIK Department of Computer Algorithms and Artificial Intelligence
Dr. Farkas Richárd, PhD, assistant professor, SZTE TTIK Department of Computer Algorithms and Artificial Intelligence
Reviewers:
Dr. Váradi Tamás, PhD, head of research department, Research Institute for Linguistics of the Hungarian Academy of Sciences (MTA Nyelvtudományi Intézet)
Dr. Varga Dániel, PhD, assistant research fellow, BME Department of Sociology and Communication
President of the committee:
Dr. Gyimóthy Tibor, DSc, professor and head of department, SZTE TTIK Department of Software Engineering
Members:
Dr. Alexin Zoltán, PhD, assistant professor, SZTE TTIK Department of Software Engineering
Dr. Bánhelyi Balázs, PhD, assistant professor, SZTE TTIK Department of Computational Optimization
URI: http://doktori.bibl.u-szeged.hu/id/eprint/2434
Defence/Citable status: Defended.
