Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-11605
|Title:||Ensemble dependency parsing across languages : methodological perspectives|
|Abstract:||Human language is ambiguous. Such ambiguity occurs at both the lexical and the syntactic level. At the lexical level, the same word can represent different concepts and objects. At the syntactic level, a phrase or sentence can have more than one interpretation. Language ambiguity is one of the biggest challenges of Natural Language Processing (NLP), i.e., the research field that sits at the intersection of machine learning and linguistics, and that deals with automatic processing of language data. This challenge arises when automatic NLP tools need to resolve ambiguities and select one possible interpretation of a text to approach understanding its meaning. This dissertation focuses on one of the essential NLP tasks: dependency parsing. The task involves assigning a syntactic structure called a dependency tree to a given sentence. Parsing is usually one of the processing steps that helps downstream NLP tasks by resolving some of the syntactic ambiguities occurring in sentences. Since human language is highly ambiguous, deciding on the best syntactic structure for a given sentence is challenging. As a result, even state-of-the-art dependency parsers are far from perfect. Ensemble methods allow for postponing the decision about the best interpretation until several single parsing models express their opinions. Such complementary views on the same problem show which parts of the sentence are the most ambiguous and require more attention. Ensemble parsers find a consensus among such single predictions, and as a result, provide robust and more trustworthy results. Ensemble parsing architectures are commonly regarded as solutions only for experts and overlooked in practical applications. Therefore, this dissertation aims to provide a deeper understanding of ensemble dependency parsers and answer practical questions that arise when designing such approaches.
We investigate ensemble models from three core methodological perspectives: parsing time, availability of training resources, and the final accuracy of the system. We demonstrate that in applications where the complexity of the architecture is not a bottleneck, an integration of strong and diverse parsers is the most reliable approach. Such integration provides robust results regardless of the language and the domain of application. However, when the final accuracy of the system can be sacrificed, more efficient ensemble architectures become available. The decision on how to design them has to take into consideration the desired parsing time, the available training data, and the involved single predictors. The main goal of this thesis is to investigate ensemble parsers. However, to design an ensemble architecture for a particular application, it is crucial to understand the similarities and differences in the behavior of its components. Therefore, this dissertation makes contributions of two sorts: (1) we provide guidelines on practical applications of ensemble dependency parsers, and (2) through the ensembles, we develop a deeper understanding of single parsing models. We primarily focus on differences between the traditional parsers and their recent successors, which use deep learning techniques.|
|Appears in Collections:||05 Fakultät Informatik, Elektrotechnik und Informationstechnik|
Files in This Item:
|falenska_dissertation_final.pdf||5.41 MB||Adobe PDF|