Syntactic dependencies and beyond: robust neural architectures and quality-enhanced corpora for structured prediction in NLP
Abstract
This thesis investigates explicit structure in Natural Language Processing (NLP). Such structure, represented by abstract linguistic objects like part-of-speech tags, syntax trees, or graph-based meaning representations, has traditionally played a central role in NLP. Historically linked to the idea of rule-based processing of human language (using tools such as formal grammars), it has also successfully been combined with statistical machine learning techniques. For practical applications, this has often taken the form of a pipeline in which the prediction of linguistic features serves as a first step towards addressing “higher-level” tasks such as text classification or information extraction. The algorithmic extraction of linguistic structures is also pursued for its own sake, i.e., as a means of deepening our understanding of human language.
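To make the pipeline view concrete, the sketch below shows the general pattern: explicit linguistic features (here, part-of-speech tags) are predicted first and then consumed as simple, interpretable features by a downstream classifier. The toy lexicon and feature templates are purely illustrative and not taken from any particular system.

```python
# Toy sketch of the classical pipeline view: explicit linguistic features
# (part-of-speech tags) are predicted first, then turned into interpretable
# features for a downstream task. The lexicon and rules are illustrative only.

TOY_LEXICON = {"the": "DET", "dog": "NOUN", "barks": "VERB", "loudly": "ADV"}

def pos_tag(tokens):
    """Stand-in for a trained tagger: look each token up in a toy lexicon."""
    return [TOY_LEXICON.get(tok.lower(), "X") for tok in tokens]

def extract_features(tokens, tags):
    """Simple, interpretable features combining words and predicted tags."""
    feats = {f"word={tok.lower()}" for tok in tokens}
    feats |= {f"pos={tag}" for tag in tags}
    return feats

tokens = "The dog barks loudly".split()
features = extract_features(tokens, pos_tag(tokens))
# 'features' would then be fed to a downstream model, e.g. a linear classifier.
```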
Most recently, the field of NLP has been dominated by techniques leveraging artificial neural networks. In this paradigm, language data is not processed using a pipeline approach as outlined above, which is ultimately grounded in simple and interpretable features. Rather, neural networks learn internal, vector-based language representations by means of large-scale mathematical optimization based on (usually very large amounts of) raw input data, allowing for “end-to-end” language processing that does not involve any kind of explicit structure as an intermediate representation. While the successes of this paradigm are undeniable in practical terms, with new state-of-the-art results on applications ranging from information extraction to machine translation, it has also spurred controversial questions around the present and future role of explicit structure in NLP. At an overarching level, there is general uncertainty: can NLP still benefit from modeling structure explicitly, or have such approaches become obsolete? Apart from this fundamental question, however, the interaction between neural networks and explicit structure in NLP also raises a number of practical challenges; and it is these challenges that form the core of this thesis and the basis for its contributions.
The first challenge relates to the role of data in training structure prediction systems. As a general rule, neural networks require large amounts of (labeled) training data for learning specific tasks, and thus the curation and annotation of suitable datasets is a common bottleneck in their development. In our contributions, we focus on the Universal Dependencies (UD) formalism for the annotation of syntactic dependencies. We evaluate the quality of existing treebanks and examine ways of improving and extending them; one of our core findings is that both rule-based and machine-learning-based methods can be leveraged to reduce the need for manual annotation.
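As an illustration of the kind of rule-based quality check that can be run over a UD treebank in CoNLL-U format, the sketch below flags sentences that violate the single-root constraint; the specific rule and function names are illustrative, not a check taken from the thesis.

```python
# Minimal sketch: a rule-based sanity check over a CoNLL-U treebank.
# The specific rule (exactly one syntactic root per sentence) is illustrative.

def read_conllu_sentences(path):
    """Yield sentences as lists of token rows (10-column CoNLL-U lines)."""
    sentence = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:                      # blank line ends a sentence
                if sentence:
                    yield sentence
                    sentence = []
            elif not line.startswith("#"):    # skip comment lines
                cols = line.split("\t")
                if cols[0].isdigit():         # ignore multiword/empty tokens
                    sentence.append(cols)
    if sentence:
        yield sentence

def check_single_root(sentence):
    """True if exactly one token is attached to the artificial root (HEAD = 0)."""
    roots = [cols for cols in sentence if cols[6] == "0"]
    return len(roots) == 1

# Usage: flag sentences violating the single-root constraint.
# for i, sent in enumerate(read_conllu_sentences("treebank.conllu"), start=1):
#     if not check_single_root(sent):
#         print(f"Sentence {i}: expected exactly one root")
```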
The second challenge relates to the design and architecture of neural structure prediction systems, of which there is a wide variety; it is often not fully clear which factors are truly important for achieving the best possible performance. We study dependency parser architectures for UD parsing and find that, with modern neural network backbones, simpler is often better: more sophisticated setups offer little in the way of performance improvements.
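For concreteness, the sketch below shows a biaffine arc scorer of the kind commonly placed on top of a pretrained encoder in graph-based dependency parsing (in the style of Dozat and Manning); the dimensions and class names are illustrative rather than the exact architecture studied in the thesis.

```python
import torch
import torch.nn as nn

class BiaffineArcScorer(nn.Module):
    """Illustrative biaffine arc scorer over contextual token embeddings.

    Given embeddings for n tokens, produces an n x n matrix of head-dependent
    scores; dimensions and names here are assumptions for the sketch.
    """

    def __init__(self, enc_dim: int, arc_dim: int = 256):
        super().__init__()
        self.head_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        self.dep_mlp = nn.Sequential(nn.Linear(enc_dim, arc_dim), nn.ReLU())
        # Extra dimension on the head side acts as a per-head bias term.
        self.U = nn.Parameter(torch.empty(arc_dim + 1, arc_dim))
        nn.init.xavier_uniform_(self.U)

    def forward(self, enc: torch.Tensor) -> torch.Tensor:
        # enc: (batch, n, enc_dim) embeddings, e.g. from a transformer encoder
        h = self.head_mlp(enc)                                   # (batch, n, arc_dim)
        d = self.dep_mlp(enc)                                    # (batch, n, arc_dim)
        ones = torch.ones(h.shape[0], h.shape[1], 1, device=h.device)
        h = torch.cat([h, ones], dim=-1)                         # (batch, n, arc_dim+1)
        # scores[b, i, j]: score of token i being the head of token j
        return torch.einsum("bif,fg,bjg->bij", h, self.U, d)

# Usage with random inputs: a batch of 2 sentences, 10 tokens, 768-dim encodings.
# scorer = BiaffineArcScorer(enc_dim=768)
# arc_scores = scorer(torch.randn(2, 10, 768))   # shape (2, 10, 10)
```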
The third challenge relates to structure prediction for downstream NLP tasks. Here, we investigate the tasks of Negation Resolution and Relation Extraction by framing them as graph parsing problems and applying neural architectures similar to those studied for dependency parsing. We find that this approach generally yields robust results, but is not clearly superior to “shallow” sequence labeling.
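As a sketch of what casting Relation Extraction as graph parsing can look like at decoding time, the example below treats entity mentions as nodes and predicts a labeled edge for every ordered pair, discarding pairs whose best label is “no_relation”; the label set and scoring interface are hypothetical.

```python
import numpy as np

# Illustrative decoding of relation extraction as labeled-edge prediction:
# entity mentions are graph nodes, and a relation label (or "no_relation")
# is chosen for every ordered pair of mentions. The label set is hypothetical.

LABELS = ["no_relation", "works_for", "located_in"]

def decode_relation_graph(pair_scores: np.ndarray):
    """pair_scores: (n_mentions, n_mentions, n_labels) scores from some model.

    Returns a list of (head_mention, tail_mention, label) edges, keeping the
    highest-scoring label for each pair unless it is "no_relation".
    """
    edges = []
    n = pair_scores.shape[0]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            best = int(pair_scores[i, j].argmax())
            if LABELS[best] != "no_relation":
                edges.append((i, j, LABELS[best]))
    return edges

# Usage with random scores for three mentions:
# edges = decode_relation_graph(np.random.randn(3, 3, len(LABELS)))
```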
In sum, we hope that our contributions serve to inform and inspire future research on the role of explicit structure, both in NLP and, more generally, in the emerging paradigm of artificial intelligence (AI) that combines neural networks with rule-based algorithms and symbolic representations (“neuro-symbolic AI”).