Please use this identifier to cite or link to this item:
Authors: Richardson, Kyle
Title: New resources and ideas for semantic parser induction
Issue Date: 2018 Dissertation 207
Abstract: In this thesis, we investigate the general topic of computational natural language understanding (NLU), which has as its goal the development of algorithms and other computational methods that support reasoning about natural language by the computer. Under the classical approach, NLU models work similar to computer compilers (Aho et al., 1986), and include as a central component a semantic parser that translates natural language input (i.e., the compiler’s high-level language) to lower-level formal languages that facilitate program execution and exact reasoning. Given the difficulty of building natural language compilers by hand, recent work has centered around semantic parser induction, or on using machine learning to learn semantic parsers and semantic representations from parallel data consisting of example text-meaning pairs (Mooney, 2007a). One inherent difficulty in this data-driven approach is finding the parallel data needed to train the target semantic parsing models, given that such data does not occur naturally “in the wild” (Halevy et al., 2009). Even when data is available, the amount of domain- and language-specific data and the nature of the available annotations might be insufficient for robust machine learning and capturing the full range of NLU phenomena. Given these underlying resource issues, the semantic parsing field is in constant need of new resources and datasets, as well as novel learning techniques and task evaluations that make models more robust and adaptable to the many applications that require reliable semantic parsing. To address the main resource problem involving finding parallel data, we investigate the idea of using source code libraries, or collections of code and text documentation, as a parallel corpus for semantic parser development and introduce 45 new datasets in this domain and a new and challenging text-to-code translation task. As a way of addressing the lack of domain- and language-specific parallel data, we then use these and other benchmark datasets to investigate training se- mantic parsers on multiple datasets, which helps semantic parsers to generalize across different domains and languages and solve new tasks such as polyglot decoding and zero-shot translation (i.e., translating over and between multiple natural and formal languages and unobserved language pairs). Finally, to address the issue of insufficient annotations, we introduce a new learning framework called learning from entailment that uses entailment information (i.e., high-level inferences about whether the meaning of one sentence follows from another) as a weak learning signal to train semantic parsers to reason about the holes in their analysis and learn improved semantic representations. Taken together, this thesis contributes a wide range of new techniques and technical solutions to help build semantic parsing models with minimal amounts of training supervision and manual engineering effort, hence avoiding the resource issues described at the onset. We also introduce a diverse set of new NLU tasks for evaluating semantic parsing models, which we believe help to extend the scope and real world applicability of semantic parsing and computational NLU.
Appears in Collections:05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in This Item:
File Description SizeFormat 
richardson_thesis_oct2018.pdf3,6 MBAdobe PDFView/Open

Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.