German clause-embedding predicates : an extraction and classification approach

Lapshinova-Koltunski, Ekaterina

Files

publication.pdf (1.63 MB)

Date

2010

Authors

Lapshinova-Koltunski, Ekaterina

Abstract

This thesis describes a semi-automatic approach to analysing the subcategorisation properties of verbal, nominal and multiword predicates in German. We classify predicates according to their subcategorisation properties by extracting them from German corpora together with their complements. We concentrate exclusively on sentential complements, such as dass-, ob- and w-clauses, although our methods can also be applied to other complement types. Our aim is not only to extract and classify predicates but also to compare the subcategorisation properties of morphologically related predicates, such as verbs and their nominalisations. It is usually assumed that nominalisations take over their subcategorisation properties from their underlying verbs. However, our tests show that different types of relations hold between them. We therefore review the subcategorisation properties of morphologically related words and analyse their correspondences and differences. For this purpose, we develop a set of semi-automatic procedures that allow us not only to classify extracted units according to their subcategorisation properties, but also to compare the properties of verbs and their nominalisations, which occur both freely in corpora and within multiword expressions. The lexical data are created to serve symbolic NLP, especially large symbolic grammars for deep processing, such as HPSG or LFG, cf. work in the LinGO project (Copestake et al. 2004) and the Pargram project (Butt et al. 2002). HPSG and LFG need detailed linguistic knowledge. In addition, subcategorisation information can be used in information extraction (IE) applications, cf. Surdeanu et al. (2003). Moreover, this information is necessary for linguistic, lexicographic, second language acquisition (SLA) and translation work. Our extraction and classification procedures are precision-oriented: we focus on high accuracy of our extraction and classification results. High precision comes at the expense of completeness, which we compensate for by applying the extraction procedures to larger corpora.
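
As a rough illustration of the classify-then-compare idea summarised in the abstract, the following Python sketch groups predicates by the sentential complement types they were observed with and then relates each verb to its nominalisation. It is not the procedure used in the thesis: the observation list, the verb-nominalisation pairs, the relation labels and the `min_count` threshold (a crude stand-in for a precision-oriented filter) are all assumptions invented for this example, and the toy data are not corpus findings.

```python
from collections import defaultdict

# Hypothetical toy observations: (predicate lemma, complement type).
# Invented for illustration only; these are not results from the thesis.
OBSERVATIONS = [
    ("fragen", "ob"), ("fragen", "w"),
    ("Frage", "ob"), ("Frage", "w"),
    ("wissen", "dass"), ("wissen", "ob"), ("wissen", "w"),
    ("Wissen", "dass"),
]

# Hypothetical mapping from verbs to their nominalisations.
PAIRS = {"fragen": "Frage", "wissen": "Wissen"}


def classify(observations, min_count=1):
    """Map each lemma to the set of complement types seen at least min_count times."""
    counts = defaultdict(lambda: defaultdict(int))
    for lemma, complement in observations:
        counts[lemma][complement] += 1
    return {
        lemma: frozenset(c for c, n in comps.items() if n >= min_count)
        for lemma, comps in counts.items()
    }


def compare(classes, pairs):
    """Label the relation between the complement sets of a verb and its nominalisation."""
    relations = {}
    for verb, noun in pairs.items():
        v = classes.get(verb, frozenset())
        n = classes.get(noun, frozenset())
        if v == n:
            relation = "identical"
        elif n < v:
            relation = "noun_subset"   # the noun takes fewer complement types
        elif v < n:
            relation = "verb_subset"   # the verb takes fewer complement types
        else:
            relation = "divergent"     # neither set contains the other
        relations[(verb, noun)] = relation
    return relations


classes = classify(OBSERVATIONS)
relations = compare(classes, PAIRS)
```

Raising `min_count` discards rarely observed predicate-complement combinations, which trades completeness for precision in the same spirit as the abstract describes; a larger corpus would then be needed to recover the discarded cases.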

The present work describes an approach to the semi-automatic analysis of German predicates. Verbs, nouns and multiword expressions (MWEs) are extracted automatically from the corpora and classified according to their valency properties. In this work we consider only sentential complements, although the method is also suitable for extracting other complement types. Besides the subcategorisation-based classification, we also want to compare the properties of morphologically related predicates (e.g. verbs and their nominalisations). Most approaches generally assume that nominalisations take over or inherit their valency properties from their base verbs. Nevertheless, our extraction experiments show that this assumption does not always hold. This work therefore deals with comparing the valency properties of verbs and nominalisations and with analysing their correspondences and differences. To this end, we design a semi-automatic procedure for extracting and classifying the valency properties of German predicates, as well as the relations between the valency properties of verbs and their nominalisations. The extracted data can be used in symbolic NLP systems, particularly for the symbolic grammar theories LFG and HPSG. Detailed lexical information is very important for these grammars. In addition, information about subcategorisation is needed for linguistics, lexicography and multilingual approaches, e.g. foreign language teaching or translation. Our goal is to achieve higher precision in the extraction and classification results. Completeness is therefore neglected, which we intend to offset through the number of corpora used.

URI

http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-60245
http://elib.uni-stuttgart.de/handle/11682/2710
http://dx.doi.org/10.18419/opus-2693

Collections

05 Fakultät Informatik, Elektrotechnik und Informationstechnik
