Universität Stuttgart

Permanent URI for this community: https://elib.uni-stuttgart.de/handle/11682/1

Search Results

Now showing 1 - 2 of 2
  • Item (Open Access)
    Constructing syntax-based distributional semantic models for novel languages
    (2019) Utt, Jason; Padó, Sebastian (Prof. Dr.)
    Computational models of word meaning typically require large amounts of text data in the desired target language. Today, the ever-growing number of freely available web pages makes it possible to build such distributional semantic models (DSMs), robust and with high lexical coverage, for more and more languages. Among the most versatile DSMs are structured DSMs (SDSMs), which extend the notion of context beyond simple neighboring words to syntactic and other relations. They thereby enable similarity predictions that go beyond the topical aspects of the meaning of a word, or even of a syntactic combination of words, and also capture its relational nature. Text data alone, however, is not sufficient to construct SDSMs: reliable and efficient parsers for the target language are needed to obtain the syntactic analyses, with the consequence that at present only a few languages can benefit from such models. This dissertation investigates methods for creating structured distributional semantic models for novel languages and tests them on a range of semantic tasks. First, a monolingual SDSM is built from a medium-sized target-language text corpus; then, methods are identified for constructing a cross-lingual SDSM using nothing more than a simple bilingual lexicon. Finally, it is shown how these two types of SDSM can be combined into a multilingual model that retains the advantages of both input models, achieving high coverage together with accurate predictions.
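The core idea of a structured DSM can be illustrated in a few lines: instead of counting plain neighboring words, each word's vector is indexed by (syntactic relation, context word) pairs extracted from dependency parses. The following is a minimal sketch with toy, hand-written triples and simplified relation labels — not the thesis's actual pipeline or data:

```python
from collections import Counter
from math import sqrt

# Toy dependency triples (head, relation, dependent); in practice these
# would come from a parser for the target language.
triples = [
    ("drink", "dobj", "coffee"), ("drink", "dobj", "tea"),
    ("drink", "nsubj", "man"),
    ("sip", "dobj", "coffee"), ("sip", "dobj", "tea"),
    ("eat", "dobj", "bread"), ("eat", "nsubj", "man"),
]

def build_sdsm(triples):
    """Map each word to counts over (relation, context-word) features."""
    vectors = {}
    for head, rel, dep in triples:
        vectors.setdefault(head, Counter())[(rel, dep)] += 1
        # Inverse relation: the dependent sees the head as context too.
        vectors.setdefault(dep, Counter())[(rel + "-1", head)] += 1
    return vectors

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = build_sdsm(triples)
# "drink" and "sip" share syntactic contexts; "eat" shares fewer.
print(cosine(vectors["drink"], vectors["sip"]) > cosine(vectors["drink"], vectors["eat"]))
```

Because the features are relation-typed, two words only count as similar when they occur in the same syntactic slots, which is what lets SDSMs capture relational rather than merely topical similarity.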
  • Item (Open Access)
    Human and computational measurement of lexical semantic change
    (2023) Schlechtweg, Dominik; Schulte im Walde, Sabine (apl. Prof. Dr.)
    Human language changes over time. This change occurs on several linguistic levels such as grammar, sound, or meaning. The study of meaning change on the word level is often called 'Lexical Semantic Change' (LSC) and is traditionally approached either from an onomasiological perspective, asking by which words a meaning can be expressed, or from a semasiological perspective, asking which meanings a word can express over time. In recent years, the task of automatically detecting semasiological LSC from textual data has been established as a proper field of computational linguistics under the name of 'Lexical Semantic Change Detection' (LSCD). Two main factors have contributed to this development: (i) the 'digital turn' in the humanities has made large amounts of historical texts available in digital form; (ii) new computational models have been introduced that efficiently learn semantic aspects of words solely from text. One of the main motivations behind the work on LSCD is its applications in historical semantics and historical lexicography, where researchers are concerned with classifying words into categories of semantic change. Automatic methods have the advantage of producing semantic change predictions for large amounts of data in little time; they could thus considerably reduce human effort in these fields while scanning more data and thereby uncovering more semantic changes, which are at the same time less biased towards the ad hoc sampling criteria used by researchers. On the other hand, automatic methods may also be harmful when their predictions are biased, i.e., they may miss numerous semantic changes or label words as changing which are not. Results produced in this way may then lead researchers to make empirically inadequate generalizations about semantic change.
Hence, automatic change detection methods should not be trusted until they have been evaluated thoroughly and their predictions have been shown to reach an acceptable level of correctness. Despite the rapid growth of LSCD as a field, a solid evaluation of the wealth of proposed models was still missing at the onset of this thesis. The reasons were multiple, but most importantly there was no annotated benchmark test set available. This thesis is thus concerned with the process of providing such an evaluation for LSCD, including
  • the definition of the basic concepts and tasks,
  • the development and validation of data annotation schemes with humans,
  • the annotation of a multilingual benchmark test set,
  • the evaluation of computational models on the benchmark, their analysis and improvement, as well as
  • an application of the developed methods to showcase their usefulness in the targeted fields (historical semantics and lexicography).
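A semasiological LSCD system can be sketched minimally as comparing a word's distributional representation across two time-sliced corpora, with the cosine distance between the two representations serving as a change score. The sketch below uses simple bag-of-words context counts and invented toy sentences; it illustrates the general setup, not the models evaluated in the thesis:

```python
from collections import Counter
from math import sqrt

def context_vector(corpus, target, window=2):
    """Bag-of-words context counts for `target` within +/- `window` tokens."""
    vec = Counter()
    for sent in corpus:
        for i, tok in enumerate(sent):
            if tok == target:
                for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                    if j != i:
                        vec[sent[j]] += 1
    return vec

def cosine_distance(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return 1.0 - (dot / norm if norm else 0.0)

# Hypothetical time-sliced corpora: the target word appears with
# different typical contexts in the earlier and later period.
corpus_t1 = [["a", "gay", "and", "cheerful", "song"],
             ["the", "gay", "merry", "crowd"]]
corpus_t2 = [["a", "gay", "rights", "march"],
             ["the", "gay", "community", "center"]]

change = cosine_distance(context_vector(corpus_t1, "gay"),
                         context_vector(corpus_t2, "gay"))
print(round(change, 2))  # high distance suggests semantic change
```

Scores like this one are exactly what a benchmark test set is needed for: without human annotations of which words actually changed, there is no way to tell whether a high distance reflects genuine semantic change or merely corpus noise.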