05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 10 of 17
  • Item, Open Access
    Supervised semantic proximity noise and disagreement detection
    (2024) Choppa, Tejaswi
    The quality and reliability of annotated data are crucial for the development of Machine Learning models. In this work, we focus on word sense annotation in context (a.k.a. Word-in-Context, WiC). WiC datasets in real-world contexts often exhibit significant disagreement. As a result, information is lost when instances are discarded during the creation of the gold label by adjudicating the annotations through majority or median judgment. Recent work has sought to address this issue by incorporating disagreement data through novel label aggregation methods (Uma et al., 2022). Modeling this disagreement is important because, in a real-world scenario, we often do not have clean data. We need to predict on samples where high disagreement is expected and which are inherently difficult to categorize. Predicting disagreement can help detect or filter highly complex samples. In this thesis, we aim to build machine learning models that predict human disagreement in annotated text instances. Moreover, we focus on noisy instances, where annotators cannot confidently assign a label or the data does not fit predefined categories. We aim to measure both disagreement and noise, as both stem from a common source: ambiguity. By modeling these aspects, we aim to design approaches that predict not only the semantic proximity label but also annotator disagreement and data noisiness.
  • Item, Open Access
    Exploring the effects of enriched English language input on language model efficiency
    (2024) Zeller, Tom
    Recent years have seen the advent of large-scale language modeling as exemplified by transformer-based models like GPT or variants of the BERT architecture. These models, which are trained on massive datasets and using compute unattainable by actors that are not of the scale of the biggest tech companies, have shown impressive feats of syntactic and semantic understanding. Naturally, interest has risen in making these models more efficient, in terms of compute as well as data requirements. Research in this area can be seen as primarily motivated by two factors: reducing the barrier for smaller actors like research institutes or end consumers to train and execute state-of-the-art models, as well as reducing the carbon footprint of these models. To achieve this goal, model compression techniques like quantization, pruning or distillation are utilized. This work aims to explore a different, less model-centric and more data-centric approach: Modifying the training and inference data, by enriching it with syntactic and semantic information. To this end, a lexical resource is created which maps English words to a form where individual characters represent values of a range of semantic and syntactic features, providing lexical information that is accessible to all model types that operate on tokens at the sub-word or character-level. Different features and methods of representation are discussed, and their effect on model performance is evaluated by pretraining a small GPT-family model and fine-tuning on downstream tasks of the SuperGLUE benchmark. Given a fixed amount of data and compute, the experiments show a performance advantage for a character-level model trained using the enriched data.
  • Item, Open Access
    An attribution method for classification tasks in Siamese models
    (2024) Liu, Mindong
    Explaining the contribution of tokens to classification results in two-sentence classification tasks is a challenging problem in natural language processing (NLP). This thesis studies the use of Integrated Jacobians (IJ) for interpreting multi-class classification with Siamese models, particularly in Natural Language Inference (NLI). The NLI task requires models to understand the logical relationships between two sentences, posing challenges for model interpretability. Because the original Siamese model was primarily designed for regression tasks, the thesis first extends Siamese models to classification tasks with bilinear similarity while ensuring that the IJ methods can still be applied. It then adapts two forms of the IJ methods, exact IJ and approximate IJ, to work with the newly extended Siamese models. To validate the effectiveness of the extended Siamese models with the IJ methods, the thesis conducts experiments on the AllNLI dataset under the Sentence-BERT framework. Four different model configurations are employed, and both IJ methods are applied to each. The experimental results demonstrate that the IJ methods effectively provide explanations of the models' predictions. Finally, the thesis examines the consistency between the explanations provided by the IJ methods and semantic relationships at the lexical and span levels using the WordNet and SpanEX datasets. In this analysis, the IJ methods show that the models capture semantic relationships between words and spans, and that there is a correlation between these relationships and the models' predictions. This finding supports the use of the IJ methods to explain the decisions of NLP models.
  • Item, Open Access
    Cross-lingual word embeddings with multi-sense representations
    (2024) Shim, Soh-Eun
    Cross-lingual word embeddings have been found to be useful in aiding cross-lingual transfer, but work in this line of research has to date rarely addressed in depth the monosemy constraint of static word embeddings, where the collapse of multiple meanings into one form might arguably lead to subpar alignments. In this thesis, we address this gap by examining potential approaches to incorporating sense information into cross-lingual alignment. Specifically, we explore two variants of cross-lingual multi-sense alignment. In the first, we embed the senses of each word as a Gaussian mixture (Athiwaratkun and Wilson, 2017), on the assumption that multi-sense embeddings as a basis for alignment may help mitigate the meaning conflation deficiency (Camacho-Collados and Pilehvar, 2018), and in turn help improve isomorphism between vector spaces (Ruder et al., 2019). Our second method learns a cross-lingual multi-sense embedding space by reversing the order: we cross-lingually align uni-sense word embeddings, and attempt multi-sense enrichment as a postprocessing step by retrofitting (Pilehvar and Collier, 2016) the embeddings on the Open Multilingual Wordnet (Bond et al., 2023). We observe that our model is capable of fine-grained cross-lingual semantic distinctions and successfully identifies colexifications without cross-lingual supervision.
  • Item, Open Access
    Exploring retrieval-augmented language modeling for material prediction of vehicle components
    (2024) Wagner, Frederik
    Recent advances in natural language processing (NLP), particularly in large language models (LLMs) such as ChatGPT, show their potential for application to a wide range of tasks in specialized domains. In the automotive industry, for example, they could be used to support vehicle repair. This thesis addresses the problem of predicting suitable materials for vehicle components, such as brake discs. The aim is to determine whether LLMs can draw on both general and domain-specific knowledge to make accurate predictions about component materials without requiring extensive fine-tuning. This is achieved through retrieval-augmented generation (RAG), in which relevant information is retrieved from external sources and used to improve the model input of the LLM. Three approaches are compared: a standard LLM, a simple RAG approach, and an iterative RAG method called Chain-of-Verification (CoVe). A custom annotation tool is also developed to facilitate a human evaluation study, since no gold-standard dataset exists. The results show that LLMs perform well at material prediction, and although neither RAG approach significantly improves prediction quality, neither degrades it. The thesis concludes that LLMs, with or without retrieval augmentation, offer a promising solution for material prediction of vehicle components, although challenges remain in evaluation, hyperparameter optimization, and data retrieval.
  • Item, Open Access
    Spanish dialect classification : a comparative study of linguistically tailored features, unigrams and BERT embeddings
    (2024) Zeidler, Laura
    Dealing with linguistic varieties has become a topic of interest for the NLP community in recent years. This also includes the task of automatic dialect classification which is most often tackled by training traditional machine learning models on bag-of-words unigram features. This thesis explores two alternative approaches for the task of distinguishing dialects from 20 Spanish-speaking countries using the Corpus del Español. Firstly, dialectal features that are tailored to the dialects in the data were extracted and transformed into a feature set. Two traditional machine learning models, namely a support vector machine and a decision tree model, were then trained on the tailored features, standard unigram features and a combination of the two respectively. This was done to assess the benefit of incorporating manually selected features with unigrams. Secondly, a pre-trained BERT transformer model was fine-tuned on the dialect classification task and compared to the previously mentioned models. The experiments revealed that the current tailored feature set generally does not contribute positively to the performance of the traditional machine learning models. It does, however, show potential for capturing dialectal differences. Regarding the transformer model, while it could not be outperformed by the traditional models, the difference in accuracy between the BERT model and the best-performing traditional model was relatively small and performance was even on par for one additional study. Additionally, both unigram-based and transformer models were found to rely heavily on content-related lexical items, such as named entities.
  • Item, Open Access
    RAGAR, your falsehood RADAR : RAG-augmented reasoning for political fact-checking using multimodal large language models
    (2024) Abdul Khaliq, Mohammed
    The escalating challenge of misinformation, particularly in the context of political discourse, necessitates advanced solutions for fact-checking. This thesis introduces innovative approaches to enhance the reliability and efficiency of multimodal fact-checking through the integration of large language models (LLMs) with Retrieval-augmented Generation (RAG) based advanced reasoning techniques. In the digital era, where misinformation spreads rapidly across various media, including text and images, there's a critical need for robust mechanisms capable of evaluating the veracity of political claims. This work proposes two novel methodologies, Chain of RAG (CoRAG) and Tree of RAG (ToRAG), and their hybrid implementations incorporating Chain of Thought and Chain of Verification. These approaches leverage RAG techniques utilizing multimodal LLMs with reasoning techniques. The approaches are designed to process and assess political claims by considering textual and visual information, providing a comprehensive approach to fact-checking. This thesis explores the implementation of these approaches within a multimodal fact-checking pipeline, highlighting their effectiveness in improving the accuracy of veracity predictions and the generation of explanations. By employing multimodal LLMs adept at analyzing text and images, this research advances the capability of automated systems in identifying and countering misinformation. The experimental evaluation demonstrates that the proposed RAG-augmented Reasoning (RAGAR) techniques outperform existing methods that rely on sub-question generation, offering a promising solution to the challenges of political fact-checking. This thesis contributes to the fields of computational linguistics and political science by providing an effective approach to combat fake news, thereby enhancing the integrity of political discourse in the digital age.
  • Item, Open Access
    Controllable text-to-speech system : speaking style control using hierarchical variational autoencoder
    (2024) Yang, Yung-Ching
    This research proposes an utterance embedding model that provides disentangled and scalable control over latent attributes in human speech. Our model is formulated as a hierarchical generative model based on the Variational Autoencoder (VAE) framework, integrated with the FastSpeech2 Text-to-Speech (TTS) system. The work demonstrates that image initiative networks on hierarchical pattern learning can be adapted to model complex distributions in speaking styles and prosody. This work merges advancements in VAE research (particularly those addressing critical statistical challenges such as posterior collapse and unbounded KL divergence) with recent studies focusing on structural enhancements of VAE architectures. We introduce a hierarchical structure in latent variable modeling and augment the learning objective with hierarchical information to ensure the latent variables at each level are hierarchically factorized. This approach learns a smooth latent prosody space and deepens our understanding of the relationship between the hierarchical nature of prosody and neural network architecture. Through our customized control mechanism, integrated into various levels of the latent spaces, the model can manipulate prosodic elements, allowing for both independent and scalable adjustments. By incorporating these techniques, our model captures a wide range of prosodic variations, offering a refined level of control and expressiveness in speech synthesis in unsupervised learning contexts.
  • Item, Open Access
    Semantic agreement and the agreement hierarchy in large language models of Russian
    (2024) Kuryanov, Ilya
    This thesis investigates the phenomenon of mixed agreement in Russian, where certain nouns denoting professions can trigger both syntactic and semantic agreement. We construct challenge sets testing different aspects of this phenomenon for pre-trained masked language models of Russian, and find that all models considered are able to model the syntactic restrictions on mixed agreement, and, to varying degrees, the preferences for semantic agreement that are observed in natural language use. We also find evidence that the models' behavior on these challenge sets is influenced by gender bias associated with the nouns in question, and that the two kinds of agreement are represented differently in the internal structure of the model.
  • Item, Open Access
    An analysis of the domain-specific applicability of text-to-SQL systems on a linguistic database
    (2024) Ateri, Maria Vittoria
    The applicability and adaptability of text-to-SQL systems trained on reference databases to more complicated ones is an open question. This thesis attempts to provide intuitions about the challenges and limitations of applying benchmark systems to more complicated databases. To this end, two exemplary systems, IRNet and SmBop, both trained on the Spider dataset, are applied to the complex linguistic relational database DIRNDL. The primary aim is to analyze to what extent the systems manage to produce accurate queries and retrieve correct information when inference is conducted on a database of greater complexity and size than the databases contained in the Spider dataset (the main benchmark in the field). Intentionally, no re-training is performed. A comparison between the two systems is also conducted. In addition, the analysis covers sensitivity to lexical changes and to variation in question complexity. Through a qualitative evaluation, this work provides insights into which model architecture works better for complex linguistic databases, and into the limits of both systems. The main findings are that SmBop is superior to IRNet, and that SmBop is also more sensitive to lexical changes in the database schema. Nevertheless, neither system performs satisfactorily when the goal is the synthesis of the more complex queries used in real-world research settings.