05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 10 of 39
  • Item (Open Access)
    KGGLDM : Knowledge Graph Guided Diffusion Models for advanced learning
    (2024) Gupta, Akshat
    This thesis explores a novel approach that bridges the gap between diffusion modeling and knowledge graphs, unveiling a potentially groundbreaking direction that serves as the central theme of this work. We propose incorporating knowledge graph guidance into latent diffusion models (LDMs) to enable precise control over sample generation using domain conceptual knowledge.
  • Item (Open Access)
    Exploring the effects of enriched English language input on language model efficiency
    (2024) Zeller, Tom
    Recent years have seen the advent of large-scale language modeling, as exemplified by transformer-based models like GPT or variants of the BERT architecture. These models, trained on massive datasets using compute unattainable by actors smaller than the biggest tech companies, have shown impressive feats of syntactic and semantic understanding. Naturally, interest has risen in making these models more efficient, in terms of compute as well as data requirements. Research in this area is primarily motivated by two factors: reducing the barrier for smaller actors like research institutes or end consumers to train and execute state-of-the-art models, and reducing the carbon footprint of these models. To achieve this goal, model compression techniques like quantization, pruning or distillation are utilized. This work explores a different, less model-centric and more data-centric approach: modifying the training and inference data by enriching it with syntactic and semantic information. To this end, a lexical resource is created which maps English words to a form where individual characters represent values of a range of semantic and syntactic features, providing lexical information that is accessible to all model types that operate on tokens at the sub-word or character level. Different features and methods of representation are discussed, and their effect on model performance is evaluated by pretraining a small GPT-family model and fine-tuning on downstream tasks of the SuperGLUE benchmark. Given a fixed amount of data and compute, the experiments show a performance advantage for a character-level model trained using the enriched data.
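The enrichment idea in the abstract above can be sketched as follows. This is a toy illustration only: the lexicon, feature set, and character encoding are invented for this sketch and are not the thesis's actual lexical resource.

```python
# Toy sketch of enriching words with characters that encode feature values.
# Here each known word gets a part-of-speech code and an animacy flag appended;
# both features and their encoding are hypothetical.
TOY_LEXICON = {
    "dog":  {"pos": "N", "animate": "1"},
    "runs": {"pos": "V", "animate": "0"},
    "fast": {"pos": "A", "animate": "0"},
}

def enrich(word: str) -> str:
    """Append one character per feature to a word, if it is in the lexicon."""
    feats = TOY_LEXICON.get(word.lower())
    if feats is None:
        return word  # out-of-vocabulary words pass through unchanged
    return word + "|" + feats["pos"] + feats["animate"]

def enrich_sentence(sentence: str) -> str:
    """Enrich every whitespace-separated token of a sentence."""
    return " ".join(enrich(w) for w in sentence.split())

print(enrich_sentence("dog runs fast"))  # dog|N1 runs|V0 fast|A0
```

Because the appended characters sit in the token stream itself, any sub-word or character-level model can read them without architectural changes, which is the point the abstract makes.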
  • Item (Open Access)
    Cycle-consistent adversarial networks for automatic speech recognition
    (2024) Li, Chia-Yu; Vu, Ngoc Thang (Prof. Dr.)
  • Item (Open Access)
    Linguistically-informed modeling of potentials for misunderstanding
    (2024) Anthonio, Talita; Roth, Michael (Dr.)
    Misunderstandings are prevalent in communication. While there is a large amount of work on misunderstandings in conversations, little attention has been given to misunderstandings that arise from text, since readers and writers typically do not interact with one another. However, texts that potentially evoke different interpretations can be identified by certain linguistic phenomena, especially those related to implicitness or underspecificity. In Computational Linguistics, there is a considerable amount of work on such linguistic phenomena and their computational modeling. However, most of these studies do not examine when these phenomena cause misunderstandings. This is a crucial aspect, because ambiguous language does not always cause misunderstanding. In this thesis, we provide the first steps towards a computational model that can automatically identify whether an instructional text is likely to cause misunderstandings ("potentials for misunderstanding"). To achieve this goal, we build large corpora of potentials for misunderstanding in instructional texts. Following previous work, we start from the notion that a misunderstanding involves multiple plausible interpretations; since such interpretations may nevertheless be similar in meaning, we define misunderstandings more precisely as the existence of multiple plausible but conflicting interpretations. We therefore find texts that potentially cause misunderstanding by looking for passages with several plausible interpretations that conflict with one another. We identify such passages automatically from revision histories of instructional texts, based on the finding that potentials for misunderstanding can be found by examining older versions of a text and their clarifications in newer versions. Specifically, we look for unclarified sentences that contain implicit and underspecified language, and study their clarifications.
    Through several analyses and crowdsourcing studies, we demonstrate that our corpora provide valuable resources on potentials for misunderstanding, as we find that revised sentences are better than their earlier versions. Furthermore, we show that the corpora can be used for several computational modeling purposes. The three resulting models can be combined to identify whether a text potentially causes misunderstanding. First, we develop a model that can detect improvements in a text, even when they are subtle and closely dependent on context. In an analysis, we verify that the model's judgements of what makes a better or equally good sentence overlap with human judgements. Second, we build a transformer-based language model that automatically resolves potentials for misunderstanding caused by implicit references. We find that modeling discourse context improves the performance of this model, and that the best model is capable not only of generating the gold resolution but also of generating several plausible resolutions for implicit references in instructional text. We use this finding to build a large dataset with plausible and implausible resolutions of implicit and underspecified elements. This dataset serves a third computational task, in which we train a model to automatically distinguish between plausible and implausible resolutions. We show that this model and the provided dataset can be used to find passages with several plausible clarifications. Since our definition of misunderstanding focuses on conflicting clarifications, we conclude the thesis with a final study, in which we provide and validate a crowdsourcing set-up for finding cases with conflicting, plausible resolutions. The set-up and findings could be used in future research to directly train a model that identifies passages with implicit elements that have conflicting resolutions.
  • Item (Open Access)
    RAGAR, your falsehood RADAR : RAG-augmented reasoning for political fact-checking using multimodal large language models
    (2024) Abdul Khaliq, Mohammed
    The escalating challenge of misinformation, particularly in the context of political discourse, necessitates advanced solutions for fact-checking. This thesis introduces innovative approaches to enhance the reliability and efficiency of multimodal fact-checking through the integration of large language models (LLMs) with advanced reasoning techniques based on retrieval-augmented generation (RAG). In the digital era, where misinformation spreads rapidly across various media, including text and images, there is a critical need for robust mechanisms capable of evaluating the veracity of political claims. This work proposes two novel methodologies, Chain of RAG (CoRAG) and Tree of RAG (ToRAG), and their hybrid implementations incorporating Chain of Thought and Chain of Verification. These approaches combine RAG with multimodal LLMs and structured reasoning, and are designed to process and assess political claims by considering both textual and visual information, providing a comprehensive approach to fact-checking. The thesis explores the implementation of these approaches within a multimodal fact-checking pipeline, highlighting their effectiveness in improving the accuracy of veracity predictions and the generation of explanations. By employing multimodal LLMs adept at analyzing text and images, this research advances the capability of automated systems in identifying and countering misinformation. The experimental evaluation demonstrates that the proposed RAG-augmented Reasoning (RAGAR) techniques outperform existing methods that rely on sub-question generation, offering a promising solution to the challenges of political fact-checking. This thesis contributes to the fields of computational linguistics and political science by providing an effective approach to combat fake news, thereby enhancing the integrity of political discourse in the digital age.
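The iterative, chain-style retrieval loop described above can be sketched in a few lines. This is a minimal illustration of the general Chain-of-RAG pattern, not the thesis's pipeline: `retrieve()` and `llm_next_question()` are toy stand-ins (dictionary lookups and a fixed policy), and the corpus entries are invented.

```python
# Minimal sketch of an iterative retrieval-augmented reasoning loop:
# at each step a (stubbed) LLM proposes the next retrieval query based on the
# evidence gathered so far, and the retrieved passage is added to the evidence.
from typing import List

TOY_CORPUS = {
    "claim origin": "The statement first appeared in a 2021 campaign speech.",
    "fact check": "Official statistics contradict the figure quoted in the claim.",
}

def retrieve(query: str) -> str:
    """Toy retriever: exact-match lookup against a tiny in-memory corpus."""
    return TOY_CORPUS.get(query, "no evidence found")

def llm_next_question(claim: str, evidence: List[str]) -> str:
    """Toy stand-in for an LLM choosing the next follow-up query."""
    return "claim origin" if not evidence else "fact check"

def chain_of_rag(claim: str, max_steps: int = 2) -> List[str]:
    """Iteratively ask follow-up questions and accumulate retrieved evidence."""
    evidence: List[str] = []
    for _ in range(max_steps):
        question = llm_next_question(claim, evidence)
        evidence.append(retrieve(question))
    return evidence

evidence = chain_of_rag("The unemployment rate halved last year.")
print(evidence)
```

A real system would replace the stubs with an LLM call and a dense retriever, and append a final verdict step that conditions on the accumulated evidence.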
  • Item (Open Access)
    CAPTCHA mechanisms using semantic NLU tasks
    (2024) Wolkober, Marcel
    In 2019, one-fourth of all internet traffic consisted of malicious bots. CAPTCHAs are a main countermeasure, used as tests to detect non-human users. As computational attackers increasingly employ artificial intelligence, their success rate against these CAPTCHAs rises continuously. This bachelor's thesis aims to provide new CAPTCHA mechanisms using semantic natural language understanding (NLU) tasks, which are generally considered hard for advanced computational attackers to solve. The task used for the NLU CAPTCHA challenges involves rating the semantic similarity of a word in two different contexts. A study with 275 participants was conducted to evaluate human usability. The results show that these challenges are highly difficult for humans. Furthermore, the challenges provide insufficient resistance against an advanced attacker. This leads to the conclusion that, in its current state, the semantic NLU CAPTCHA provides no benefit over existing CAPTCHAs. However, some challenge results indicate that, with further adjustments, NLU tasks may still be viable for CAPTCHA challenges.
  • Item (Open Access)
    Semantic agreement and the agreement hierarchy in large language models of Russian
    (2024) Kuryanov, Ilya
    This thesis investigates the phenomenon of mixed agreement in Russian, where certain nouns denoting professions can trigger both syntactic and semantic agreement. We construct challenge sets testing different aspects of this phenomenon for pre-trained masked language models of Russian, and find that all models considered are able to model the syntactic restrictions on mixed agreement, and, to varying degrees, the preferences for semantic agreement that are observed in natural language use. We also find evidence that the models' behavior on these challenge sets is influenced by gender bias associated with the nouns in question, and that the two kinds of agreement are represented differently in the internal structure of the model.
  • Item (Open Access)
    Exploring retrieval-augmented language modeling for material prediction of vehicle components
    (2024) Wagner, Frederik
    Recent advances in natural language processing (NLP), particularly in large language models (LLMs) such as ChatGPT, demonstrate their potential for a wide range of tasks in specialized domains. In the automotive industry, for example, they could be used to assist with vehicle repairs. This thesis addresses the problem of predicting suitable materials for vehicle components, such as brake discs. It investigates whether LLMs can draw on both general and domain-specific knowledge to make accurate predictions about component materials without extensive fine-tuning. This is achieved through retrieval-augmented generation (RAG), in which relevant information is retrieved from external sources and used to enrich the LLM's input. Three approaches are compared: a standard LLM, a simple RAG approach, and an iterative RAG method called Chain-of-Verification (CoVe). The thesis also develops a custom annotation tool to facilitate a human evaluation study, since no gold-standard dataset exists. The results show that LLMs perform well at material prediction, and although neither RAG approach significantly improves prediction quality, neither degrades it. This research concludes that LLMs, with or without retrieval augmentation, offer a promising solution for material prediction of vehicle components, even though challenges remain in evaluation, hyperparameter optimization, and data retrieval.
  • Item (Open Access)
    Controllable text-to-speech system : speaking style control using hierarchical variational autoencoder
    (2024) Yang, Yung-Ching
    This research proposes an utterance embedding model that provides disentangled and scalable control over latent attributes in human speech. Our model is formulated as a hierarchical generative model based on the Variational Autoencoder (VAE) framework, integrated with the FastSpeech2 Text-to-Speech (TTS) system. The work demonstrates that hierarchical pattern-learning networks developed for image generation can be adapted to model complex distributions in speaking styles and prosody. This work merges advancements in VAE research, particularly those addressing critical statistical challenges such as posterior collapse and unbounded KL divergence, with recent studies focusing on structural enhancements of VAE architectures. We introduce a hierarchical structure in latent variable modeling and augment the learning objective with hierarchical information to ensure that the latent variables at each level are hierarchically factorized. This approach learns a smooth latent prosody space and deepens our understanding of the relationship between the hierarchical nature of prosody and neural network architecture. Through our customized control mechanism, integrated into various levels of the latent spaces, the model is capable of manipulating prosodic elements, allowing for both independent and scalable adjustments. By incorporating these techniques, our model captures a wide range of prosodic variations, offering a refined level of control and expressiveness in speech synthesis in unsupervised learning contexts.
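The KL-divergence term that such hierarchical VAE objectives regularize can be made concrete with a small worked example. This is an illustrative sketch of the standard Gaussian VAE KL term summed over hierarchy levels, not the thesis's actual objective; the dimensionalities and values are toy assumptions.

```python
# KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian posterior,
# computed per hierarchy level and summed, as in objectives where latent
# variables are factorized across levels.
import math

def gaussian_kl(mu: list, logvar: list) -> float:
    """Closed-form KL to a standard-normal prior, summed over dimensions."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar)
    )

# Two hierarchy levels, each with its own 4-dimensional latent.
levels = [
    ([0.0] * 4, [0.0] * 4),  # posterior matches the prior exactly: KL = 0
    ([0.5] * 4, [0.0] * 4),  # shifted mean: KL = 0.5 * 4 * 0.25 = 0.5
]
total_kl = sum(gaussian_kl(mu, lv) for mu, lv in levels)
print(total_kl)  # 0.5
```

The pathologies the abstract mentions appear directly in this term: posterior collapse corresponds to every level's KL sitting at zero, and unbounded KL divergence corresponds to the per-level terms growing without limit during training.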
  • Item (Open Access)
    Enhancing character type detection using coreference information : experiments on dramatic texts
    (2024) Pagel, Janis; Kuhn, Jonas (Prof. Dr.)
    This thesis describes experiments on enhancing machine-learning-based detection of literary character types in German-language dramatic texts by using coreference information. The thesis makes four major contributions to the research discourse on character type detection and coreference resolution for German dramatic texts: (i) a corpus of coreference annotations on dramatic texts, called GerDraCor-Coref, (ii) a rule-based system to automatically resolve coreferences in dramatic texts, called DramaCoref, together with experiments and analyses of results obtained by applying DramaCoref to GerDraCor-Coref, (iii) experiments on the automatic detection of three selected character types (title characters, protagonists and schemers) using machine-learning approaches, and (iv) experiments on utilizing the coreference information of (i) and (ii) to improve the character type detection performance of (iii).