Browsing by Author "Vu, Ngoc Thang (Prof. Dr.)"

Open Access
Cycle-consistent adversarial networks for automatic speech recognition
(2024) Li, Chia-Yu; Vu, Ngoc Thang (Prof. Dr.)
Open Access
Human-centered explainable artificial intelligence for natural language processing
(2024) Schuff, Hendrik; Vu, Ngoc Thang (Prof. Dr.)
With the ongoing advances in artificial intelligence (AI) systems, their influence on our private, professional, and public life is expanding. While these systems' prediction performance increases, they often rely on opaque system architectures that hide the reasons for the systems' decisions. The field of explainable AI thus seeks to answer why a system returns its prediction. In this thesis, we explore explanatory methods for natural language processing (NLP) systems. Instead of focusing on the technical aspects of explainability in isolation, we take a human-centered approach and additionally explore users' perception of and their interaction with explainable NLP systems. Our contributions thus range on a spectrum from technology-centered machine learning contributions to human-centered studies of cognitive biases. On the technical end of the spectrum, we first contribute novel approaches to integrate external knowledge into explainable natural language inference (NLI) systems and study the effect of different sources of external knowledge on fine-grained model reasoning capabilities. We compare automatic evaluation with user-perceived system quality and find an equally surprising and alarming disconnect between the two. Second, we present a novel self-correction paradigm inspired by Hegel's dialectics. We apply our resulting thought flow network method to question answering (QA) systems and demonstrate our method's ability to self-correct model predictions, which increases prediction performance. We additionally find that the corresponding decision sequence explanations enable significant improvements in users' interaction with the system and enhance user-perceived system quality. Our architectural and algorithmic contributions are followed by an in-depth investigation of explanation quality quantification.
We first focus on explainable QA systems and find that the currently used proxy scores fail to capture the extent to which an explanation is relevant to the system's answer. We thus propose the two novel model-agnostic scores FaRM and LocA, which quantify a system's internal explanation-answer coupling following two complementary approaches. Second, we consider general explanation quality and discuss its characteristics and how they are violated by current evaluation practices, using the example of a popular explainable QA leaderboard. We provide guidelines for explanation quality evaluation and propose our novel "Pareto Front leaderboard" method to construct system rankings to overcome challenges in explanation quality evaluation. In the last part of the thesis, we focus on human perception of explanations. We first investigate how users interpret the frequently used heatmap explanations over text. We find that the information communicated by the explanations differs from the information understood by the users. In a series of studies, we discover distorting effects of various types of biases and demonstrate that cognitive biases, learning effects, and linguistic properties can distort users' interpretation of explanations. We question the use of heatmap visualizations and propose alternative visualization methods. Second, we develop, validate, and apply a novel questionnaire to measure perceived system predictability. Concretely, we contribute the novel perceived system predictability (PSP) scale, demonstrate its desirable psychometric properties, and use it to uncover a dissociation of perceived and objective predictability in the context of explainable NLP systems. Overall, this thesis highlights that progress in explainable NLP cannot rely on technical advances in isolation, but needs to simultaneously involve the recipients of explanations, including their requirements, perception, and cognition.
Open Access
Low resource NLP for polysynthetic languages : morphological segmentation and machine translation
(2024) Mager Hois, Jesus Manuel; Vu, Ngoc Thang (Prof. Dr.)
This thesis explores the application of Natural Language Processing (NLP) techniques to morphologically rich indigenous languages of the Americas, focusing on low-resource scenarios. The work addresses the challenges of modeling morphological segmentation and machine translation for these languages, which often lack large annotated datasets and face issues such as code-switching and orthographic normalization. Contributions include the development of new datasets, the adaptation of neural network models for specific tasks, and the investigation of the impact of morphological segmentation on machine translation performance. Additionally, the thesis delves into the ethical implications of applying NLP technologies to these languages, considering the perspectives of native speakers and community leaders.
Open Access
Multiclass speech emotion recognition with neural networks : investigations on aspects of input data, multilingual modeling, and data scarcity
(2021) Neumann, Michael; Vu, Ngoc Thang (Prof. Dr.)
Open Access
Neural-based NLP systems for code-switched Arabic-English speech
(2024) Hamed, Injy; Vu, Ngoc Thang (Prof. Dr.)
In the ever-evolving language landscape, code-switching has emerged as an interesting linguistic phenomenon, where people seamlessly alternate between multiple languages in the same discourse. The global prevalence of this phenomenon has created a need for language technologies that are able to handle it proficiently, with the aim of providing user-friendly solutions. Despite this necessity, progress in language technologies (and in NLP research in general) in code-switching contexts still lags behind the remarkable strides achieved in monolingual settings. This disparity is also evident in the case of diglossic languages, such as Arabic, where language technologies are better supported in the formal variant of the language than in the dialects. This serves as the motivation for our work, where we focus on the under-researched code-switched Egyptian Arabic-English language pair. This language pair offers an interesting set of challenges, where the complexity posed by code-switching is further compounded by challenges introduced by the primary language, including limited resources, morphological richness, and unstandardized orthography. Under this language setup, we tackle challenges in three dimensions: data collection, modeling, and evaluation. With regard to data collection, we collect a code-switched Egyptian Arabic-English speech translation corpus. The corpus consists of 12 hours of spontaneous speech gathered from bilingual speakers, containing considerable amounts of code-switching. As part of our work, we develop transcription and translation guidelines. Our ArzEn-ST corpus can be used in speech recognition, machine translation, speech translation, and linguistic analyses. We make the corpus publicly available to enable and facilitate further research for this language pair. With regard to modeling, we explore challenges and solutions in building machine translation and automatic speech recognition systems.
Firstly, we compare two widely used architectures for building speech recognition systems, namely hybrid and end-to-end architectures. We present a thorough comparison between both systems with regard to their multilingual and crosslingual knowledge transfer abilities, and their tolerance towards unstandardized orthography. We show that both systems provide comparable yet complementary performance, and thus successfully propose combining their hypotheses to improve recognition. Secondly, we tackle the issue of data sparsity through segmentation, where we investigate the best segmentation approach for code-switched machine translation under different levels of low-resource settings. Thirdly, we present a comprehensive study of data augmentation, examining the relation between the quality of synthesized code-switched data and the improvements achieved in downstream tasks. Our experiments involve a wide range of techniques, covering lexical replacements, linguistic theories, and back-translation. As part of our contribution, we examine the effectiveness of using a code-switched predictive model that is capable of identifying plausible code-switching segments and augmenting the data accordingly. We also propose several steps for boosting the amount of generated code-switched data in back-translation, which is usually restricted by the limited amount of code-switched data. With regard to evaluation, we focus on the question of robust and fair evaluation metrics for speech recognition when dealing with code-switched and orthographically unstandardized languages, both of which are challenges present in the language pair we study. We conduct an extensive study comparing the performance of a wide range of metrics against human judgment. Through our metrics, we overcome cross-transcription and unstandardized orthography issues by bringing the hypotheses and references into one shared space of orthography, phonology, or semantics.
Through the proposed techniques, we achieve higher correlation with human judgment, outperforming the currently widely used metrics. Finally, we believe that the findings, linguistic analyses, and corpora presented in this thesis can help in reaching a better understanding of this code-switched language pair and can contribute towards advancing language technologies to better accommodate code-switching, with the overall aim of providing better-performing language technologies with more human-like communication.
Open Access
Prosodic event detection for speech understanding using neural networks
(2020) Stehwien, Sabrina; Vu, Ngoc Thang (Prof. Dr.)