05 Fakultät Informatik, Elektrotechnik und Informationstechnik
Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6
Item Open Access
Cross-lingual citations in English papers : a large-scale analysis of prevalence, usage, and impact (2021)
Saier, Tarek; Färber, Michael; Tsereteli, Tornike
Citation information in scholarly data is an important source of insight into the reception of publications and the scholarly discourse. Outcomes of citation analyses and the applicability of citation-based machine learning approaches heavily depend on the completeness of such data. One particular shortcoming of scholarly data nowadays is that non-English publications are often not included in data sets, or that language metadata is not available. Because of this, citations between publications of differing languages (cross-lingual citations) have only been studied to a very limited degree. In this paper, we present an analysis of cross-lingual citations based on over one million English papers, spanning three scientific disciplines and a time span of three decades. Our investigation covers differences between cited languages and disciplines, trends over time, and the usage characteristics as well as impact of cross-lingual citations. Among our findings are an increasing rate of citations to publications written in Chinese, citations being primarily to local non-English languages, and consistency in citation intent between cross- and monolingual citations. To facilitate further research, we make our collected data and source code publicly available.

Item Open Access
Advances in clinical voice quality analysis with VOXplot (2023)
Barsties von Latoszek, Ben; Mayer, Jörg; Watts, Christopher R.; Lehnert, Bernhard
Background: The assessment of voice quality can be evaluated perceptually with standard clinical practice, also including acoustic evaluation of digital voice recordings to validate and further interpret perceptual judgments.
The goal of the present study was to determine the strongest acoustic voice quality parameters for perceived hoarseness and breathiness when analyzing the sustained vowel [a:] using a new clinical acoustic tool, the VOXplot software. Methods: A total of 218 voice samples of individuals with and without voice disorders were applied to perceptual and acoustic analyses. Overall, 13 single acoustic parameters were included to determine validity aspects in relation to perceptions of hoarseness and breathiness. Results: Four single acoustic measures could be clearly associated with perceptions of hoarseness or breathiness. For hoarseness, the harmonics-to-noise ratio (HNR) and pitch perturbation quotient with a smoothing factor of five periods (PPQ5), and, for breathiness, the smoothed cepstral peak prominence (CPPS) and the glottal-to-noise excitation ratio (GNE) were shown to be highly valid, with a significant difference being demonstrated for each of the other perceptual voice quality aspects. Conclusions: Two acoustic measures, the HNR and the PPQ5, were both strongly associated with perceptions of hoarseness and were able to discriminate hoarseness from breathiness with good confidence. Two other acoustic measures, the CPPS and the GNE, were both strongly associated with perceptions of breathiness and were able to discriminate breathiness from hoarseness with good confidence.

Item Open Access
Resources for Turkish natural language processing : a critical survey (2022)
Çöltekin, Çağrı; Doğruöz, A. Seza; Çetinoğlu, Özlem
This paper presents a comprehensive survey of corpora and lexical resources available for Turkish. We review a broad range of resources, focusing on the ones that are publicly available.
In addition to providing information about the available linguistic resources, we present a set of recommendations, and identify gaps in the data available for conducting research and building applications in Turkish Linguistics and Natural Language Processing.

Item Open Access
Editorial - perspectives for natural language processing between AI, linguistics and cognitive science (2022)
Lenci, Alessandro; Padó, Sebastian

Item Open Access
Between welcome culture and border fence : a dataset on the European refugee crisis in German newspaper reports (2023)
Blokker, Nico; Blessing, André; Dayanik, Erenay; Kuhn, Jonas; Padó, Sebastian; Lapesa, Gabriella
Newspaper reports provide a rich source of information on the unfolding of public debates, which can serve as basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite the reactions of political actors: crisis sparks the debate. However, due to the challenges of reliable annotation and modeling, few large-scale datasets with high-quality annotation are available. This paper introduces DebateNet2.0, which traces the political discourse on the 2015 European refugee crisis in the German quality newspaper taz. The core units of our annotation are political claims (requests for specific actions to be taken) and the actors who advance them (politicians, parties, etc.). Our contribution is twofold. First, we document and release DebateNet2.0 along with its companion R package, mardyR. Second, we outline and apply a Discourse Network Analysis (DNA) to DebateNet2.0, comparing two crucial moments of the policy debate on the “refugee crisis”: the migration flux through the Mediterranean in April/May and the one along the Balkan route in September/October.
We guide the reader through the methods involved in constructing a discourse network from a newspaper, demonstrating that there is not one single discourse network for the German migration debate, but multiple ones, depending on the research question through the associated choices regarding political actors, policy fields and time spans.

Item Open Access
Analysis of political debates through newspaper reports : methods and outcomes (2020)
Lapesa, Gabriella; Blessing, Andre; Blokker, Nico; Dayanik, Erenay; Haunss, Sebastian; Kuhn, Jonas; Padó, Sebastian
Discourse network analysis is an aspiring development in political science which analyzes political debates in terms of bipartite actor/claim networks. It aims at understanding the structure and temporal dynamics of major political debates as instances of politicized democratic decision making. We discuss how such networks can be constructed on the basis of large collections of unstructured text, namely newspaper reports. We sketch a hybrid methodology of manual analysis by domain experts complemented by machine learning and exemplify it on the case study of the German public debate on immigration in the year 2015. The first half of our article sketches the conceptual building blocks of discourse network analysis and demonstrates its application.
The second half discusses the potential of the application of NLP methods to support the creation of discourse network datasets.

Item Open Access
AmericasNLI : machine translation and natural language inference systems for Indigenous languages of the Americas (2022)
Kann, Katharina; Ebrahimi, Abteen; Mager, Manuel; Oncevay, Arturo; Ortega, John E.; Rios, Annette; Fan, Angela; Gutierrez-Vasques, Ximena; Chiruzzo, Luis; Giménez-Lugo, Gustavo A.; Ramos, Ricardo; Meza Ruiz, Ivan Vladimir; Mager, Elisabeth; Chaudhary, Vishrav; Neubig, Graham; Palmer, Alexis; Coto-Solano, Rolando; Vu, Ngoc Thang
Little attention has been paid to the development of human language technology for truly low-resource languages - i.e., languages with limited amounts of digitally available text data, such as Indigenous languages. However, it has been shown that pretrained multilingual models are able to perform crosslingual transfer in a zero-shot setting even for low-resource languages which are unseen during pretraining. Yet, prior work evaluating performance on unseen languages has largely been limited to shallow token-level tasks. It remains unclear if zero-shot learning of deeper semantic tasks is possible for unseen languages. To explore this question, we present AmericasNLI, a natural language inference dataset covering 10 Indigenous languages of the Americas. We conduct experiments with pretrained models, exploring zero-shot learning in combination with model adaptation. Furthermore, as AmericasNLI is a multiway parallel dataset, we use it to benchmark the performance of different machine translation models for those languages. Finally, using a standard transformer model, we explore translation-based approaches for natural language inference. We find that the zero-shot performance of pretrained models without adaptation is poor for all languages in AmericasNLI, but model adaptation via continued pretraining results in improvements.
All machine translation models are rather weak, but, surprisingly, translation-based approaches to natural language inference outperform all other models on that task.

Item Open Access
Text- und Data-Mining : urheberrechtliche Grenzen der Nachnutzung wissenschaftlicher Korpora und ihre Bedeutung für die Digital Humanities (2021)
Kleinkopf, Felicitas; Jacke, Janina; Gärtner, Markus

Item Open Access
Bootstrap co-occurrence networks of consonants and the Basic Consonant Inventory (2023)
Nikolaev, Dmitry
It has been recently shown by Nikolaev and Grossman that it is possible to provide a fine-grained typological analysis of consonant inventories of the world’s languages by investigating co-occurrence classes of segments, i.e. groups of segments that tend to be found together in inventories. Nikolaev and Grossman argued that the structure of many of such co-occurrence classes is in contradiction with the Feature-Economy Principle. As a side product of this analysis, a new definition of the Basic Consonant Inventory (BCI) - a cluster of segments forming the bedrock of consonantal inventories of the world’s languages - was provided. This paper replicates the co-occurrence study in an arguably more robust way. In addition to making a methodological contribution, it shows that some of the co-occurrence classes defined by Nikolaev and Grossman, including the BCI, are not statistically stable and may be an artefact of the imbalance in the language sample used for the analysis. The findings of the authors regarding the Feature-Economy Principle, however, were corroborated.

Item Open Access
How to do human evaluation : a brief introduction to user studies in NLP (2023)
Schuff, Hendrik; Vanderlyn, Lindsey; Adel, Heike; Vu, Ngoc Thang