05 Fakultät Informatik, Elektrotechnik und Informationstechnik
Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6
4 results
Search Results
Item, Open Access
Editorial - perspectives for natural language processing between AI, linguistics and cognitive science (2022)
Lenci, Alessandro; Padó, Sebastian

Item, Open Access
Between welcome culture and border fence : a dataset on the European refugee crisis in German newspaper reports (2023)
Blokker, Nico; Blessing, André; Dayanik, Erenay; Kuhn, Jonas; Padó, Sebastian; Lapesa, Gabriella
Newspaper reports provide a rich source of information on the unfolding of public debates, which can serve as a basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite the reactions of political actors: crisis sparks the debate. However, due to the challenges of reliable annotation and modeling, few large-scale datasets with high-quality annotation are available. This paper introduces DebateNet2.0, which traces the political discourse on the 2015 European refugee crisis in the German quality newspaper taz. The core units of our annotation are political claims (requests for specific actions to be taken) and the actors who advance them (politicians, parties, etc.). Our contribution is twofold. First, we document and release DebateNet2.0 along with its companion R package, mardyR. Second, we outline and apply a Discourse Network Analysis (DNA) to DebateNet2.0, comparing two crucial moments of the policy debate on the “refugee crisis”: the migration flux through the Mediterranean in April/May and the one along the Balkan route in September/October.
We guide the reader through the methods involved in constructing a discourse network from a newspaper, demonstrating that there is not one single discourse network for the German migration debate, but multiple ones, depending on the research question and the associated choices regarding political actors, policy fields and time spans.

Item, Open Access
Determinants of grader agreement : an analysis of multiple short answer corpora (2021)
Padó, Ulrike; Padó, Sebastian
The ‘short answer’ question format is a widely used tool in educational assessment, in which students write one to three sentences in response to an open question. The answers are subsequently rated by expert graders. The agreement between these graders is crucial for reliable analysis, both in terms of educational strategies and in terms of developing automatic models for short answer grading (SAG), an active research topic in NLP. This makes it important to understand the properties that influence grader agreement (such as question difficulty, answer length, and answer correctness). However, the twin challenges towards such an understanding are the wide range of SAG corpora in use (which differ along a number of dimensions) and the hierarchical structure of potentially relevant properties (which can be located at the corpus, answer, or question levels). This article uses generalized mixed effects models to analyze the effect of various such properties on grader agreement in six major SAG corpora for two main assessment tasks (language and content assessment). Overall, we find broad agreement among corpora, with a number of properties behaving similarly across corpora (e.g., shorter answers and correct answers are easier to grade). Some properties show more corpus-specific behavior (e.g., the question difficulty level), and some corpora are more in line with general tendencies than others.
In sum, we obtain a nuanced picture of how the major short answer grading corpora are similar and dissimilar, from which we derive suggestions for corpus development and analysis.

Item, Open Access
Analysis of political debates through newspaper reports : methods and outcomes (2020)
Lapesa, Gabriella; Blessing, Andre; Blokker, Nico; Dayanik, Erenay; Haunss, Sebastian; Kuhn, Jonas; Padó, Sebastian
Discourse network analysis is an aspiring development in political science which analyzes political debates in terms of bipartite actor/claim networks. It aims at understanding the structure and temporal dynamics of major political debates as instances of politicized democratic decision making. We discuss how such networks can be constructed on the basis of large collections of unstructured text, namely newspaper reports. We sketch a hybrid methodology of manual analysis by domain experts complemented by machine learning and exemplify it on the case study of the German public debate on immigration in the year 2015. The first half of our article sketches the conceptual building blocks of discourse network analysis and demonstrates its application. The second half discusses the potential of the application of NLP methods to support the creation of discourse network datasets.
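
The bipartite actor/claim networks mentioned in the last abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use mardyR or DebateNet2.0; the annotation pairs and the function name `build_bipartite` are hypothetical placeholders. It assumes that each annotated statement has already been reduced to an (actor, claim) pair and simply counts how often each actor advances each claim.

```python
from collections import Counter, defaultdict

# Hypothetical annotations: one (actor, claim) pair per annotated statement.
# These placeholders are invented for illustration, not taken from any dataset.
ANNOTATIONS = [
    ("Actor A", "claim 1"),
    ("Actor A", "claim 2"),
    ("Actor B", "claim 1"),
    ("Actor C", "claim 2"),
    ("Actor A", "claim 1"),
]


def build_bipartite(annotations):
    """Return weighted actor-claim edges and the set of claims per actor."""
    # Edge weight = number of statements in which the actor advances the claim.
    edge_weights = Counter(annotations)
    claims_by_actor = defaultdict(set)
    for actor, claim in annotations:
        claims_by_actor[actor].add(claim)
    return edge_weights, dict(claims_by_actor)


edge_weights, claims_by_actor = build_bipartite(ANNOTATIONS)
for (actor, claim), weight in sorted(edge_weights.items()):
    print(f"{actor} -- {claim} (weight {weight})")
```

Edges only ever connect an actor to a claim, never two actors or two claims, which is what makes the network bipartite. Restricting the input pairs to a time span or a set of actors before calling the function would yield a different network from the same annotations.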