05 Fakultät Informatik, Elektrotechnik und Informationstechnik
Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6
204 results
Search Results
Item Open Access
Strukturierte Modellierung von Affekt in Text (2020)
Klinger, Roman; Padó, Sebastian (Prof. Dr.)
Emotions, moods, and opinions are affective states that one person cannot directly observe in another and that can therefore be regarded as “private”. To infer these individual feelings and views nonetheless, we are used to interpreting facial expressions, body postures, prosody, and the content of speech in everyday communication. The research area of affective computing and the more specialized fields of emotion analysis and sentiment analysis develop computational models that make such estimates possible automatically. This habilitation thesis falls within affective computing and contributes to the study and modelling of sentiment and emotion in textual descriptions. Among other domains, we consider literature, social media, and product reviews. To find appropriate models for each phenomenon, we proceed in each case by using or creating a corpus as a basis, which already entails hypotheses about how the model should be formulated. These hypotheses can then be examined in several ways: first, through an analysis of inter-annotator agreement; second, through adjudication among the annotators followed by computational modelling; and third, through a qualitative analysis of the problematic cases. We first discuss sentiment and emotion as a classification problem. For some research questions, however, this is not sufficient, so we propose structured models that also extract aspects and causes of the respective feeling or opinion. In the case of emotion, we additionally extract mentions of the experiencer.
In a further step, the methods are extended so that they can also be applied to languages that lack sufficient annotated resources. The contributions of this habilitation thesis thus include various resources, whose creation also required underlying conceptual work. We contribute German and English corpora for aspect-based sentiment analysis, emotion classification, and structured emotion analysis. Furthermore, we propose models for the automatic detection and representation of sentiment, emotion, and related concepts. These either outperform previous methods or model phenomena for the first time. The latter applies in particular to methods that were made possible by the corpora we created. Across the different approaches, concepts are repeatedly modelled jointly, whether at the representation level or the inference level. In our work, such methods, which make decisions in context, consistently yield better results than those that treat phenomena separately. This holds both for artificial neural networks and for probabilistic graphical models.

Item Open Access
Task-oriented specialization techniques for entity retrieval (2020)
Glaser, Andrea; Kuhn, Jonas (Prof. Dr.)
Finding information on the internet has become very important, and online encyclopedias and websites specialized in certain topics offer users a great amount of information. Search engines support users in finding information. However, the vast amount of information makes it difficult to separate relevant from irrelevant facts for a specific information need. In this thesis we explore two areas of natural language processing in the context of retrieving information about entities: named entity disambiguation and sentiment analysis.
The goal of this thesis is to use methods from these areas to develop task-oriented specialization techniques for entity retrieval. Named entity disambiguation is concerned with linking referring expressions (e.g., proper names) in text to their corresponding real world or fictional entity. Identifying the correct entity is an important factor in finding information on the internet as many proper names are ambiguous and need to be disambiguated to find relevant information. To that end, we introduce the notion of r-context, a new type of structurally informed context. This r-context consists of sentences that are relevant to the entity only to capture all important context clues and to avoid noise. We then show the usefulness of this r-context by performing a systematic study on a pseudo-ambiguity dataset. Identifying less known named entities is a challenge in named entity disambiguation because usually there is not much data available from which a machine learning algorithm can learn. We propose an approach that uses an aggregate of textual data about other entities which share certain properties with the target entity, and learn information from it by using topic modelling, which is then used to disambiguate the less known target entity. We use a dataset that is created automatically by exploiting the link structure in Wikipedia, and show that our approach is helpful for disambiguating entities without training material and with little surrounding context. Retrieving the relevant entities and information can produce many search results. Thus, it is important to effectively present the information to a user. We regard this step beyond the entity retrieval and employ sentiment analysis, which is used to analyze opinions expressed in text, in the context of effectively displaying information about product reviews to a user. 
We present a system that extracts a supporting sentence, a single sentence that captures both the sentiment of the author and a supporting fact. This supporting sentence can be used to provide users with an easy way to assess information in order to make informed choices quickly. We evaluate our approach by using the crowdsourcing service Amazon Mechanical Turk.

Item Open Access
German clause-embedding predicates : an extraction and classification approach (2010)
Lapshinova-Koltunski, Ekaterina; Heid, Ulrich (Prof. Dr. phil. habil.)
This thesis describes a semi-automatic approach to the analysis of subcategorisation properties of verbal, nominal and multiword predicates in German. We semi-automatically classify predicates according to their subcategorisation properties by extracting them from German corpora along with their complements. In this work, we concentrate exclusively on sentential complements, such as dass, ob and w-clauses, although our methods can also be applied to other complement types. Our aim is not only to extract and classify predicates but also to compare subcategorisation properties of morphologically related predicates, such as verbs and their nominalisations. It is usually assumed that subcategorisation properties of nominalisations are taken over from their underlying verbs. However, our tests show that there exist different types of relations between them. Thus, we review subcategorisation properties of morphologically related words and analyse their correspondences and differences. For this purpose, we elaborate a set of semi-automatic procedures, which allow us not only to classify extracted units according to their subcategorisation properties, but also to compare the properties of verbs and their nominalisations, which occur both freely in corpora and within a multiword expression. The lexical data are created to serve symbolic NLP, especially large symbolic grammars for deep processing, such as HPSG or LFG, cf.
work in the LinGO project (Copestake et al. 2004) and the Pargram project (Butt et al. 2002). HPSG and LFG need detailed linguistic knowledge. Besides that, subcategorisation information can be applied in applications for IE, cf. (Surdeanu et al. 2003). Moreover, this information is necessary for linguistic, lexicographic, SLA and translation work. Our extraction and classification procedures are precision-oriented, which means that we focus on high accuracy of our extraction and classification results. High precision comes at the cost of completeness, which is compensated for by applying the extraction procedures to larger corpora.

Item Open Access
The perfect time span : on the present perfect in German, Swedish and English (2006)
Rothstein, Björn Michael; Kamp, Hans (Prof. Dr. h.c. PhD)
This study proposes a discourse-based approach to the present perfect in German, Swedish and English. It is argued that the present perfect is best analysed by applying an ExtendedNow-approach. It introduces a perfect time span in which the event time expressed by the present perfect is contained. The present perfects in these languages differ with respect to the boundaries of the perfect time span. In English, the right boundary is identical to the point of speech, in Swedish it can be either at or after the moment of speech and in German it can also be before the moment of speech. The left boundary is unspecified. The right boundary is set by context.

Item Open Access
Segmental factors in language proficiency : degree of velarization, coarticulatory resistance and vowel formant frequency distribution as a signature of talent (2011)
Baumotte, Henrike; Dogil, Grzegorz (Prof. Dr.)
The present PhD thesis proposes an explanation for why German native speakers with various proficiency levels and multiple English varieties produce their L2 English with different degrees of foreign accent.
The author used phonetic measurements to investigate the degree of velarization and of coarticulation or coarticulatory resistance in German and English, using non-words and natural language stimuli. To get an impression of the differences between the productions of proficient, average and less proficient speakers in German and English, the mean F2 and Fv values in /ə/ before /l/ and in /l/ were calculated in order to compare the degree of velarization in /əlV/ non-word sequences. Proficient speakers showed lower formant frequencies for F2 and Fv in /ə/ than less proficient speakers, i.e. proficient speakers velarized more than less proficient speakers. For the comparisons of coarticulation and coarticulatory resistance, difference values for F2 and F2' of /ə/ in /əleɪ/ vs. /əlu:/, /əly/ vs. /əleɪ/ and /əly/ vs. /əlaɪ/ were computed. Across the whole series of measurements, there was an overwhelming trend for proficient speakers to be more coarticulatorily resistant, i.e. to velarize more, and to pronounce English vowel characteristics more precisely than less proficient speakers, while average speakers did not consistently behave as predicted, since they were sometimes “worse” than less proficient speakers. On the basis of Díaz et al. (2008), who argued for pre-existing individual differences in phonetic discrimination ability that strongly influence the acquisition of a foreign sound system, it is argued that foreign-language phonetic abilities derive from native phonetic abilities.

Item Open Access
Computational modelling of coreference and bridging resolution (2019)
Rösiger, Ina; Kuhn, Jonas (Prof. Dr.)

Item Open Access
Analysis of political positioning from politician’s tweets (2023)
Maurer, Maximilian Martin
Social media platforms such as Twitter have become important communication channels for politicians to interact with the electorate and communicate their stances on policy issues. In contrast to party manifestos, which lay out curated, compromised positions, the full range of positions within the ideological bounds of a party can be found on social media. This raises the question of how well the ideological positions of parties on social media align with their respective manifestos. To assess the alignment of social media and manifesto positions, we correlate the positions automatically retrieved from the tweets with manifesto-based positions for the German federal elections of 2017 and 2021. Additionally, we assess whether the change in positions over time is aligned between social media and manifestos. We retrieve ideological positions by aggregating distances between parties from sentence representations of their members' tweets, using a corpus containing >2M individual tweets of 421 German politicians. We leverage domain-specific information by training a sentence embedding model such that representations of tweets with co-occurring hashtags are closer to each other than ones without co-occurring hashtags, following the assumption that hashtags approximate policy-related topics. Our experiments compare this political social media domain-specific model with other political domain and general domain sentence embedding models. We find high, significant correlations between the Twitter-retrieved positions and manifesto positions, especially for our domain-specific fine-tuned model. Moreover, for this model, we find overlaps in terms of how the positions change over time.
These results indicate that the ideological positions of parties on Twitter correspond to a large extent to the ideological positions laid out in the manifestos.

Item Open Access
Effects of paraphrasing and demographic metadata on NLI classification performance (2023)
Marx Larre, Miguel
Native language identification (NLI) refers to the task of automatically deducing the native language (L1) of a document's author when the document is written in a second language (L2). Documents stem from different sources, but recently more documents are altered before publication through paraphrasing methods. This alteration changes the content, grammar, and style of the document, which inherently obfuscates the L1 of the author. In addition, the demographic metadata of the author, such as age and gender, may influence the performance with which an author's L1 can be detected. In this thesis, two corpora that provide the necessary demographic metadata, the International Corpus of Learner English (ICLE) and the Trustpilot corpus, are used to analyze the impact of paraphrasing and demographic factors in the context of NLI tasks. To analyze the effect of paraphrasing on a document, new versions of both corpora are created, which contain paraphrased versions of the original documents. The effect is inspected using two state-of-the-art NLI systems to perform the task, and the results are analyzed using regression analysis in combination with dominance analysis (DA). Paraphrasing was found to have a substantial influence on the performance of NLI tasks, regardless of corpus, classifier, or paraphrasing method. The usual influence of demographic factors on NLI tasks could not be confirmed in this thesis.
Regression analysis and DA allowed for a more in-depth analysis of the results, which yielded findings regarding the influence of specific L1s on the performance of NLI tasks.

Item Open Access
Emotion classification based on the emotion component model (2020)
Heindl, Amelie
The term emotion is, despite its frequent use, still mysterious to researchers. This poses difficulties for the task of automatic emotion detection in text. At the same time, applications for emotion classifiers increase steadily in today's digital society, where humans are constantly interacting with machines. Hence, the need arises to improve current state-of-the-art emotion classifiers. The Swiss psychologist Klaus Scherer published an emotion model according to which an emotion is composed of changes in five components: cognitive appraisal, physiological symptoms, action tendencies, motor expressions, and subjective feelings. This model, which he calls CPM, has gained a reputation in psychology and philosophy, but has so far not been used for NLP tasks. With this work, we investigate whether it is possible to automatically detect the CPM components in social media posts and whether information on those components can aid the detection of emotions. We create a text corpus consisting of 2100 Twitter posts in which every instance is labeled with exactly one emotion and a binary label for each CPM component. With a Maximum Entropy classifier we manage to detect CPM components with an average F1-score of 0.56 and average accuracy of 0.82 on this corpus. Furthermore, we compare baseline versions of one Maximum Entropy and one CNN emotion classifier to extensions of those classifiers with the CPM annotations and predictions as additional features.
We find slight performance increases of up to 0.03 in F1-score for emotion detection upon incorporation of CPM information.

Item Open Access
Question answering on knowledge bases : A comparative study (2021)
Kanjur, Vishnudatha
Question answering aims to automatically extract accurate and relevant information as the answer to a particular question. A large amount of data from the Web is stored in a structured way as knowledge bases. Question answering on knowledge bases is a research field that involves multiple branches of computer science, such as natural language processing, information retrieval and artificial intelligence. Knowledge Base Question Answering (KBQA) research involves various challenges to be solved in multiple aspects. This thesis aimed to compare several state-of-the-art methods for single-relation KBQA. The widely used standard single-relation dataset, the SimpleQuestions dataset, was used in the study against the Freebase Knowledge Base (KB). A comprehensive analysis of the underlying models and their architecture was performed. Furthermore, to identify the drawbacks and possible enhancements, several approaches for evaluating the models were explored. The results show how the models performed and how suitable they are for solving real-world problems in question answering.