05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 10 of 58
  • Item (Open Access)
    Computational modelling of coreference and bridging resolution
    (2019) Rösiger, Ina; Kuhn, Jonas (Prof. Dr.)
  • Item (Open Access)
    Modeling paths in knowledge graphs for context-aware prediction and explanation of facts
    (2019) Stadelmaier, Josua
    Knowledge bases are an important resource for question answering systems and search engines but often suffer from incompleteness. This work considers the problem of knowledge base completion (KBC). In the context of natural language processing, knowledge bases comprise facts that can be formalized as triples of the form (entity 1, relation, entity 2). A common approach to the KBC problem is to learn representations for entities and relations that generalize existing connections in the knowledge base, so that the correctness of a triple that is not in the knowledge base can be predicted. In this work, I propose the context path model, which is based on this approach. In contrast to existing KBC models, it also provides explanations for its predictions. For this purpose, it uses paths that capture the context of a given triple. The context path model can be applied on top of several existing KBC models. In a manual evaluation, I observe that most of the paths the model uses as explanations are meaningful and provide evidence for assessing the correctness of triples. I also show in an experiment that the performance of the context path model on a standard KBC task is close to that of a state-of-the-art model.
  • Item (Open Access)
    Modeling the interface between morphology and syntax in data-driven dependency parsing
    (2016) Seeker, Wolfgang; Kuhn, Jonas (Prof. Dr.)
    When people formulate sentences in a language, they follow a set of rules specific to that language that defines how words must be put together in order to express the intended meaning. These rules are called the grammar of the language. Languages have essentially two ways of encoding grammatical information: word order or word form. English uses primarily word order to encode different meanings, but many other languages change the form of the words themselves to express their grammatical function in the sentence. These languages are commonly subsumed under the term morphologically rich languages. Parsing is the automatic process for predicting the grammatical structure of a sentence. Since grammatical structure guides the way we understand sentences, parsing is a key component in computer programs that try to automatically understand what people say and write. This dissertation is about parsing and specifically about parsing languages with a rich morphology, which encode grammatical information in the form of words. Today’s parsing models were developed for English and achieve good results on this language. However, when applied to other languages, a significant drop in performance is usually observed. The standard model for parsing is a pipeline model that separates the parsing process into different steps; in particular, it separates the morphological analysis, i.e. the analysis of word forms, from the actual parsing step. This dissertation argues that this separation is one of the reasons for the performance drop of standard parsers when applied to languages other than English. An analysis is presented that exposes the connection between the morphological system of a language and the errors of a standard parsing model. In a second series of experiments, we show that knowledge about the syntactic structure of a sentence can support the prediction of morphological information.
We then argue for an alternative approach that models morphological analysis and syntactic analysis jointly instead of separating them. We support this argumentation with empirical evidence by implementing two parsers that model the relationship between morphology and syntax in two different but complementary ways.
  • Item (Open Access)
    Automatische Kategorisierung von Autoren in Bezug auf Arzneimittel in Twitter [Automatic categorization of authors with respect to drugs on Twitter]
    (2016) Xu, Min
    With the rapidly growing popularity of Twitter, an increasingly diverse range of topics is being discussed there. This also applies to the effects of drugs. It is therefore of interest to find out which social groups tend to discuss particular drugs on Twitter and which drugs are discussed most. Text classification is a natural way to categorize the large number of tweets. In this thesis, this is done mainly with a maximum entropy classifier, with which the authors of the tweets can be identified. Because the maximum entropy model can take into account a large number of relevant or irrelevant pieces of probabilistic evidence, the maximum entropy classifier achieves a better result than the naive Bayes classifier on the multi-class classification task in this work. The effect on the performance of the maximum entropy classifier of different feature selection methods, namely Information Gain & Mutual Information and an LDA topic model, and of different numbers of features is compared and analyzed. The results show that Information Gain & Mutual Information and the LDA topic model are good practical approaches for identifying the features of short texts. The maximum entropy classifier achieves an average test accuracy of 79.8%.
  • Item (Open Access)
    The Impact of intensifiers, diminishers and negations on emotion expressions
    (2017) Strohm, Florian
    There are several areas of application for emotion detection systems, for example social media analysis, for which it is important to reliably recognize expressed emotions. This thesis takes into account negations, intensifiers and diminishers that modify emotion expressions in tweets, in order to study whether this can improve an emotion detection system. It uses different emotion classifiers together with various modifier detection approaches to evaluate the impact of modifiers on emotion expressions. The results show that an emotion detection system can be slightly improved if negations are taken into account. The thesis also studies the correlation between modified emotion words and basic emotions to obtain a better understanding of modified emotions. The analysis of the results shows correlations between modified and basic emotions, which enables us to determine the expressed basic emotion of modified emotion words.
  • Item (Open Access)
    Natural language processing and information retrieval methods for intellectual property analysis
    (2014) Jochim, Charles; Schütze, Hinrich (Prof. Dr.)
    More intellectual property information is generated now than ever before. The accumulation of intellectual property data, further complicated by this continued increase in production, makes it imperative to develop better methods for archiving and more importantly for accessing this information. Information retrieval (IR) is a standard technique used for efficiently accessing information in such large collections. The most prominent example comprising a vast amount of data is the World Wide Web, where current search engines already satisfy user queries by immediately providing an accurate list of relevant documents. However, IR for intellectual property is neither as fast nor as accurate as what we expect from an Internet search engine. In this thesis, we explore how to improve information access in intellectual property collections by combining previously mentioned IR techniques with advanced natural language processing (NLP) techniques. The information in intellectual property is encoded in text (i.e., language), and we expect that by adding better language processing to IR we can better understand and access the data. NLP is a quite varied field encompassing a number of solutions for improving the understanding of language input. We concentrate more specifically on the NLP tasks of statistical machine translation, information extraction, named entity recognition (NER), sentiment analysis, relation extraction, and text classification. Searching for intellectual property, specifically patents, is a difficult retrieval task where standard IR techniques have had only moderate success. The difficulty of this task only increases when presented with multilingual collections as is the case with patents. We present an approach for improving retrieval performance on a multilingual patent collection by using machine translation (an active research area in NLP) to translate patent queries before concatenating these parallel translations into a multilingual query. 
Even after retrieving an intellectual property document, however, we still face the problem of extracting the relevant information. We would like to improve our understanding of the complex intellectual property data by uncovering latent information in the text. We do this by identifying citations in a collection of scientific literature and classifying them by their citation function. This classification is successfully carried out by exploiting some characteristics of the citation text, including features extracted via sentiment analysis, NER, and relation extraction. By assigning labels to citations we can better understand the relationships between intellectual property documents, which can be valuable information for IR or other applications.
  • Item (Open Access)
    Structurally informed methods for improved sentiment analysis
    (2017) Kessler, Stefanie Wiltrud; Kuhn, Jonas (Prof. Dr.)
    Sentiment analysis deals with methods to automatically analyze opinions in natural language texts, e.g., product reviews. Such reviews contain a large number of fine-grained opinions, but to automatically extract detailed information it is necessary to handle a wide variety of verbalizations of opinions. The goal of this thesis is to develop robust structurally informed models for sentiment analysis which address challenges that arise from structurally complex verbalizations of opinions. In this thesis, we look at two examples of such verbalizations that benefit from including structural information in the analysis: negation and comparisons. Negation directly influences the polarity of sentiment expressions, e.g., while "good" is positive, "not good" expresses a negative opinion. We propose a machine learning approach that uses information from dependency parse trees to determine whether a sentiment word is in the scope of a negation expression. Comparisons like "X is better than Y" are the main topic of this thesis. We present a machine learning system for the task of detecting the individual components of comparisons: the anchor or predicate of the comparison, the entities that are compared, which aspect they are compared in, and which entity is preferred. Again, we use structural context from a dependency parse tree to improve the performance of our system. We discuss two ways of addressing the issue of limited availability of training data for our system. First, we create a manually annotated corpus of comparisons in product reviews, the largest such resource available to date. Second, we use the semi-supervised method of structural alignment to expand a small seed set of labeled sentences with similar sentences from a large set of unlabeled sentences. Finally, we work on the task of producing a ranked list of products that complements the isolated prediction of ratings and supports the user in a process of decision making.
We demonstrate how we can use the information from comparisons to rank products and evaluate the result against two conceptually different external gold standard rankings.
  • Item (Open Access)
    Predicting sentiment about places of living
    (2017) Liu, Feifei
    Studies about the quality of life in major cities are now often published in the daily news. They contain ranked lists of cities according to quality of living, with indicators representing various aspects. Typical indicators are crime level, transport, and health care. With the flourishing of social media, a huge amount of information can be collected from the Internet. Moreover, machine learning, as a branch of artificial intelligence, is becoming more and more prominent, and recent advances in machine learning have found use in a wide range of applications, among them text categorization and sentiment analysis. Building on these conditions, this thesis aims to create a classifier to predict the sentiment about places of living. The thesis uses Mercer's city ranking, in which 230 cities of the world are ranked as a result of the quality of living survey. Microblogging text is chosen as the testbed. Specifically, tweets, microblogging messages from the popular website Twitter, are studied. The tweets chosen for this study are about the living standard of cities and contain rich sentiment information. Each city under study is assigned a classification label according to its position in the ranking. After sentiment-related features are extracted, machine learning techniques are applied to the collected tweets. As a result, a classifier that provides a strong baseline for predicting sentiment about places of living is trained using a logistic regression model.
  • Item (Open Access)
    Interactive exploration and model analysis for coreference annotation
    (2013) Gärtner, Markus
    I present the design and implementation of an interactive visualization and exploration framework for coreference annotations. It is designed to meet the needs that different users have of a modern, multifaceted graphical exploration tool. To demonstrate its suitability for these various needs, I outline several use cases and show how the framework can help users in their individual tasks. It offers the user different views of the data, with additional functionality to compare several annotations. Complex analysis of annotated corpora is supported by a search engine that lets the user construct queries in both graphical and textual form. Both qualitative and quantitative result breakdowns are available, and the implementation features specialized visualizations to aggregate complex search results. The framework is extensible in many ways and can be customized to handle additional data formats.