05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6

Search Results

Now showing 1 - 8 of 8
  • Item (Open Access)
    Predicting sentiment about places of living
    (2017) Liu, Feifei
    Studies on the quality of life in major cities are now published frequently in the daily news. They contain lists that rank cities by quality of living, using indicators that represent various aspects such as crime level, transport, and health care. With the growth of social media, a huge amount of information can be collected from the Internet. At the same time, machine learning, a branch of artificial intelligence, is becoming more and more prominent, and its recent advances have found use in a wide range of applications, among them text categorization and sentiment analysis. Building on these conditions, this thesis aims to create a classifier that predicts sentiment about places of living. The thesis uses Mercer's city ranking, which ranks 230 cities of the world based on a quality of living survey. Textual microblogging data serves as the testbed: specifically, tweets, the microblogging messages of the popular website Twitter, are studied. The tweets chosen for this study concern the living standard of cities and contain rich sentiment information. Each city under study is assigned a classification label according to its position in the ranking list. After sentiment-related features are extracted, machine learning techniques are applied to the collected tweets. As a result, a classifier that provides a strong baseline for predicting sentiment about places of living is trained using a logistic regression model.
  • Item (Open Access)
    Interactive exploration and model analysis for coreference annotation
    (2013) Gärtner, Markus
    I present the design and implementation of an interactive visualization- and exploration-framework for coreference annotations. It is designed to meet the needs of multiple different users on a modern and multifaceted graphical exploration tool. To demonstrate its suitability for these various needs I outline several use cases and how the framework can help users in their individual tasks. It offers the user different views on the data with additional functionality to compare several annotations. Complex analysis of annotated corpora is supported by means of a search engine which lets the user construct queries both in a graphical and textual form. Both qualitative and quantitative result breakdowns are available and the implementation features specialized visualizations to aggregate complex search results. The framework is extensible in many ways and can be customized to handle additional data formats.
  • Item (Open Access)
    Konzept und Entwicklung eines Werkzeugs zur automatisierten Übersetzung natürlichsprachlicher Anforderungen [Concept and development of a tool for the automated translation of natural-language requirements]
    (2012) Siegmund, Nadine
    In the automotive sector, producing a requirements documentation is part of completing a project successfully. Because of increasingly close international cooperation and distributed development, this documentation should ideally be available in several languages, above all in English. Since not every project participant is proficient in all of these languages, the requirements document has to be translated. Machine translation lends itself to this task because it can produce standardized requirements documents and because, in contrast to a human translator, the translation is inexpensive and available without delay. This thesis presents the concept and implementation of a methodology that is realized in a tool. With it, requirements can be translated from German into English by means of machine translation. A transfer-based approach is applied that analyzes a sentence using a phrase structure grammar enriched with feature structures and constraints and converts it into a structural description. The structural description is transferred from German into English with the help of transfer rules, and the translated sentence is generated from it. The tool was designed to be extensible in as many places as possible, for example through the use of XML, to allow straightforward further development by experts such as computational linguists. To simplify the problem of machine translation, the thesis exploits the fact that the requirements are formulated using a template, which strongly restricts the sentence structure. In addition, a lexicon with a sublanguage is available, which eliminates further problems. This thesis shows that, under the given conditions, machine translation is a suitable approach for translating requirements.
  • Item (Open Access)
    Language identification for German-Turkish code-switching speech
    (2017) Köstak, Ugur
    The importance of computers in our daily lives has risen in recent years, and an average person undoubtedly interacts with computers many times a day. This wide usage has led researchers to look for ways of communicating with computers that require a minimum number of interactions. Speech is the main communication instrument for humans, so researchers have also adopted speech as an interaction method between humans and computers. However, speech has boundaries of its own: language varies between societies, and in multicultural societies people tend to communicate in a mixed language, known as code-switching. For example, Germany is a multicultural country, and foreigners, especially bilingual Turkish people, use both German and Turkish when they speak to each other. At the same time, computers have become more powerful and can now handle complex tasks such as NLP tasks, which require a lot of processing power. In this thesis we address the language identification task for German-Turkish code-switching speech with two popular machine learning methods, Support Vector Machines and Deep Neural Networks, and compare the performance of the two methods.
  • Item (Open Access)
    Evaluation of automated business process optimization
    (2011) Ergin, Kemal Tolga
    Today's highly competitive markets tend to favor enterprises in which business processes are analyzed and optimized regularly so that they can operate in accordance with their business goals. The business process management (BPM) methods applied for this purpose since the emergence of business reengineering in the 1990s range from incremental adjustments to radical restructuring. In combination with contemporary workflow automation technology, modern redesign methods are powerful tools for enhancing business performance, enabling companies to maintain a winning margin. Optimization methods that deliver sustainable results using evolutionary approaches, however, are becoming increasingly popular once again, two decades after continuous improvement paradigms had been almost completely abandoned in favor of revolutionary process redesign. This diploma thesis explores one such evolutionary BPM approach, employed in the deep Business Optimization Platform (dBOP), a research prototype that assists analysts with the selection and application of suitable process improvement techniques. The present work evaluates dBOP with the help of simulated business scenarios based on real case studies, and documents the types of optimization patterns most readily applied through automated process redesign. For this purpose two business processes, one from a car rental enterprise and one from a health insurance company, are modeled and deployed on a process server, and executed using web services and sample data warehouses based on actual statistics. These processes are then analyzed with dBOP in order to compare its optimization recommendations with those expected from a human analyst's perspective.
  • Item (Open Access)
    Analysing names of organic chemical compounds : from morpho-semantics to SMILES strings and classes
    (2005) Anstein, Stefanie; Kremer, Gerhard
    The linguistic analysis of chemical terminology is a key to biochemical text processing and semi-automatic database curation. The system described analyses systematic and semi-systematic names of chemical compounds, class terms, and also otherwise underspecified names by means of a morpho-semantic grammar developed according to IUPAC nomenclature. It yields an intermediate semantic representation which describes the information encoded in a name. Our tool provides SMILES strings for the mapping of names to their molecule structure and also classifies the analysed terms. It was implemented in Prolog as a prototype and a basis for further development to support research in the life sciences.
  • Item (Open Access)
    Towards robust cross-domain domain adaptation for part-of-speech tagging
    (2013) Schnabel, Tobias
    Most systems in natural language processing experience a substantial loss in performance when the data that the system is tested with differs significantly from the data that the system has been trained on. Systems for part-of-speech (POS) tagging, for example, are typically trained on newspaper texts but are often applied to texts of other domains such as medical texts. Domain adaptation (DA) techniques seek to improve such systems so that they are able to achieve consistently good performance - independent of the domains at hand. We investigate the robustness of domain adaptation representations and methods across target domains using part-of-speech tagging as a case study. We find that there is no single representation and method that works equally well for all target domains. In particular, there are large differences between target domains that are more similar to the source domain and those that are less similar.
  • Item (Open Access)
    Graphical error mining for linguistic annotated corpora
    (2013) Thiele, Gregor
    Corpora contain linguistically annotated data. Producing these annotations is a complex process that easily leads to inconsistencies within the annotation. Since corpora are used to evaluate automatic language processing systems, the evaluation may suffer when there are too many errors within the data. This thesis focuses on finding erroneous annotations within corpora. To detect sequence annotation errors within part-of-speech tags we implemented the algorithm introduced by Dickinson and Meurers (2003). Additionally, for structured annotations we chose the approach shown in Boyd et al. (2008), which targets inconsistency within dependency structures. We designed and built a graphical user interface (GUI) that is easy to handle and user-friendly. Combining state-of-the-art error detection algorithms with a user-friendly interface widens their range of application, because the algorithms can be used by a wider audience without deeper knowledge of computers. The tool gives even non-expert users the capability to find inconsistent POS tags and dependency structures within a corpus. We evaluate the system using the German TIGER corpus and the English Penn Treebank. For the TIGER corpus we also perform a manual evaluation in which we sample 115 6-grams and check manually whether they contain errors. We find that 94.96% are erroneous and that a human can easily decide on the correct tag. For 4.20% we can say that they are errors, but determining the correct tag is too difficult. In total we detect errors with a precision of 99.16%. Only one case (0.84%) is not caused by inconsistency but constitutes genuine ambiguity.
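
The last abstract above detects part-of-speech annotation errors by looking for inconsistent tagging. As a rough illustration of that idea only, the following sketch flags word n-grams whose middle token receives more than one tag across a corpus. It is a simplified assumption, not the algorithm of Dickinson and Meurers (2003) as implemented in the thesis: the function name `variation_ngrams`, the fixed middle position, and the toy corpus are all hypothetical.

```python
from collections import defaultdict


def variation_ngrams(sentences, n=3):
    """Return word n-grams whose middle token carries more than one tag.

    sentences: list of sentences, each a list of (word, tag) pairs.
    The result maps each inconsistent n-gram (a tuple of words) to the
    sorted list of tags observed for its middle token.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd so that the n-gram has a middle token")
    mid = n // 2
    tags_seen = defaultdict(set)
    for sentence in sentences:
        words = [word for word, _ in sentence]
        for i in range(len(sentence) - n + 1):
            context = tuple(words[i:i + n])
            # Record the tag given to the middle token in this context.
            tags_seen[context].add(sentence[i + mid][1])
    # Keep only contexts in which the middle token was tagged inconsistently.
    return {ctx: sorted(tags) for ctx, tags in tags_seen.items() if len(tags) > 1}


# Hypothetical toy corpus: "open" is tagged VB once and JJ once
# in the identical context "can open the".
corpus = [
    [("I", "PRP"), ("can", "MD"), ("open", "VB"), ("the", "DT"), ("can", "NN")],
    [("I", "PRP"), ("can", "MD"), ("open", "JJ"), ("the", "DT"), ("door", "NN")],
]

print(variation_ngrams(corpus, n=3))
# → {('can', 'open', 'the'): ['JJ', 'VB']}
```

A flagged n-gram is only a candidate: as the abstract notes, a small share of such cases reflects genuine ambiguity rather than an annotation error, so manual inspection is still needed.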