05 Fakultät Informatik, Elektrotechnik und Informationstechnik
Permanent URI for this collection: https://elib.uni-stuttgart.de/handle/11682/6
Search Results (3 items)
Item Open Access CUTE - CRETA Un-/Shared Task zu Entitätenreferenzen (2017) Reiter, Nils; Blessing, André; Echelmeyer, Nora; Kremer, Gerhard; Koch, Steffen; Murr, Sandra; Overbeck, Maximilian; Pichler, Axel
This is the publication of a shared/unshared task workshop (developed within CRETA, the Center for Reflected Text Analytics) that took place at DHd 2017 in Bern, Switzerland. In contrast to shared tasks, in which the performance of different systems, approaches, and methods is compared directly on a clearly defined, quantitatively evaluated task, unshared tasks are open to contributions of different kinds that are based on a common data collection. Shared and unshared tasks in the digital humanities are a promising way to foster and maintain collaboration and interaction between researchers from the humanities, the social sciences, and computer science. Concretely, we called for joint work on a heterogeneous corpus in which entity references had been annotated. The corpus consists of parliamentary debates of the German Bundestag, letters from Goethe's Die Leiden des jungen Werther, a section of Adorno's Ästhetische Theorie, and the books of Wolfram von Eschenbach's Parzival (Middle High German). Although each text type has its own peculiarities, all texts were annotated according to uniform annotation guidelines, which we also put up for discussion. We publish here the call for workshop contributions, the annotation guidelines, the corpus data with a description, and the introductory slides of the workshop.

Item Open Access Between welcome culture and border fence: a dataset on the European refugee crisis in German newspaper reports (2023) Blokker, Nico; Blessing, André; Dayanik, Erenay; Kuhn, Jonas; Padó, Sebastian; Lapesa, Gabriella
Newspaper reports provide a rich source of information on the unfolding of public debates, which can serve as a basis for inquiry in political science. Such debates are often triggered by critical events, which attract public attention and incite the reactions of political actors: crisis sparks the debate. However, due to the challenges of reliable annotation and modeling, few large-scale datasets with high-quality annotation are available. This paper introduces DebateNet2.0, which traces the political discourse on the 2015 European refugee crisis in the German quality newspaper taz. The core units of our annotation are political claims (requests for specific actions to be taken) and the actors who advance them (politicians, parties, etc.). Our contribution is twofold. First, we document and release DebateNet2.0 along with its companion R package, mardyR. Second, we outline and apply Discourse Network Analysis (DNA) to DebateNet2.0, comparing two crucial moments of the policy debate on the "refugee crisis": the migration flux through the Mediterranean in April/May and the one along the Balkan route in September/October. We guide the reader through the methods involved in constructing a discourse network from a newspaper, demonstrating that there is not one single discourse network for the German migration debate, but multiple ones, depending on the research question and the associated choices regarding political actors, policy fields, and time spans.
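A minimal sketch of the discourse-network idea described in the abstract (this is not the mardyR implementation; the actors, claim categories, and stances below are invented for illustration): actors and signed claims form a bipartite affiliation network, which is then projected onto an actor-actor agreement network.

```python
# Illustrative sketch only: build a tiny discourse network from
# (actor, claim, stance) triples and project it onto an actor-actor
# agreement network, in the spirit of Discourse Network Analysis.
import networkx as nx
from networkx.algorithms import bipartite

# Hypothetical annotations: (actor, claim category, stance)
annotations = [
    ("Actor A", "open borders", "support"),
    ("Actor B", "open borders", "oppose"),
    ("Actor C", "open borders", "support"),
    ("Actor A", "border fence", "oppose"),
    ("Actor B", "border fence", "support"),
]

# Bipartite affiliation network: actors on one side, signed claims on the other
g = nx.Graph()
for actor, claim, stance in annotations:
    claim_node = f"{claim}/{stance}"
    g.add_node(actor, bipartite=0)
    g.add_node(claim_node, bipartite=1)
    g.add_edge(actor, claim_node)

actors = {n for n, d in g.nodes(data=True) if d["bipartite"] == 0}

# One-mode projection: actors are linked by the number of signed claims they share
agreement = bipartite.weighted_projected_graph(g, actors)
for u, v, data in agreement.edges(data=True):
    print(u, "--", v, "shared claims:", data["weight"])
```

Changing which actors, claim categories, or time spans enter the affiliation network yields different projections, which is the sense in which there is not one single discourse network but many.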
Item Open Access Information extraction for the geospatial domain (2014) Blessing, André; Schütze, Hinrich (Prof. Dr.)
Geospatial knowledge is increasingly becoming an essential part of software applications, primarily due to the importance of mobile devices and of location-based queries on the World Wide Web. Context models are one way to disseminate geospatial data in a digital, machine-readable representation. One key challenge is acquiring and updating such data, since physical sensors cannot collect it on a large scale, and doing the required work manually is very time-consuming and expensive. Alternatively, a lot of geospatial data already exists in textual form and can be used instead. The question is how to extract such information from texts in order to integrate it into context models. In this thesis we tackle this issue and provide new approaches, which were implemented as prototypes and evaluated.

The first challenge in analyzing geospatial data in texts is identifying geospatial entities, also called toponyms. Such an approach can be divided into several steps. The first step marks possible candidates in the text, which is called spotting. Gazetteers are the key component for this task, but they have to be augmented by linguistically motivated methods to enable the spotting of inflected names. A second step is needed, since the spotting process cannot resolve ambiguous entities. For instance, London can be a city or a surname; we call this a geo/non-geo ambiguity. There are also geo/geo ambiguities, e.g. Fulda (city) vs. Fulda (river). For our experiments, we prepared a new dataset that contains mentions of street names. Each mention was manually annotated; one part of the data was used to develop methods for toponym recognition and the remaining part was used to evaluate performance. The results showed that machine-learning-based classifiers perform well at resolving the geo/non-geo ambiguity. To tackle the geo/geo ambiguity, we have to ground toponyms by finding the corresponding real-world objects. In this work we present such approaches in a formal description and in a (partial) prototypical implementation, e.g. the recognition of vernacular named regions (like old town or financial district).

The lack of annotated data in the geospatial domain is a major obstacle for the development of supervised extraction approaches. The second part of this thesis therefore focuses on approaches that enable the automatic annotation of textual data, which we call unstructured data, using machine-readable data from a knowledge base, which we call structured data. This approach is an instance of distant supervision (DS) and is well established for English. We apply it to German data, which is more challenging than English, since German has a richer morphology and a more variable word order; our approach takes these requirements into account. We evaluated the approach in several scenarios, which involve the extraction of relations between geospatial entities (e.g. between cities and their suburbs, or between towns and their corresponding rivers). For our evaluation, we developed two different relation extraction systems: a DS-based system, which uses the automatically annotated training set, and a standard system, which uses the manually annotated training set. The comparison showed that both reach the same quality, which is evidence that DS can replace manual annotation.
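A minimal sketch of the distant-supervision idea summarized above (not the thesis's actual pipeline): sentences that mention both entities of a known knowledge-base pair are automatically labelled as positive training examples for that relation. The knowledge-base pairs, relation label, and helper function are invented, and the naive substring matching ignores the tokenization and German-morphology issues the thesis addresses.

```python
# Illustrative distant-supervision sketch: auto-label sentences using a
# knowledge base of (town, river) pairs. All data below are invented.
from typing import List, Set, Tuple

# Hypothetical structured data: (town, river) pairs from a knowledge base
kb_pairs: Set[Tuple[str, str]] = {
    ("Stuttgart", "Neckar"),
    ("Köln", "Rhein"),
    ("Tübingen", "Neckar"),
}

def label_sentences(sentences: List[str],
                    pairs: Set[Tuple[str, str]]) -> List[Tuple[str, str, str, str]]:
    """Return (sentence, town, river, label) tuples for sentences that
    mention both entities of a knowledge-base pair."""
    examples = []
    for sent in sentences:
        for town, river in pairs:
            # Crude substring matching; a real system would handle
            # tokenization and inflected German forms.
            if town in sent and river in sent:
                examples.append((sent, town, river, "town_on_river"))
    return examples

corpus = [
    "Stuttgart liegt am Neckar.",
    "Der Rhein fließt durch Köln.",
    "Stuttgart ist die Landeshauptstadt von Baden-Württemberg.",
]

for example in label_sentences(corpus, kb_pairs):
    print(example)
```

The automatically labelled tuples would then serve as training data for a relation extraction classifier, replacing manual annotation.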
One drawback of current DS approaches is that the structured data and the unstructured data must be represented in the same language. However, most knowledge bases are represented in English, which prevents the development of DS for other languages. We developed an approach called Crosslingual Distant Supervision (CDS) that eliminates this restriction. Our experiments showed that structured data from a German knowledge base can successfully be transferred by CDS into other languages (English, French, and Chinese).
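A hypothetical sketch of the crosslingual idea, continuing the toy example above (not the thesis's CDS method): entity pairs from a German knowledge base are projected into a target language via an entity-name mapping, after which distant supervision proceeds as before. The mapping and sentences are invented; a real system would rely on resources such as interlanguage links between knowledge-base editions.

```python
# Illustrative crosslingual distant-supervision sketch. All data invented.
de_en_names = {"Köln": "Cologne", "Rhein": "Rhine", "Neckar": "Neckar"}

kb_pairs_de = {("Köln", "Rhein")}                 # German structured data
kb_pairs_en = {(de_en_names[a], de_en_names[b])   # projected to English names
               for a, b in kb_pairs_de
               if a in de_en_names and b in de_en_names}

english_corpus = ["Cologne lies on the banks of the Rhine."]
for sent in english_corpus:
    for town, river in kb_pairs_en:
        if town in sent and river in sent:
            print((sent, town, river, "town_on_river"))
```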