Enhancing a German dialect corpus with neural methods

Please use this identifier to cite or link to this resource: http://dx.doi.org/10.18419/opus-13825

Author(s):	Tessadri, Wolfgang
Title:	Enhancing a German dialect corpus with neural methods
Date of publication:	2023
Document type:	Master's thesis
Pages:	128
URI:	http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-138442 http://elib.uni-stuttgart.de/handle/11682/13844 http://dx.doi.org/10.18419/opus-13825
Abstract:	With the advent of modern chat applications, an increasing number of German dialect speakers use their dialects for written communication. The DiDi Facebook corpus (Frey et al. 2016) captures this phenomenon for South Tyrolean dialects. While the authors included a dialect/standard variety tag at the posting level, a third of these tags were undefined. By training DeBERTa and XLM-RoBERTa for dialect/standard classification, we reduce these undefined instances by over 75%. We also use XLM-RoBERTa to add explicit variety labels to individual tokens. By performing a linear regression analysis of socio-linguistic variables and a label-derived dialectality metric, we show that the generated labels are highly meaningful. Finally, we describe how the implemented Transformer models can be applied to gather geo-referenced dialect samples from Twitter, and we discuss how this data can enrich future dialectometric research.
Appears in collections:	05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in this resource:

File	Description	Size	Format
MA_thesis_Tessadri.pdf		1.96 MB	Adobe PDF

All resources in this repository are protected by copyright.

Universität Stuttgart