Please use this identifier to cite or link to this resource:
http://dx.doi.org/10.18419/opus-13825
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Tessadri, Wolfgang | - |
dc.date.accessioned | 2023-12-14T09:34:28Z | - |
dc.date.available | 2023-12-14T09:34:28Z | - |
dc.date.issued | 2023 | de |
dc.identifier.other | 1876971134 | - |
dc.identifier.uri | http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-138442 | de |
dc.identifier.uri | http://elib.uni-stuttgart.de/handle/11682/13844 | - |
dc.identifier.uri | http://dx.doi.org/10.18419/opus-13825 | - |
dc.description.abstract | With the advent of modern chat applications, an increasing number of German dialect speakers use their dialects for written communication. The DiDi Facebook corpus (Frey et al. 2016) captures this phenomenon for South Tyrolean dialects. While the authors included a dialect/standard variety tag at the posting level, a third of these tags were undefined. By training DeBERTa and XLM-RoBERTa for dialect/standard classification, we reduce these undefined instances by over 75%. We also use XLM-RoBERTa to add explicit variety labels to individual tokens. By performing a linear regression analysis of socio-linguistic variables and a label-derived dialectality metric, we show that the generated labels are highly meaningful. Finally, we describe how the implemented Transformer models can be applied to gather geo-referenced dialect samples on Twitter, and we discuss how this data can enrich future dialectometric research. | en |
dc.language.iso | en | de |
dc.rights | info:eu-repo/semantics/openAccess | de |
dc.subject.ddc | 004 | de |
dc.subject.ddc | 400 | de |
dc.title | Enhancing a German dialect corpus with neural methods | en |
dc.type | masterThesis | de |
ubs.fakultaet | Informatik, Elektrotechnik und Informationstechnik | de |
ubs.institut | Institut für Maschinelle Sprachverarbeitung | de |
ubs.publikation.seiten | 128 | de |
ubs.publikation.typ | Abschlussarbeit (Master) | de |
Appears in collections: | 05 Fakultät Informatik, Elektrotechnik und Informationstechnik |
Files in this resource:
File | Description | Size | Format | |
---|---|---|---|---|
MA_thesis_Tessadri.pdf | | 1.96 MB | Adobe PDF | View/Open |
All resources in this repository are protected by copyright.