Please use this identifier to cite or link to this resource:
http://dx.doi.org/10.18419/opus-14231
Author(s): | Wu, Nianheng |
Title: | Multimodal OCR post-correction on German historical documents |
Publication date: | 2023 |
Document type: | Thesis (Master's) |
Pages: | 43 |
URI: | http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-142500 http://elib.uni-stuttgart.de/handle/11682/14250 http://dx.doi.org/10.18419/opus-14231 |
Abstract: | Optical Character Recognition (OCR) post-correction is essential for digitizing historical documents: it increases transcription accuracy and reduces manual effort. Previous work often treats it as a text-to-text translation problem. However, the orthography of many languages, including German, has changed over the centuries, producing many "irregular" spellings. A text-only system therefore faces considerable uncertainty about whether an unusual form is an OCR error or a legitimate historical spelling, so combining image features with the text should help. The rise of large-scale pretrained models has opened new opportunities in this field. In this work, I 1) introduce a dataset of historical German documents from 1783 to 1903, based on the Deutsches Textarchiv, that aligns gold-standard transcriptions, OCR-ed text lines, and their corresponding text-line images; and 2) present a multimodal OCR post-correction system that combines the CLIP image encoder, a pretrained image feature model, with ByT5, a byte-based language model. In my experiments, this model outperforms the state-of-the-art text-only model. |
Appears in collections: | 05 Fakultät Informatik, Elektrotechnik und Informationstechnik |
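The abstract notes that ByT5 is a byte-based language model, which matters for historical German because irregular spellings never fall outside the vocabulary. The sketch below shows how such a model might see its input. It is not taken from the thesis. It assumes the ID convention of the Hugging Face `ByT5Tokenizer`: IDs 0 to 2 are reserved for `<pad>`, `</s>` and `<unk>`, and each UTF-8 byte `b` maps to ID `b + 3`. The helper names `byte_ids` and `decode_ids` are hypothetical.

```python
# Sketch of byte-level input encoding as used by models like ByT5.
# Assumption: Hugging Face ByT5Tokenizer convention, where 0 = <pad>,
# 1 = </s>, 2 = <unk>, and each UTF-8 byte b is encoded as ID b + 3.
# byte_ids / decode_ids are hypothetical helpers, not a library API.

SPECIAL_OFFSET = 3  # <pad>, </s>, <unk> occupy IDs 0-2
EOS_ID = 1          # </s>


def byte_ids(text: str, add_eos: bool = True) -> list[int]:
    """Encode text as byte-level IDs (UTF-8 byte value + offset)."""
    ids = [b + SPECIAL_OFFSET for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode_ids(ids: list[int]) -> str:
    """Invert byte_ids, skipping special-token IDs and any IDs above the byte range."""
    data = bytes(i - SPECIAL_OFFSET for i in ids
                 if SPECIAL_OFFSET <= i < 256 + SPECIAL_OFFSET)
    return data.decode("utf-8", errors="replace")


# Pre-1901 spelling "Theil", modern "Teil", and a word with long s (U+017F).
for word in ["Theil", "Teil", "Waſſer"]:
    print(word, byte_ids(word))
```

Every spelling variant, including the long s (two UTF-8 bytes), gets a valid encoding without any vocabulary lookup. The model only has to learn from context and, in the multimodal setup, from the line image which byte sequence is correct.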
Files in this item:
File | Description | Size | Format |
---|---|---|---|
NianhengWu_Masterarbeit.pdf | | 966.34 kB | Adobe PDF |
All resources in this repository are protected by copyright.