Exploring the effects of enriched English language input on language model efficiency
Abstract
Recent years have seen the advent of large-scale language modeling, as exemplified by transformer-based models like GPT or variants of the BERT architecture. These models, trained on massive datasets using compute unattainable by actors smaller than the biggest tech companies, have shown impressive feats of syntactic and semantic understanding. Naturally, interest has risen in making these models more efficient, both in terms of compute and of data requirements. Research in this area is primarily motivated by two factors: lowering the barrier for smaller actors such as research institutes or end consumers to train and run state-of-the-art models, and reducing the carbon footprint of these models. To achieve this goal, model compression techniques such as quantization, pruning, or distillation are typically employed. This work explores a different, less model-centric and more data-centric approach: modifying the training and inference data by enriching it with syntactic and semantic information. To this end, a lexical resource is created that maps English words to a form in which individual characters represent the values of a range of semantic and syntactic features, providing lexical information that is accessible to any model type operating on sub-word or character-level tokens. Different features and methods of representation are discussed, and their effect on model performance is evaluated by pretraining a small GPT-family model and fine-tuning it on downstream tasks from the SuperGLUE benchmark. Given a fixed amount of data and compute, the experiments show a performance advantage for a character-level model trained on the enriched data.
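As a rough illustration of the data-centric enrichment described in the abstract, the sketch below maps words to a form where individual characters encode feature values. The lexicon, feature inventories, character codes, and the `enrich` helper are all illustrative assumptions for this example, not the thesis's actual resource or encoding scheme.

```python
# Minimal sketch of character-coded lexical enrichment (hypothetical).
# The feature codes and the toy lexicon below are invented for illustration.

POS_CODES = {"NOUN": "N", "VERB": "V", "ADJ": "A"}            # syntactic feature
SEM_CODES = {"animal": "a", "motion": "m", "size": "s"}       # semantic feature

# Hypothetical lexical resource: word -> values of each feature
LEXICON = {
    "dog":   {"pos": "NOUN", "sem": "animal"},
    "runs":  {"pos": "VERB", "sem": "motion"},
    "small": {"pos": "ADJ",  "sem": "size"},
}

def enrich(word: str) -> str:
    """Append one character per feature value to the surface form."""
    entry = LEXICON.get(word.lower())
    if entry is None:
        return word  # leave out-of-vocabulary words unchanged
    code = POS_CODES[entry["pos"]] + SEM_CODES[entry["sem"]]
    return f"{word}|{code}"

if __name__ == "__main__":
    sentence = "The small dog runs"
    print(" ".join(enrich(w) for w in sentence.split()))
    # -> The small|As dog|Na runs|Vm
```

Because the added characters sit directly in the token stream, a sub-word or character-level model can, in principle, pick up the lexical information without any architectural changes, which is the property the abstract highlights.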