Please use this identifier to cite or link to this item: http://dx.doi.org/10.18419/opus-15145
Authors: Bareiß, Patrick
Title: Just-in-time pruning of large prompted language models
Issue Date: 2024
Document Type: Master's thesis
Pages: 99
URI: http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-151649
http://elib.uni-stuttgart.de/handle/11682/15164
http://dx.doi.org/10.18419/opus-15145
Abstract: Prompting of transformer-based autoregressive large language models (LLMs) is a powerful and ergonomic approach to solving language processing tasks (sentiment analysis, question answering, ...) with little data. However, for each individual task/prompt it is also wasteful: only a fraction of the generality in the underlying model is ever used. This raises the question: is this also reflected in the model in the form of irrelevant model parts (for instance layers) that do not affect task-specific accuracy when pruned away? Previous work does not address this question at a level that preserves the natural affordances of prompting: (1) low data requirements and (2) the ability to reuse the same deployed model for multiple tasks. We propose a new approach that identifies and removes irrelevant model parts if they exist, requires no additional data (we generate it instead), and removes irrelevant parts only just before a prompt is passed as input to the model, i.e. just-in-time. After the prompt has been processed, we re-add the removed parts to the model. In this way, we can reuse the model and remove (potentially different) irrelevant model parts for another prompt. We identify a class of model parts for which pruning and re-adding is cheap, which makes just-in-time pruning efficient. In our experiments we find that irrelevant just-in-time prunable model parts do exist for many prompts (for Mistral-7B and GPT-2-XL) and can be removed to a substantial degree, reducing FLOPs by up to 46% while preserving the accuracy of the original model.
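
The following is a minimal sketch of the just-in-time prune/re-add loop described in the abstract, written in plain PyTorch on a toy residual layer stack. The class names (ToyLM, just_in_time_prune), the choice of whole blocks as the prunable unit, and the hard-coded set of "irrelevant" block indices are illustrative assumptions, not the method from the thesis.

    # Sketch: temporarily drop some blocks for one prompt, then restore them.
    import contextlib
    import torch
    import torch.nn as nn


    class ToyLM(nn.Module):
        """Stand-in for a decoder-only transformer: a stack of residual blocks."""

        def __init__(self, dim: int = 64, n_layers: int = 8):
            super().__init__()
            self.layers = nn.ModuleList(
                nn.Sequential(nn.Linear(dim, dim), nn.ReLU()) for _ in range(n_layers)
            )
            self.head = nn.Linear(dim, dim)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # The forward pass iterates over whatever `self.layers` currently
            # holds, so swapping in a shorter list skips the pruned blocks.
            for layer in self.layers:
                x = x + layer(x)
            return self.head(x)


    @contextlib.contextmanager
    def just_in_time_prune(model: ToyLM, drop: set):
        """Temporarily remove the blocks with indices in `drop`, then restore them."""
        original = model.layers
        model.layers = nn.ModuleList(
            layer for i, layer in enumerate(original) if i not in drop
        )
        try:
            yield model
        finally:
            # Re-add the removed parts so the same deployed model can be reused,
            # possibly with a different pruned set, for the next prompt.
            model.layers = original


    if __name__ == "__main__":
        torch.manual_seed(0)
        model = ToyLM()
        prompt_repr = torch.randn(1, 64)  # stands in for the encoded prompt

        # Hypothetical per-prompt decision of which blocks are irrelevant.
        irrelevant = {2, 5}

        with just_in_time_prune(model, irrelevant):
            out_pruned = model(prompt_repr)  # forward pass skips blocks 2 and 5

        out_full = model(prompt_repr)  # model is intact again afterwards
        print(out_pruned.shape, out_full.shape)

Swapping the ModuleList (rather than editing weights) is what keeps pruning and re-adding cheap in this sketch; the abstract's point that only efficiently removable/restorable part classes are suitable for just-in-time pruning is mirrored here by choosing whole blocks as the unit.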
Appears in Collections:05 Fakultät Informatik, Elektrotechnik und Informationstechnik

Files in This Item:
File: Master_thesis_PatrickBareiss.pdf (939,06 kB, Adobe PDF)


Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.