Please use this identifier to cite or link to this item:
http://dx.doi.org/10.18419/opus-15145
Authors: | Bareiß, Patrick |
Title: | Just-in-time pruning of large prompted language models |
Issue Date: | 2024 |
Document Type: | Master's thesis |
Pages: | 99 |
URI: | http://nbn-resolving.de/urn:nbn:de:bsz:93-opus-ds-151649 http://elib.uni-stuttgart.de/handle/11682/15164 http://dx.doi.org/10.18419/opus-15145 |
Abstract: | Prompting of transformer-based autoregressive large language models (LLMs) is a powerful and ergonomic approach to solving language processing tasks (sentiment analysis, question answering, ...) using little data. However, for each individual task/prompt it is also wasteful: only a fraction of the generality of the underlying model is ever used. This raises the question: is this also reflected in the model in the form of irrelevant model parts (for instance, layers) that do not affect the task-specific accuracy when pruned away? Previous work does not address this question at a level that preserves the natural affordances of prompting: (1) low data requirements and (2) the ability to reuse the same deployed model for multiple tasks. We propose a new approach that can identify and remove irrelevant model parts if they exist. It does not require additional data (we generate it instead) and removes irrelevant parts only just before a prompt is passed as input to the model, that is, just-in-time. After the prompt has been processed, we re-add the removed parts to the model. In this way, we can reuse the model and remove (potentially different) irrelevant model parts for another prompt. We identify a class of model parts for which pruning and re-adding is efficient, which in turn allows for efficient just-in-time pruning. In our experiments, we find that irrelevant just-in-time prunable model parts do exist for many prompts (for Mistral-7B and GPT-2-XL) and that we can remove them to a substantial degree, reducing FLOPs by up to 46% while preserving the accuracy of the original model. |
Appears in Collections: | 05 Fakultät Informatik, Elektrotechnik und Informationstechnik |
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
Master_thesis_PatrickBareiss.pdf | | 939.06 kB | Adobe PDF | View/Open |
Items in OPUS are protected by copyright, with all rights reserved, unless otherwise indicated.