Arlind Nocaj and Simon Peyer | Friday, April 25, 2025 | Gurten Pavillon
Description
Fine-tuning large language models is prohibitively expensive: it demands substantial hardware, and hosting an independently fine-tuned instance per task incurs heavy storage and switching costs.
This talk will explore LoRA (Low-Rank Adaptation), an efficient adaptation strategy that neither introduces inference latency nor reduces input sequence length while retaining high model quality. Importantly, it allows for quick task switching when deployed as a service, because the vast majority of the model parameters are shared across tasks.
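To make the idea concrete, here is a minimal PyTorch sketch of the LoRA pattern (not material from the talk itself; the class name `LoRALinear` and the `rank`/`alpha` hyperparameters are illustrative). The pretrained weight stays frozen and shared, only the small low-rank factors A and B are trained per task, and merging B·A back into the base weight removes any extra inference cost:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update: y = W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # shared pretrained weights stay fixed

        # Low-rank factors: A projects down to `rank`, B projects back up.
        # B starts at zero so training begins exactly at the base model.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base path plus scaled low-rank correction.
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

    def merge(self) -> nn.Linear:
        """Fold B A into the base weight, so inference costs a single matmul."""
        merged = nn.Linear(
            self.base.in_features,
            self.base.out_features,
            bias=self.base.bias is not None,
        )
        merged.weight.data = self.base.weight.data + (self.B @ self.A) * self.scale
        if self.base.bias is not None:
            merged.bias.data = self.base.bias.data.clone()
        return merged
```

Because each task only needs its own small A and B matrices, switching tasks in a deployed service amounts to swapping (or re-merging) a few megabytes of adapter weights while the large base model stays resident.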