Talk

Multi Adapter Hosting LoRA Adapter

Arlind Nocaj and Simon Peyer | Friday, April 25, 2025 | Gurten Pavillon

Description

Fine-tuning very large language models is prohibitively expensive, both in the hardware it requires and in the storage and switching costs of hosting an independent model instance for each task.

This talk explores LoRA (Low-Rank Adaptation), an efficient adaptation strategy that adds no inference latency and does not reduce the input sequence length, while retaining high model quality. Importantly, when deployed as a service it allows quick task switching, because the vast majority of the model parameters are shared across tasks.
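The idea behind these claims can be sketched with a toy example. The snippet below is a minimal illustration in plain Python, not the speakers' implementation: a single frozen weight matrix `W` is shared by all tasks, and each task contributes only a small pair of low-rank matrices `B` and `A`. The helper names (`matmul`, `lora_forward`, `merged_weight`), the task names, and all numbers are assumptions made for illustration, and `scaling` stands in for the usual alpha/r factor.

```python
# Toy LoRA sketch in plain Python; all names and values are illustrative.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(a, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[x * s for x in row] for row in a]

# Frozen base weight (3 x 3), shared by every task.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0],
     [3.0, 0.5, 0.0]]

# One small adapter per task: B is 3 x r, A is r x 3, here with rank r = 1.
adapters = {
    "task_a": {"B": [[1.0], [0.0], [2.0]], "A": [[0.5, 0.0, 1.0]]},
    "task_b": {"B": [[0.0], [1.0], [1.0]], "A": [[1.0, -1.0, 0.0]]},
}

def lora_forward(x, task, scaling=1.0):
    """Compute W x + scaling * B (A x); W itself is never modified."""
    ad = adapters[task]
    base = matmul(W, x)
    delta = matmul(ad["B"], matmul(ad["A"], x))
    return add(base, scale(delta, scaling))

def merged_weight(task, scaling=1.0):
    """Fold an adapter into one matrix W + scaling * B A."""
    ad = adapters[task]
    return add(W, scale(matmul(ad["B"], ad["A"]), scaling))

x = [[1.0], [2.0], [3.0]]            # input as a column vector

# Switching tasks only swaps the small adapter; W is reused as-is.
y_a = lora_forward(x, "task_a")
y_b = lora_forward(x, "task_b")

# Merging the adapter gives the same result with a single matrix multiply.
y_a_merged = matmul(merged_weight("task_a"), x)
```

In this sketch, serving a second task costs only the extra `B` and `A` entries rather than a second copy of `W`, and the merged matrix has the same shape as `W`, which is why a merged adapter adds no extra work at inference time.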

Speakers

Simon Peyer
Solution Architect at AWS
Arlind Nocaj
Senior Solutions Architect at AWS