Multi Adapter Hosting LoRA Adapter

Talk


Simon Peyer and Luca Perrozzi | Friday, April 25, 2025 | Gurten Pavillon

Description

Fine-tuning enormous language models is prohibitively expensive, both in the hardware it requires and in the storage and switching costs of hosting an independent fine-tuned instance for each task.

This talk will explore LoRA (Low-Rank Adaptation), an efficient adaptation strategy that freezes the pretrained weights and trains small low-rank update matrices instead. It introduces no additional inference latency and does not reduce the usable input sequence length, while retaining high model quality. Importantly, because all tasks share the vast majority of the model parameters, a deployed service can switch between tasks quickly.
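As a rough illustration (not the speakers' code), the pure-Python sketch below keeps one frozen weight matrix W shared across tasks and stores a small pair of factors B and A per task, following the LoRA formulation y = Wx + (alpha/r)·BAx. The class `MultiLoRALinear`, the task names, and the dimensions are hypothetical. Merging one adapter into W yields an ordinary dense layer, which is why LoRA adds no inference latency. Keeping adapters unmerged instead lets a single copy of W serve several tasks, so switching tasks amounts to picking a different (A, B) pair.

```python
import random
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Matrix-vector product m @ x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product a @ b."""
    cols = list(zip(*b))  # columns of b
    return [[sum(aik * bkj for aik, bkj in zip(row, col)) for col in cols] for row in a]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiLoRALinear:
    """Hypothetical layer: a frozen base weight W (d_out x d_in) shared by every
    task, plus one pair of low-rank factors B (d_out x r), A (r x d_in) per task."""

    def __init__(self, weight: Matrix, rank: int, alpha: float) -> None:
        self.weight = weight          # shared base weight, never updated
        self.scaling = alpha / rank   # LoRA scales the update B @ A by alpha / r
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        self.adapters[name] = (a, b)

    def forward(self, x: Vector, adapter: Optional[str] = None) -> Vector:
        """y = W x + scaling * B (A x); switching tasks is a dict lookup."""
        y = matvec(self.weight, x)
        if adapter is None:
            return y
        a, b = self.adapters[adapter]
        delta = matvec(b, matvec(a, x))
        return [yi + self.scaling * di for yi, di in zip(y, delta)]

    def merged_weight(self, adapter: str) -> Matrix:
        """W' = W + scaling * B A: a plain dense weight, so no extra latency."""
        a, b = self.adapters[adapter]
        ba = matmul(b, a)
        return [[w + self.scaling * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, ba)]


def adapter_fraction(d_in: int, d_out: int, rank: int) -> float:
    """LoRA parameters per layer relative to the full d_out x d_in weight."""
    return rank * (d_in + d_out) / (d_in * d_out)


# Usage with toy sizes (hypothetical values).
rng = random.Random(0)
d_in, d_out, r = 8, 6, 2
layer = MultiLoRALinear(rand_matrix(d_out, d_in, rng), rank=r, alpha=4.0)
for task in ("summarize", "translate"):  # hypothetical task names
    # Random B for illustration only; LoRA initializes B to zero before training.
    layer.add_adapter(task, a=rand_matrix(r, d_in, rng), b=rand_matrix(d_out, r, rng))

x = [rng.uniform(-1.0, 1.0) for _ in range(d_in)]
y_unmerged = layer.forward(x, adapter="summarize")
y_merged = matvec(layer.merged_weight("summarize"), x)
assert all(abs(p - q) < 1e-12 for p, q in zip(y_unmerged, y_merged))
print(adapter_fraction(4096, 4096, rank=8))  # 0.00390625
```

For a 4096 × 4096 projection with rank 8, each adapter holds about 0.4% of that layer's parameters, which is what makes storing and swapping many task adapters cheap compared with hosting a fully fine-tuned copy of the model per task.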

Speakers

Simon Peyer

Solutions Architect at AWS

Luca Perrozzi

Solutions Architect at AWS