Talk

20 Years of Data: Building finetuning datasets and fixing search along the way

Konrad Handrick | Friday, May 8, 2026 15:10 | Gurten, Bern: Panorama

Description

Most enterprise data is not ready for finetuning. It is scattered across systems, duplicated, inconsistent, and spread across documents, tickets, chats, and other formats. We show how to turn that data problem into an infrastructure problem.
Strong data governance standards gave us a head start, but not a usable corpus. We built pipelines that convert the raw data into unified formats, and we show how to work with the resulting corpus: from serving large open-weight models for scoring and synthesis to training small specialized models that help agents digest and retrieve what’s actually there.

And of course: once you’ve actually curated your data, onboarding new people into your tools gets a lot easier too. Renewed interest in supposedly outdated or stale data boosts our productivity.
We’ll show the pipelines, the plots, and the infrastructure. If you’re digging, or about to dig, into your vast internal data troves, this talk will save you many engineering hours otherwise lost to wrong turns.