Keys, Queries, and Values: The Celestial Mechanics of Attention
Luis Serrano | Friday, April 25, 2025 | Gurten Pavillon
Description
The attention mechanism is the secret sauce behind the success of transformer models like ChatGPT and DeepSeek. It enables these models to dynamically focus on the most relevant parts of a text, determining which words or phrases are most important based on their contextual relationships.
In this talk, we’ll explore language models through a geometric lens, imagining words as celestial bodies floating in space. The attention mechanism acts as a gravitational force, pulling these words together to form "context galaxies" where meaning emerges. We’ll also dive into the roles of the Key, Query, and Value matrices, which serve as the cosmic tools that help extract and organize information from the text.
No advanced mathematical background is required—just a willingness to think creatively about addition, subtraction, and the occasional multiplication. Join us for a journey through the universe of transformers, where words, gravity, and context collide!
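For the curious, the Key, Query, and Value roles described above can be sketched as scaled dot-product attention. This is a minimal illustrative example, not material from the talk itself; the function name, matrix shapes, and random inputs are all made up for demonstration. Queries and keys score how strongly each word "pulls" on the others, and values carry the content that gets blended:

```python
import numpy as np

def attention(X, W_q, W_k, W_v):
    # Project word embeddings into query, key, and value spaces
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    # Similarity scores: how strongly each word attends to the others
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    # Softmax turns scores into attention weights (each row sums to 1)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each word's new representation: a weighted average of value vectors
    return weights @ V

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                       # 4 "words", 8-dim embeddings
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

In the talk's metaphor, the softmax weights play the role of gravity: each word's output is pulled toward the value vectors of the words it attends to most.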