LLM “Hunger Games”
Several humorous moments punctuated the inaugural AI STAC conference last week, as participants reflected on the world in which we now find ourselves. My favorite was a remark by Conor Twomey, Head of Customer Success at KX, who said there’s currently a “Hunger Games” for LLM tokens occurring within large financial institutions. (Tokens—roughly, words that are input to or output from a model—are the currency by which the precious resources running large language models are priced and rationed.)
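To make the token-as-currency point concrete, here is a minimal back-of-the-envelope sketch in Python. The per-token prices and usage figures are hypothetical placeholders, not any vendor's actual rates, but they show why a large firm might ration tokens.

# Back-of-the-envelope LLM spend, priced per token.
# The prices and usage numbers below are hypothetical placeholders,
# not real vendor rates.
PRICE_PER_INPUT_TOKEN = 0.00001   # dollars per prompt token
PRICE_PER_OUTPUT_TOKEN = 0.00003  # dollars per completion token

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one model call, given its token counts."""
    return (input_tokens * PRICE_PER_INPUT_TOKEN
            + output_tokens * PRICE_PER_OUTPUT_TOKEN)

# 1,000 analysts, 50 calls a day each, ~2,000 tokens in / 500 out:
daily_spend = 1_000 * 50 * request_cost(2_000, 500)
print(f"Hypothetical daily spend: ${daily_spend:,.2f}")  # $1,750.00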
Conor’s remark came during a thought-provoking discussion with Ambika Sukla of nlmatics, Mike Bloom from WEKA, and Victor Alberto Olmedo from Pure Storage on how to architect the software and hardware—that is, the full stack—required for good retrieval-augmented generation (RAG). RAG is a popular technique to make responses from an LLM more current, accurate, and traceable without retraining or fine-tuning the model.
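For readers new to the technique, here is a minimal sketch of the RAG flow in Python, under loudly stated assumptions: the corpus is a tiny in-memory list, retrieval is naive keyword overlap rather than the embedding-plus-vector-database search a production stack (and this panel) would be concerned with, and the final model call is left as a stub.

# Minimal retrieval-augmented generation (RAG) sketch.
# Assumptions: tiny in-memory corpus; naive keyword-overlap retrieval
# (a real stack would use embeddings and a vector index); the LLM call
# itself is stubbed out.

CORPUS = [
    "Q3 revenue rose 12% year over year, driven by fixed-income trading.",
    "The firm's LLM budget is allocated per desk and metered in tokens.",
    "Latency benchmarks for the new risk engine were published in May.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank documents by shared words with the query; keep the top k."""
    words = set(query.lower().split())
    ranked = sorted(
        CORPUS,
        key=lambda doc: len(words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str) -> str:
    """Prepend retrieved context so the model answers from current facts."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# A real system would now send this prompt to an LLM; here we just print it.
print(build_prompt("How is the LLM budget allocated?"))

Because the model sees the retrieved passages at answer time, its response can draw on current documents without any retraining, which is the "current, accurate, and traceable" property the panel was after.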
The Hunger Games reference neatly conveys both the eagerness of business leaders to capitalize on AI and the high cost of (some) AI solutions. Expect a lot of technical innovation aimed at bringing those costs down, through optimizations such as model quantization, response caching, request batching, and smaller distilled models.
But even if supply catches up with demand, I won't soon forget the mental image of business analysts slinging arrows at each other across open-plan office floors.
About the STAC Blog
STAC and members of the STAC community publish blog posts from time to time on issues related to technology selection, development, engineering, and operations in financial services.