- creating reams of text
- tidying up formatting
- translating
With all of the hype around AI and its touted capabilities, look under the hood and there’s likely an LLM somewhere, either stealthily performing a task you didn’t expect or wholesale pretending to be an AI strategy. But there are areas where it might make sense to use them. When should that be? Is it OK to lean on them sometimes, all the time, or never?

Understanding LLM outputs
Firstly, the more complicated the task, the less likely you are to get the answer you want. This is because the agent that interacts with the LLM on your behalf will itself be splitting complex tasks into smaller ones. Think about splitting the task you want to achieve into several smaller tasks, where at each stage you can observe the output. This allows YOU to choose the direction, as opposed to the agent.
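As a minimal sketch of this idea, the snippet below breaks one large job into three small steps and surfaces each intermediate result so you can inspect it before moving on. The `call_llm` helper is a hypothetical stand-in for whichever model client you actually use.

```python
# Hypothetical stand-in for your model client; swap in your real API call.
def call_llm(prompt: str) -> str:
    return f"<model output for: {prompt[:40]}...>"

def summarise_report(raw_text: str) -> str:
    # Step 1: extract key facts -- a small task whose output you can read.
    facts = call_llm(f"List the key facts in this text:\n{raw_text}")
    print("STEP 1 (facts):\n", facts)  # observe before continuing

    # Step 2: draft a summary from facts you have already checked.
    draft = call_llm(f"Write a three-sentence summary of these facts:\n{facts}")
    print("STEP 2 (draft):\n", draft)

    # Step 3: tidy the tone, again working on output you have already seen.
    return call_llm(f"Rewrite this summary in a neutral tone:\n{draft}")
```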
Secondly, you must not interpret anything within an LLM’s response, its confident tone included, as a signal of trustworthiness. An LLM will confidently state fiction alongside fact without separating the two. Asking it to explain how it arrived at an answer is futile, as there won’t be an explanation you can trust.
Finally, it’s important to understand what an LLM is actually “good” at. At its core, an LLM creates text. From that basis, it can interact with other agents through that text to produce more complex things, such as graphics or numeric outputs. But, akin to a game of “Whispers”, the further you move from an LLM’s core capability, the less likely you are to get a trustworthy output.

Trust through familiarity
In classic AI models, as you train the model and observe the output, you draw more and more confidence from seeing the interim results. Repetition and good outcomes build that trust. But what do you do when you are using a system you didn’t train? That’s exactly what’s happening with an LLM. Your best way of observing outputs is prompt engineering. As previously discussed, even if you’ve seen a run of good results with a prompt, the non-deterministic nature of an LLM means it’s extremely difficult to ever fully trust an output.
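One way to make that non-determinism visible is simply to run the same prompt several times and compare what comes back. A rough sketch, again assuming a hypothetical `call_llm` client:

```python
from collections import Counter

def call_llm(prompt: str) -> str:
    # Hypothetical model client; replace with your real API call.
    return "<model output>"

def sample_outputs(prompt: str, runs: int = 10) -> Counter:
    """Run one prompt repeatedly and tally the distinct answers."""
    return Counter(call_llm(prompt) for _ in range(runs))

# If ten runs of the same prompt return several different answers,
# that spread is a rough measure of how much to trust any single one.
```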
There is, therefore, an intrinsic distrust you should always hold when using LLMs. Use them where they make the most sense, but ensure that safeguards are in the appropriate places. Your choice of when, where and how often to introduce safeguards is the key to trusting an LLM’s output in a system.
Choosing safeguards
As discussed above, by splitting larger tasks into smaller ones, you have a higher chance of catching the output of an LLM before it goes absolutely haywire. Your choice is then to pick which type of safeguard suits which task.
Safeguard 1: Human in the loop
Humans can be shown outputs, or samples of outputs, to “OK” them before moving on to the next task. Be aware that whilst this might seem like a good step, too many verification tasks for a human lead to inefficiency and a lack of focus.
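A minimal sketch of such a gate, using the same hypothetical `call_llm` client as above, might look like this:

```python
def call_llm(prompt: str) -> str:
    # Hypothetical model client; replace with your real API call.
    return "<model output>"

def human_approves(output: str) -> bool:
    """Show the output to a reviewer and require an explicit decision."""
    print("LLM output:\n", output)
    return input("Approve? [y/N] ").strip().lower() == "y"

def gated_step(prompt: str) -> str:
    draft = call_llm(prompt)
    if not human_approves(draft):
        raise RuntimeError("Output rejected; stopping before the next task.")
    return draft
```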
Safeguard 2: Automated
LLMs are very good at helping discern whether sections of text were written by an LLM, so why not reuse that capability? Assign one agent the job of checking the output of another agent for errors. Combining automated checks with a human in the loop can reduce the burden on humans, whilst ensuring that the creative nature of an LLM hasn’t gone off the rails.
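As a sketch of this pattern, again with a hypothetical `call_llm` client, a second “checker” agent audits the first agent’s output and only escalates flagged results to a person:

```python
def call_llm(prompt: str) -> str:
    # Hypothetical model client; replace with your real API call.
    return "PASS"

def checker_flags_problems(candidate: str) -> bool:
    """Ask a second 'checker' agent to audit the first agent's output."""
    verdict = call_llm(
        "Answer PASS or FAIL only. Does the following text contain "
        f"factual errors, spelling mistakes or fabricated citations?\n{candidate}"
    )
    return "FAIL" in verdict.upper()

def checked_step(prompt: str) -> str:
    draft = call_llm(prompt)
    if checker_flags_problems(draft):
        # Only flagged outputs reach a person, reducing the review burden.
        raise RuntimeError("Checker flagged this output; route it to a human.")
    return draft
```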
Safeguard 3: Evals (Evaluations) and Error Analysis
Evaluations are the AI/LLM equivalent of a programmer’s automated tests. The automated check described above fits into this, although metrics and other outputs can also form part of your evals. You should define test inputs and outputs, with expected values and tolerances. Separately, where you see LLM outputs fail those tests, you can perform error analysis on them: a manual process of identifying patterns of mistakes and fixing them. This whole process should be iterative, much as programming and debugging are.
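A toy version of an eval harness, with the hypothetical `call_llm` client and made-up test cases, might look like this:

```python
def call_llm(prompt: str) -> str:
    # Hypothetical model client; replace with your real API call.
    return "1969"

# A tiny eval set: each case pairs an input with an expected substring.
EVAL_CASES = [
    {"prompt": "In which year did the Apollo 11 Moon landing take place?",
     "expected": "1969"},
    {"prompt": "Summarise in one line: 'The cat sat on the mat.'",
     "expected": "cat"},  # the key entity should survive summarisation
]

def run_evals() -> list:
    """Return the cases whose outputs missed the expected value."""
    failures = []
    for case in EVAL_CASES:
        output = call_llm(case["prompt"])
        if case["expected"].lower() not in output.lower():
            failures.append({"case": case, "output": output})
    return failures

# The failures feed error analysis: read them by hand, look for a
# pattern (say, every failure involves citations), fix it, re-run.
```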
Pick the right technology for the right task
LLMs are:
great at:
- creating reams of text
- tidying up formatting
- translating
good at:
- summarising large documents, though they can start to introduce errors or hallucinations when asked to cite sources
- interacting with other agents to draw simple graphics, though they can introduce very simple errors such as spelling mistakes within those graphics
terrible at:
- numerical tasks
- complex calculations
We previously wrote about “Are you trying to fit every problem into an LLM shaped hole?”, and that question holds truest here.
You can trust LLMs more when they are doing tasks they are good at. When they aren’t, well, why bother? Pick the appropriate technology for the tasks you need done.
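To make that concrete: if the task is really arithmetic, a few lines of ordinary code give a deterministic answer for free, as in this small Python sketch.

```python
# "21.7% of 3,450" is a calculation, not a text problem: compute it
# directly rather than asking an LLM to estimate it.
def percentage(part_pct: float, total: float) -> float:
    return total * part_pct / 100

print(percentage(21.7, 3450))  # 748.65, the same on every run
```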

In Summary
You can start to trust LLMs in limited circumstances, under the correct conditions. Without a heavily supervised set of outputs, whether automated or fully manual, you could be trusting a non-deterministic system to provide deterministic outputs. Ensure this doesn’t happen to you. In summary:
- Split your higher-level tasks into smaller tasks
- Choose LLMs for the right tasks
- Put guardrails in place, including both automated tests and a human in the loop
- Run error analysis on the patterns of errors you spot
- The more complicated the task, the more guardrails you should have in place
- The less suited an LLM is to the task, the more guardrails you should have in place
- Don’t try to solve every problem with an LLM; choose your technology wisely
About the authors
Larry is a lifelong technologist with a strong passion for problem-solving. With over a decade of trading experience and another decade of technical expertise within financial institutions, he has built, grown, and managed highly profitable businesses. Having witnessed both successful and unsuccessful projects, particularly in the banking sector, Larry brings a pragmatic and seasoned perspective to his work. Outside of his professional life, he enjoys Brazilian Jiu-Jitsu, climbing and solving cryptic crosswords.
Ash is a strategy and operations professional with 14 years of experience in financial services, driven by a deep passion for technology. He has led teams and projects spanning full-scale technology builds to client-facing strategic initiatives. His motivation comes from connecting people, processes, data and ideas to create solutions that deliver real-world impact. Beyond work, Ash enjoys exploring different cultures through food and cocktails and practices yoga regularly.

