Posted January 13, 2026
Guest Article

Stop Building Agent Chains. Start Building Hybrid Loops.

Why enterprises need architectures that think, not agents that drift.

This guest contribution was written by:

If you work with modern AI models, you will soon feel a strange tension. On one hand, you are told to give the model everything in one large prompt. This includes long context, rich instructions, a list of tools, and then you let the model figure it out. On the other hand, the problems you care about are messy. They are long running and tied to systems in the real world. They can involve thousands of tool calls, changing state, partial failures, and many chances to get things quietly wrong.

This essay describes the architecture that resolves this conflict: the hybrid loop. It is written for leaders who know LLMs are powerful, but who now must build durable, real-world systems that think. The core idea is simple:

  • Let the model think deeply and globally at a few key points.

  • Let deterministic code own execution, state, and safety.

  • Let a hybrid agentic architecture guide the whole thing.

Before we examine the hybrid loop, consider the architectures that came before it.

 

From Stepwise Agents to Single Shot Planning

In the first wave of LLM systems, the model was a short-term problem solver. It could not hold a long chain of reasoning. It had limited context. Its planning was weak and its tool use was fragile. So engineers built what they had to: stepwise agents.

The pattern was always the same. The model would:

  1. Look at the current context.

  2. Propose a small next action.

  3. Trigger a tool or fetch some data.

  4. See the result.

  5. Decide what to do next.

Each decision was another call to the model. For anything complex, you quickly had long chains of calls. Every step replayed most of the past context. Every step added a little more noise. And every step was a chance for the model to drift. Frameworks emerged to manage these chains: strict ReAct-style loops, planner-executors, multi-agent systems, self-refinement, and a raft of legacy orchestration frameworks such as LangGraph, CrewAI, and AutoGen to construct and support them. But they all shared the same fundamental flaws:

  • Latency and cost grew with the number of steps.

  • Long chains were brittle and hard to debug.

  • The same queries behaved differently when the context order changed.

  • There was no clean way to define success or set budgets.
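The stepwise pattern above can be sketched as a single loop in which every micro-decision is another model call that replays the growing history. This is a minimal illustration, not any particular framework's API; `call_model` and `run_tool` are hypothetical stand-ins:

```python
# A minimal sketch of a stepwise (ReAct-style) agent loop.
# `call_model` and `run_tool` are hypothetical stand-ins for an LLM
# call and a tool invocation; the point is the shape of the loop.

def call_model(history):
    # Placeholder: a real system would send `history` to an LLM.
    if len(history) > 3:
        return {"action": "finish", "answer": "done"}
    return {"action": "tool", "name": "search", "args": {"q": len(history)}}

def run_tool(name, args):
    return f"result of {name}({args})"

def stepwise_agent(goal, max_steps=10):
    history = [("goal", goal)]
    for _ in range(max_steps):
        decision = call_model(history)      # replays ALL prior context
        if decision["action"] == "finish":
            return decision["answer"]
        observation = run_tool(decision["name"], decision["args"])
        history.append((decision["name"], observation))  # context keeps growing
    return None  # step budget exhausted with no answer
```

Because every iteration re-reads the whole history, latency and cost grow with the number of steps, and each step is a fresh chance to drift.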

Then the models changed. Context windows grew. Tool use became more reliable. Planning improved. The best models started to feel less like chatbots and more like strong planners. They could take a clear goal and lay out a multi-phase approach in one shot.

This created a new temptation: Single Shot prompting. Push everything into one big prompt, ask for the full plan and the final answer, and be done with it.

Sometimes that works. If the environment is static and the tools are simple, a single planning and execution pass can be enough. But most business problems are not like that. Markets move. Data changes. APIs fail. The Single Shot method is too rigid for the real world.

The chain of agents is too brittle. The Single Shot is too blind. A third way was needed.

 

The Hybrid Loop: Planning, Supervision, and Verification

A better way has emerged: the Hybrid Loop. It starts from a different division of labor. Instead of asking one model to “be the agent” and do everything, it separates the task into clear roles coordinated by an underlying manager. This combines the strengths of today’s powerful models with the control of iterative execution.

The hybrid loop is an architecture of five actors, each with a distinct role. Four of them perform the work; a fifth, the Context Manager, directs the flow of information between them.

First, there is a planner model. This is a frontier reasoning LLM used for what it is now very good at: taking a high-level goal, a list of tools, requirements, and constraints, and producing a structured plan. Not a paragraph of text, but a multi-phase object that describes what to do, in what order, and with what success criteria.

Second, there is a supervisor. This is not a model. It is regular code. It holds the current plan and the execution state. It owns tool calls, asynchronous fanout, parallelism, budgets, and state transitions. It knows nothing about language. It just executes and enforces.
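As a sketch of what “owns budgets and state transitions” can mean in practice, the supervisor might gate every tool call against per-phase limits in plain code. The class and field names here are illustrative assumptions, not from the article:

```python
# Illustrative supervisor budget enforcement: deterministic code that
# checks the plan's per-phase limits before any tool call is allowed.
import time

class BudgetExceeded(Exception):
    pass

class PhaseBudget:
    def __init__(self, max_tool_calls, max_seconds):
        self.max_tool_calls = max_tool_calls
        self.max_seconds = max_seconds
        self.calls = 0
        self.started = time.monotonic()

    def charge(self):
        """Called by the supervisor before every tool call."""
        self.calls += 1
        if self.calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call budget exhausted")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("time budget exhausted")

budget = PhaseBudget(max_tool_calls=3, max_seconds=60)
for _ in range(3):
    budget.charge()   # first three calls are within budget
# A fourth charge() would raise BudgetExceeded, forcing a replan or stop.
```

No model is consulted here: exceeding a budget is a deterministic event the supervisor can log and act on.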

Third, there is a verifier. This combines deterministic code and LLMs, but has a narrow mandate: judge, never act. It is fed compact context snippets, like a generated SQL query, a JSON structure, an LLM output, or even execution traces. It produces¹ structured labels such as “valid,” “block,” or “unsafe.”² These labels then feed back into the supervisor.
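A verifier with a “judge, never act” mandate can be as small as a function that maps a compact snippet to a structured label. This sketch uses only deterministic SQL checks; the rule names and constraints are illustrative:

```python
# Illustrative verifier: judges a generated SQL query and returns a
# structured label. It never rewrites the query.

FORBIDDEN_PREFIXES = ("archive_", "tmp_")   # hard constraints from the plan

def verify_sql(sql: str) -> dict:
    lowered = sql.lower()
    if any(f" {p}" in lowered for p in FORBIDDEN_PREFIXES):
        return {"label": "block", "reason": "touches a forbidden table"}
    if not lowered.strip().startswith("select"):
        return {"label": "block", "reason": "only SELECT is allowed"}
    if " where " not in lowered:
        return {"label": "warn", "reason": "unfiltered scan"}
    return {"label": "valid", "reason": ""}

print(verify_sql("SELECT * FROM trades WHERE dt > '2026-01-01'")["label"])  # valid
print(verify_sql("DELETE FROM trades")["label"])                            # block
```

The label, not a rewritten artifact, is what flows back to the supervisor, which keeps the verifier auditable and cheap to test.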

Fourth, there is the tool layer: your APIs, databases, and other services. That is where the actual work happens. Tool calls are asynchronous, often parallel, and frequently the dominant cost and latency in the system. They are made by a combination of deterministic code and LLM-instantiated calls.

Underpinning them all is the Context Manager, the operation’s central nervous system. It sends targeted pulses of information to each actor. Just what they need, exactly when they need it. Nothing more, nothing less.
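One way to read “targeted pulses of information” is a per-actor context builder that hands each role a minimal slice of the shared state. The state fields and actor names below are assumptions for illustration:

```python
# Illustrative context manager: each actor receives a different,
# minimal slice of shared state. Field names are assumptions.

STATE = {
    "goal": "produce weekly risk reports",
    "tools": ["catalog_search", "schema_fetch", "sql_execute"],
    "candidate_schemas": {"trades": ["dt", "pnl", "desk"],
                          "positions": ["dt", "qty"]},
    "raw_tool_logs": ["...thousands of lines..."],  # never forwarded wholesale
}

def context_for(actor: str) -> dict:
    if actor == "planner":        # strategy needs goal + tools, not raw logs
        return {"goal": STATE["goal"], "tools": STATE["tools"]}
    if actor == "sql_generator":  # generation needs goal + lean schemas
        return {"goal": STATE["goal"], "schemas": STATE["candidate_schemas"]}
    if actor == "verifier":       # judging needs only the artifact's context
        return {"schemas": STATE["candidate_schemas"]}
    raise ValueError(f"unknown actor: {actor}")

assert "raw_tool_logs" not in context_for("planner")
```

The raw tool logs never reach a model directly; each actor sees just what it needs, exactly when it needs it.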

Once you see the architecture in these terms, the question is no longer “how do I chain agent steps,” but “how do I build clean interactions between these roles.”

The loop is “agentic” because the system as a whole acts like an agent: it turns goals into actions. It is “hybrid” because the intelligence is shared between models and code in a deliberate way.

 

A Day in the Life of a Hybrid Loop

To make this concrete, imagine a standard business task: a user asks for a set of risk reports. This requires joining data across multiple systems, selecting the right tables, generating valid SQL queries, and writing a report.

Here is what happens inside a hybrid loop.

The planner model sees the request, a description of available tools, and a set of constraints. Instead of improvising step by step, it returns a plan object. That plan might say:

  • Phase 1: Discover relevant tables using a catalog search tool.

  • Phase 2: Fetch detailed schema information only for candidate tables.

  • Phase 3: Generate one or more SQL queries that answer the question.

  • Phase 4: Execute the queries and check the results.

  • Phase 5: Write a report and send it to the user.

This plan is not created in a vacuum. The planner constructs it using a curated set of information provided by the Context Manager, which draws from a Context Catalog of available tools, data sources, and constraints, organized in levels (for more on the architecture of context, read The AI Context Skyscraper).

The resulting plan includes success criteria (e.g., “at least N candidate tables with a date column and a measure column”), hard constraints (“never use tables whose name starts with archive_ or tmp_”), budgets (maximum tool calls or time per phase), and instructions on how to summarize tool outputs before they’re sent back to a model.
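Such a plan can be represented as a plain data object the supervisor can execute and audit. The exact schema below is hypothetical, but it shows success criteria, hard constraints, and budgets as machine-checkable fields rather than prose:

```python
# A hypothetical plan object for the risk-report task. The schema is
# illustrative; the point is that every expectation is explicit data.

plan = {
    "phases": [
        {
            "name": "discover_tables",
            "tool": "catalog_search",
            "success": {"min_candidates": 3,
                        "required_columns": ["date", "measure"]},
            "budget": {"max_tool_calls": 10, "max_seconds": 30},
        },
        {
            "name": "generate_sql",
            "tool": "llm_sql_generator",
            "success": {"verifier_label": "valid"},
            "budget": {"max_tool_calls": 2, "max_seconds": 60},
        },
    ],
    "hard_constraints": {
        "forbidden_table_prefixes": ["archive_", "tmp_"],
    },
    "output_handling": {
        "summarize_tool_outputs": True,   # compress before any model sees them
        "max_summary_tokens": 500,
    },
}

# The supervisor can now check success criteria in plain code:
phase = plan["phases"][0]
candidates = ["trades_daily", "pnl_summary", "positions_eod"]
assert len(candidates) >= phase["success"]["min_candidates"]
```

Because the plan is data, patches and rewrites later in the loop become diffs over this object, which is what makes them loggable and auditable.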

The supervisor picks up this plan and begins execution. During phase 1, it might call a table listing tool ten times, often in parallel, with different search terms. It collects and filters the results according to selection rules in the plan, like ignoring certain tables or requiring a time dimension. All of this is done in code, not by a model.
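The Phase 1 fan-out is ordinary concurrent code, not model reasoning. A sketch with a thread pool, where `catalog_search` is a hypothetical stand-in for a real catalog API:

```python
# Illustrative parallel fan-out: the supervisor issues many tool calls
# at once, then filters results deterministically per the plan's rules.
from concurrent.futures import ThreadPoolExecutor

def catalog_search(term: str) -> list[str]:
    # Hypothetical tool wrapper; a real one would hit a data catalog API.
    fake_catalog = ["trades_daily", "tmp_scratch", "risk_factors", "archive_old"]
    return [t for t in fake_catalog if term in t]

SEARCH_TERMS = ["trade", "risk", "position", "pnl", "factor"]

with ThreadPoolExecutor(max_workers=5) as pool:
    results = list(pool.map(catalog_search, SEARCH_TERMS))

# Deterministic filtering, in code, per the plan's selection rules:
tables = {t for batch in results for t in batch
          if not t.startswith(("archive_", "tmp_"))}
print(sorted(tables))  # ['risk_factors', 'trades_daily']
```

No model call happens anywhere in this phase; the ten searches run concurrently and the filtering rules come straight from the plan object.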

To generate the query in Phase 3, the supervisor does not send the model a raw data dump. It asks the Context Manager for a surgical strike. The Manager prepares a tailored package of information: the user’s goal and a lean summary of candidate schemas. The model can then reason without distraction.

If the stakes are high, the supervisor can call a verifier model with the SQL and schema summary. The verifier does not try to rewrite anything. It simply labels the query: for example “valid, warn” or “invalid, block, attempts to access unauthorized PII data.”³

If the label is acceptable, the supervisor proceeds. If not, it records the problem and decides whether to ask for a new plan or to stop.

All the while, the supervisor is monitoring whether the plan’s expectations are being met. If the plan is not working, it can prepare a summary of what has happened and send that back to the planner for a patch or a complete replan.

The hybrid loop is not a free-form conversation. It is a state machine that moves through planning, execution, verification, and replanning, with models and tools playing defined roles at each stage.
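The state-machine framing can be made literal: a small enum of states with an explicit transition table, so a free-form detour is impossible by construction. The states follow the stages named above; the particular transition table is an illustrative choice:

```python
# Illustrative hybrid-loop state machine: only defined transitions exist.
from enum import Enum, auto

class State(Enum):
    PLANNING = auto()
    EXECUTING = auto()
    VERIFYING = auto()
    REPLANNING = auto()
    DONE = auto()

TRANSITIONS = {
    State.PLANNING:   {State.EXECUTING},
    State.EXECUTING:  {State.VERIFYING, State.REPLANNING, State.DONE},
    State.VERIFYING:  {State.EXECUTING, State.REPLANNING},
    State.REPLANNING: {State.EXECUTING},
    State.DONE:       set(),
}

def advance(current: State, nxt: State) -> State:
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt

s = State.PLANNING
for nxt in (State.EXECUTING, State.VERIFYING, State.EXECUTING, State.DONE):
    s = advance(s, nxt)
assert s is State.DONE
```

An attempt to jump, say, from PLANNING straight to DONE raises an error, which is exactly the kind of hidden shortcut a prompt-driven agent could take silently.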

The Hybrid Agent Loop: Plan. Execute. Verify. Judge. It is guided by injecting the right context at the right time throughout this continuous loop, with replanning, plan modification, escalation, and ultimately, more accurate answers.

When the Plan is Wrong

In a toy example, the first plan might be perfect. In the real world, plans can be wrong. The hybrid architecture assumes this and makes being wrong a first-class concern.

When the plan does not match the world, there are two basic responses.

If the plan is broadly sound but a detail is off, the supervisor can ask the planner for a patch. It sends a small evidence pack, and the planner responds with changes. The supervisor applies those changes and continues.

If the plan’s assumptions are fundamentally broken, the supervisor can ask for a rewrite. This time it compiles a richer summary of the environment. The planner uses that to construct a new plan. The supervisor switches to this new plan and starts again.

There is a final, crucial guardrail for when the system itself loses confidence. The hybrid loop can be designed to know what it doesn’t know. If the planner is stuck, or if a result falls outside a defined boundary of certainty, the system stops and escalates. It can invoke the most reliable tool of all: a human in the loop.
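The three responses, patch, rewrite, and human escalation, can be sketched as one deterministic decision function inside the supervisor. The thresholds and evidence fields here are illustrative assumptions, not prescriptions:

```python
# Illustrative escalation policy: deterministic code that chooses
# between continuing, patching, replanning, and a human hand-off.

def decide_response(failed_phases: int, total_phases: int,
                    confidence: float) -> str:
    if confidence < 0.5:
        return "escalate_to_human"     # the system knows what it doesn't know
    if failed_phases == 0:
        return "continue"
    if failed_phases / total_phases <= 0.2:
        return "request_patch"         # broadly sound, one detail is off
    return "request_rewrite"           # assumptions are fundamentally broken

print(decide_response(1, 5, 0.9))  # request_patch
print(decide_response(3, 5, 0.9))  # request_rewrite
print(decide_response(1, 5, 0.3))  # escalate_to_human
```

Because the policy is code, every escalation decision is reproducible and can be tuned, reviewed, and audited like any other business rule.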

In both cases, the combination of planner, supervisor, and verifier lets the system change course without becoming opaque. Plans, patches, and rewrites are all explicit objects that can be logged, compared, and audited.

User Requests are sent to the Hybrid Loop for planning. The supervisor executes the plan, verifies intermediate results, and determines if the plan is complete; depending on its judgement, the plan is either tweaked, completely replanned, or escalated for human judgement. The loop continues until results are delivered.

Context is a Design Surface

Raw context is noise. The hybrid loop works because it treats context as a design surface, and the Context Manager is the designer. It derives intent, builds the complete context window, and optimizes each section using the most efficient techniques for that section.

The Context Manager shapes a different context for each actor:

  • The planner needs a world view rich enough to propose a sensible strategy, but not so cluttered that it gets lost in details.

  • The verifier needs tightly shaped inputs that let it answer specific questions.

  • Patch and replan prompts need the right amount of evidence: too little and the planner will guess; too much and it will waste cycles rewriting what already works.

  • Even tool outputs need to be treated carefully.

The quality of the system depends as much on how you use the context manager to shape and compress these contexts as on which model you use. The same model with well-optimized context will often outperform a larger one wrapped in noise.

 

Determinism, Cost, and Speed

Hybrid loops are gaining ground because they hit a sweet spot between predictability and power.

Determinism comes from the fact that the supervisor is real code and the models are constrained. There is no hidden control logic inside long prompts that changes without anyone noticing.

Cost is contained because the number of heavy model calls is small. You are not paying for the model to re-read its entire context on every micro-decision.

Speed follows from the fact that the supervisor can aggressively fan out tool calls. There is no need for a single model to alternate between reading, thinking, acting, and reading again.

At the same time, you keep the human-like advantages of modern LLMs. They still do the hardest part: turning vague goals into workable plans.

The end result is a system that can do serious work, in real environments, at a cost and latency profile that leadership can live with, and with a level of traceability that risk and audit teams can understand.

In summary: Stepwise Agents versus Single Shot Prompts versus Hybrid Agent Loops.

Architecture That Thinks

If you accept this picture, it has a simple consequence for how you think about AI platforms.

Your AI infrastructure should not be a model with prompts; it must be a reasoning architecture. The Context Manager is its heart, pumping precisely the right information to the planner, supervisor, and verifiers. The hybrid loop is an architecture that does not just execute; it reasons.

For technical leaders, the important point is not the label. It is the underlying separation of concerns. If your systems still think in terms of stepwise chats with an “agent,” you will find yourself fighting the tools as models continue to improve. If your systems treat models as planners and judges wrapped in a deterministic execution fabric, you will be able to take advantage of that improvement with far less friction.

Hybrid loops are, in that sense, less a feature and more a sign that your architecture has caught up with what the models can actually do.

Footnotes

1. Using a mixture of deterministic routines and side-chain LLM classifiers.

2. The key to making this work is ensuring the right validator runs for the right tool. For example: invoke an external LLM judge for free-text outputs, a deterministic library for SQL validation, an external tool for PII risks, SMT solvers for formally-defined workflows. Choosing appropriately makes the difference between a dependable workflow and a brittle one.

3. Query labelling is a powerful supervisor task. It can annotate queries with any additional context, like semantic validity or formalized correctness techniques, depending on the nature of the task.

 

About the author

Mark is an established AI leader with more than two decades of experience building high-performance systems and guiding enterprise adoption of advanced technologies. His career spans front-office trading, quantitative research, and large-scale machine learning, with a consistent focus on solving complex operational and decision-making challenges.

He has designed and deployed high-speed trading and portfolio analytics platforms at Deutsche Bank, Citi, and Bank of America, led the introduction of machine learning into predictive analytics and execution at Point72, and served as COO of KX, where he helped drive adoption of AI-enabled data platforms across finance, healthcare, defense, and automotive sectors. He also previously served as a Director at Brainpool AI, a global network of academic researchers advising Fortune 500 companies.

As Chief AI Officer at AI One, Mark applies his background in AI, risk systems, and real-time data to architect scalable, practical solutions that deliver measurable impact. His work focuses on bridging research and execution, ensuring that frontier AI capabilities translate into resilient, production-grade systems for enterprises navigating rapid technological change.

LinkedIn

 
