The Feedback Loop Is the Moat

30 Jun 2026 • 14 minute read

Every verification and design team I talk to is building agents right now. The demos are genuinely impressive: point an LLM at some RTL, ask it to write assertions or close coverage or root-cause a failure, and it produces something that looks right. Then the same teams hit the same wall: Can I trust it? Will it give me the same quality answer tomorrow, on a different block, under a slightly different condition? In production, what matters is reliability, consistency, and the quality of the result, and that is exactly where generic LLM-based agents fall apart.

The instinct is to reach for a better prompt or a bigger model. I want to argue that this instinct is wrong. The reliability ceiling of a chip design agent is not set by the model. It is set by the quality of the feedback the agent can get from the EDA tools underneath it. Deep integration with those tools is what separates a demo from a deployable agent, and it is the real, durable moat in this space.

LLMs Are Feedback Engines, Not Oracles

Start with what an LLM actually is in this context. It is not a database of truths about your design. It does not know your FIFO depths, your reset sequence, your reachable states, or which of your tests actually exercises which piece of logic. It is a reasoning engine that operates on the context you give it and the feedback it gets back. Hand it nothing but RTL and a question, and it will produce a confident, plausible, and frequently wrong answer, not because it cannot reason, but because it has nothing to check that reasoning against. It is working from inference, not measurement.

This is why I tell people that building a reliable agent is not a prompting problem; it is a feedback problem. The entire craft is closing the loop: the agent proposes something, a tool gives it real feedback, the agent reasons over that feedback and refines, and you repeat until you converge. The more feedback you give, and the richer it is, the better and more consistent the result. This is the single most important principle in the whole field, and almost everything else follows from it.

But not all feedback is equal, and the distinction is what this whole argument turns on.

The first kind is verdict feedback: Did it pass or fail? Did the test compile, did the assertion prove, what is the coverage number? This is necessary, but it is low-bandwidth. A pass/fail bit or a single percentage tells the agent almost nothing about why, and an agent that only sees verdicts is reduced to guess-and-check. It will eventually stumble onto answers, burning compute and wall-clock time, and it will do so inconsistently.

The second kind is diagnostic and structural feedback: not just whether something failed, but why it failed and what the structure of the problem actually is. Which testbench knob corresponds to which coverage hole? Which logic sits in the cone of influence of the signal you care about? What does the FSM look like, and which of its states are reachable? This feedback is high-bandwidth. It is precisely what deep EDA integration unlocks, and what a shallow LLM-plus-RTL wrapper can never produce.

Figure 1: Reliability is a feedback problem: the agent proposes, the EDA engine returns ground-truth feedback, and the loop repeats until the result converges.
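
To make the bandwidth difference concrete, here is a toy sketch of the propose, feedback, refine loop. Everything in it is hypothetical: the "tool" hides a single integer standing in for something about the design the agent has to discover, and `Feedback`, `closed_loop`, and the two proposers are illustrative names, not any real EDA API. The verdict-only agent sees pass/fail and can only enumerate; the agent that also gets a direction hint can bisect.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feedback:
    passed: bool                # verdict: did the candidate hit the target?
    hint: Optional[str] = None  # diagnostic: "too_low" / "too_high", or None


def closed_loop(propose, evaluate, max_iters):
    """Propose, get tool feedback, refine; stop on a pass or when the budget runs out."""
    feedback = None
    history = []
    for _ in range(max_iters):
        candidate = propose(feedback)
        feedback = evaluate(candidate)
        history.append(candidate)
        if feedback.passed:
            return candidate, history
    return None, history


def make_tool(target, diagnostic):
    """Toy stand-in for an engine; `diagnostic` toggles the richer feedback."""
    def evaluate(candidate):
        if candidate == target:
            return Feedback(passed=True)
        hint = ("too_low" if candidate < target else "too_high") if diagnostic else None
        return Feedback(passed=False, hint=hint)
    return evaluate


def make_enumerating_proposer(start):
    """Verdict-only agent: nothing to reason over, so it tries values in order."""
    state = {"next": start}
    def propose(_feedback):
        value = state["next"]
        state["next"] += 1
        return value
    return propose


def make_bisecting_proposer(lo, hi):
    """Diagnostic agent: uses the direction hint to halve the search range."""
    state = {"lo": lo, "hi": hi, "last": None}
    def propose(feedback):
        if feedback is not None and feedback.hint == "too_low":
            state["lo"] = state["last"] + 1
        elif feedback is not None and feedback.hint == "too_high":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]
    return propose


verdict_result, verdict_tries = closed_loop(
    make_enumerating_proposer(0), make_tool(37, diagnostic=False), max_iters=64)
diag_result, diag_tries = closed_loop(
    make_bisecting_proposer(0, 63), make_tool(37, diagnostic=True), max_iters=64)
print(len(verdict_tries), len(diag_tries))  # prints "38 5"
```

Both agents converge in this toy, but the verdict-only one needs 38 tool calls and the diagnostic one needs 5. When each call is an engine run rather than a function call, that gap decides whether the loop is practical at all.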

Example 1: The Coverage Closure Agent

Take a coverage closure agent. After a regression, you are left with a set of uncovered items: cover points, bins, and the crosses that are always the hardest to hit. The agent's job is to generate the stimulus, the configuration, or the constraints that close them.

The shallow approach is to feed the LLM the RTL, the list of uncovered items, and maybe the testbench, and ask it to reason about why each item is uncovered and what would hit it. And the LLM genuinely can reason structurally here: it can look at a cover point and infer that hitting it requires some signal high while the design is in a particular state, which in turn requires a particular sequence of transactions. For shallow logic and simple cover points, this works. But the moment you are dealing with deep pipelines, rich cross coverage, or holes gated by rare microarchitectural conditions, the LLM is flying blind. It has no idea what your existing tests actually do, which items are genuinely reachable versus structurally dead, or how your stimulus knobs map onto coverage. So it produces plausible stimulus that frequently does not move the needle, and you find that out the expensive way, by spending regression cycles to confirm the guess was wrong. That is the reliability ceiling, and no amount of prompt engineering raises it, because the missing ingredient is not reasoning. It is data about your design that simply is not in the RTL.

Now wire that same agent into something like Cadence Verisium SimAI. SimAI learns, from your actual regression history, the correlation between testbench components and stimulus configuration on one side and coverage outcomes on the other. Instead of the agent guessing which knob might matter, it can ask a model that has measured it: which tests and which configurations correlate with this cover item, which uncovered items are likely reachable and which are probably dead, and what ranked stimulus is predicted to hit a given target. And because it answers from a learned model rather than a fresh regression, the agent gets that feedback in seconds rather than waiting days to find out. Suddenly, the LLM is not guessing; it is reasoning over ground-truth correlation data and proposing changes grounded in what has actually been observed to drive coverage in your environment. The loop closes. Convergence improves, the agent stops chasing dead items, and, critically, it does so consistently, run after run, block after block.

Figure 2: In coverage closure, the LLM reasons and SimAI supplies measured stimulus-to-coverage correlation learned from regression history. Neither half closes the loop alone.
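
As a rough illustration of what "measured correlation" means, the sketch below computes per-knob hit rates from a tiny, made-up regression history and ranks knob settings for an uncovered cross. This is not SimAI's model or API; the run records, knob names, and cover item names are all invented, and a marginal hit rate is far cruder than a learned model. It only shows the shape of the question the agent gets to ask.

```python
from collections import defaultdict

# Hypothetical regression history: (knob settings for the run, cover items it hit).
RUNS = [
    ({"burst_len": 1, "backpressure": False}, {"cp_idle"}),
    ({"burst_len": 4, "backpressure": False}, {"cp_idle", "cp_full"}),
    ({"burst_len": 8, "backpressure": True}, {"cp_full", "cx_full_stall"}),
    ({"burst_len": 1, "backpressure": True}, {"cp_idle"}),
    ({"burst_len": 8, "backpressure": True}, {"cp_full", "cx_full_stall"}),
]
ALL_ITEMS = {"cp_idle", "cp_full", "cx_full_stall", "cp_overflow"}


def hit_rates(runs):
    """For each (knob, value), the fraction of runs with that setting that hit each item."""
    seen = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for knobs, covered in runs:
        for setting in knobs.items():
            seen[setting] += 1
            for item in covered:
                hits[setting][item] += 1
    return {s: {i: n / seen[s] for i, n in per_item.items()}
            for s, per_item in hits.items()}


def rank_settings(rates, item):
    """Knob settings with a nonzero observed hit rate for `item`, best first."""
    scored = [(per_item.get(item, 0.0), setting) for setting, per_item in rates.items()]
    scored.sort(key=lambda pair: -pair[0])
    return [(setting, rate) for rate, setting in scored if rate > 0]


def never_hit(runs, all_items):
    """Items no run has ever hit: candidates for a reachability check, not proof of dead code."""
    covered = set().union(*(hit for _, hit in runs))
    return all_items - covered


rates = hit_rates(RUNS)
ranked = rank_settings(rates, "cx_full_stall")
suspects = never_hit(RUNS, ALL_ITEMS)
print(ranked[0], suspects)
```

Here the history says that `burst_len=8` has hit the cross in every run that used it, while `cp_overflow` has never been hit by anything, so the agent has a measured starting point for one item and a reason to question the other before spending regression cycles on it.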

Notice the division of labor, because it is the whole point. The LLM is the reasoner: it decides what to try, how to compose knobs, and how to phrase a constraint. SimAI is the feedback oracle: it supplies the measured correlation that tells the reasoner whether an idea has any chance of working. Neither is sufficient alone. And SimAI's correlation model is not something you can prompt your way to or scrape off the RTL; it is built on the simulation engine, the coverage database, and the regression data underneath it. You only get it through deep integration. That is not an incidental detail. That is the moat, showing up in the very first example.

Example 2: The Formal Verification Agent

Formal makes the same point even more sharply, because formal is unforgiving in a way simulation is not.

The defining problem in formal is convergence. You write a property, the engine runs, and instead of a clean proof or counterexample, you get a bounded or inconclusive result because the state space is too large for the engine to reach a verdict. The agent's job is to make properties converge, and the levers are well known: assumptions and constraints to prune illegal state space, abstractions to shrink complexity, and helper assertions or lemmas to decompose the problem so the engine can finish.

The shallow approach, handing the LLM the RTL plus the property and asking it for assumptions and helpers, is not just unreliable here; it is dangerous. Without understanding the design's structure, the LLM proposes generic constraints. It may over-constrain and quietly prune away legal behavior, which produces a "proven" result that is actually vacuous, a false green that is worse than no answer at all, because you will ship on it. It may propose helper assertions that do not lie anywhere along the logical path to the target and so do nothing to help the engine converge. The model does not know your state encodings, your FIFO depths, the cone of influence of the target, or the protocols at your interfaces, and in formal, that ignorance does not produce a wrong number you can double-check. It produces an unsound proof you will trust.

Now give the agent access to Cadence Jasper Formal Verification Platform's design intelligence. Before it proposes anything, it can extract real structure from the design: the FSMs and their reachable states; the FIFOs, the memories, and the control logic around them; the cone of influence (COI) of the target signals; the interface protocols; and the structural complexity that is actually driving the blow-up. With that in hand, every lever changes character.

For assumption generation, the agent stops guessing constraints and starts deriving them from the reachable FSM states and the protocol rules at the interfaces: constraints that prune genuinely illegal space without touching legal behavior. And because it knows the cone of influence of the target, it can check that an assumption is not constraining away the very logic under proof, which is exactly the mistake that creates vacuous results.

For abstraction, structure tells the agent what to abstract and what to leave alone. The cone of influence identifies the logic that is irrelevant to the property and can be cut away; FIFO and counter detection identify exactly where data abstraction, counter abstraction, or memory abstraction is both safe and high-leverage. Instead of mindlessly abstracting and destroying precision, the agent targets the specific structures that are exploding the state space.
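
The cone-of-influence reasoning in the two paragraphs above can be sketched on a toy fan-in graph. The signal names and the graph are invented, and the three-way assumption triage is a simple heuristic of my own for illustration, not how Jasper or any formal tool classifies constraints. The COI itself is computed in the standard way, as the transitive fan-in of the target.

```python
from collections import deque

# Hypothetical fan-in graph: signal -> signals that drive it. Empty list = primary input.
FANIN = {
    "grant": ["arb_state", "req"],
    "arb_state": ["arb_state", "req", "rst_n"],
    "req": [],
    "rst_n": [],
    "dbg_count": ["dbg_count", "dbg_en"],
    "dbg_en": [],
    "dbg_out": ["dbg_count"],
}


def cone_of_influence(fanin, target):
    """Transitive fan-in of `target`, including the target itself."""
    cone, work = {target}, deque([target])
    while work:
        for driver in fanin.get(work.popleft(), []):
            if driver not in cone:
                cone.add(driver)
                work.append(driver)
    return cone


def cut_candidates(fanin, cone):
    """Signals outside the cone cannot affect the target and are safe to cut for this proof."""
    return set(fanin) - cone


def classify_assumption(fanin, cone, signals):
    """Crude triage of a proposed assumption by the signals it mentions."""
    in_cone = set(signals) & cone
    if not in_cone:
        return "irrelevant"        # cannot affect this proof at all
    if all(not fanin.get(s) for s in in_cone):
        return "input_constraint"  # constrains primary inputs only
    return "review"                # constrains internal logic under proof


cone = cone_of_influence(FANIN, "grant")
print(sorted(cone), sorted(cut_candidates(FANIN, cone)))
print(classify_assumption(FANIN, cone, ["arb_state"]))
```

In this toy, the debug counter falls outside the cone of `grant` and can be cut, an assumption on `req` is an ordinary environment constraint, and an assumption on `arb_state` is flagged for review because it constrains the logic the proof depends on.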

For helper assertions, knowing the FSM and the cone of influence lets the agent propose intermediate lemmas that actually sit on the proof's logical path. Prove that an internal FSM reaches a given state, and the engine can lean on that to converge on the top-level property. These are decomposition steps grounded in the design's structure, not lucky guesses.

Figure 3: Jasper's structural intelligence (FSMs, FIFOs, COI, protocols) grounds all three formal levers, turning guesses into structure-driven, sound transformations that converge.
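
For the helper-assertion lever, one structure-driven source of lemmas is FSM reachability. The sketch below runs a breadth-first search over a hypothetical transition table and turns each unreachable state into a candidate invariant. The state names and the `state != X` text format are assumptions for illustration; in a real flow these would be candidates for the formal engine to prove before anything relies on them, not facts to assume.

```python
from collections import deque

# Hypothetical extracted FSM: state -> possible next states.
TRANSITIONS = {
    "IDLE": ["IDLE", "REQ"],
    "REQ": ["GRANT", "IDLE"],
    "GRANT": ["IDLE"],
    "ERR": ["IDLE"],  # encoded in the RTL, but no state transitions into it
}


def reachable_states(transitions, reset_state):
    """Breadth-first search from the reset state over the transition table."""
    seen, work = {reset_state}, deque([reset_state])
    while work:
        for nxt in transitions.get(work.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


def helper_invariants(transitions, reset_state):
    """One candidate lemma per unreachable state, in a made-up 'state != X' form."""
    unreachable = sorted(set(transitions) - reachable_states(transitions, reset_state))
    return [f"state != {name}" for name in unreachable]


lemmas = helper_invariants(TRANSITIONS, "IDLE")
print(lemmas)  # prints "['state != ERR']"
```

The ordering matters for soundness: the lemma is proven first and only then used to help the top-level property converge, which is what keeps a structure-derived helper from turning into an over-constraint.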

The pattern is identical to the coverage case. The LLM supplies the reasoning and the creativity: which abstraction, which lemma, how to phrase an assumption. Jasper supplies the structural ground truth that makes that reasoning both correct and sound. Strip Jasper out, and the agent is reasoning in the dark, and in formal, reasoning in the dark does not just slow you down; it produces results you cannot trust. Wire it in, and the proofs converge reliably and, just as importantly, soundly.

The Principle Underneath Both

Step back, and the two examples collapse into one principle. In both, the LLM is the reasoning engine, and the EDA tool is the source of high-bandwidth, ground-truth feedback: correlation data in one case, structural intelligence in the other. And in both, the agent's reliability, its consistency, and the quality of its output are a direct function of the richness of the feedback the tool exposes. An agent that sees only RTL and logs is stuck at verdict-level feedback and will always have a reliability ceiling. There is no prompt that conjures a signal the tool never produced.

Feedback Also Has to Be Fast

There is a second axis to feedback that matters just as much as richness, and teams consistently underestimate it: latency. EDA feedback is often slow. A full regression can run for hours or days. A formal proof can churn for a long time and still come back inconclusive. If every turn of the agent's loop costs a multi-day regression, it is not really a loop; the agent gets to try a handful of things, not the hundreds it would need to converge on a genuinely hard problem.

So for the agent, speed can matter as much as accuracy, and, importantly, the agent usually does not need an exact answer to make progress. A fast, approximate signal that points in the right direction is enough for it to iterate: to rank ideas, prune the obviously bad ones, and concentrate effort where it is likely to pay off. We do not need full accuracy at every step. We need a usable signal, quickly. A good approximation in seconds beats an exact result in days.

This is why I think one of the highest-leverage things we can build is fast prediction models for the expensive functions in the flow: surrogates that approximate what a slow engine would tell you, in a fraction of the time. The clearest example is coverage: a model that predicts the coverage impact of a proposed change in seconds, instead of launching a regression to find out. Give the agent that, and it can explore stimulus and configuration changes rapidly, guided at every step by a fast estimate of what will actually move coverage. The idea generalizes: a model that predicts whether a property is likely to converge, or which abstraction is likely to help, lets a formal agent try directions before paying for the engine.

These surrogates do not replace the golden engines; they feed the agent. The pattern is an inner loop and an outer loop: the agent iterates many times, cheaply, against the fast predictor, and falls back to the exact, slow engine only to confirm the few candidates that survive. SimAI is already an instance of this; it answers "will this help coverage?" from a learned model, fast, without re-running the regression. The opportunity is to build that kind of fast, predictive feedback for more of the flow.

Figure 4: Two loops: the agent iterates rapidly against a fast, approximate prediction model and confirms only the survivors against the slow, exact engine. Fast, approximate feedback is what makes the loop practical.
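
A minimal sketch of the inner-loop and outer-loop pattern follows, under loud assumptions: `exact_engine` is a toy objective standing in for a slow golden engine, and the "surrogate" is simply that same function plus bounded noise, where a real one would be a learned model. The function names and the one-dimensional candidate space are invented for illustration.

```python
import random


def exact_engine(x):
    """Stand-in for the slow golden engine: toy objective with its peak at x = 0.7."""
    return 1.0 - (x - 0.7) ** 2


def make_surrogate(noise, seed):
    """Toy fast predictor: the exact answer plus an error bounded by `noise`."""
    rng = random.Random(seed)
    def predict(x):
        return exact_engine(x) + rng.uniform(-noise, noise)
    return predict


def two_loop_search(candidates, predict, confirm, keep):
    """Inner loop: score every candidate cheaply. Outer loop: confirm only the top `keep`."""
    shortlist = sorted(candidates, key=predict, reverse=True)[:keep]
    confirmed = {x: confirm(x) for x in shortlist}
    best = max(confirmed, key=confirmed.get)
    return best, confirmed[best], len(shortlist)


NOISE = 0.01
candidates = [i / 100 for i in range(101)]
best, best_score, exact_calls = two_loop_search(
    candidates, make_surrogate(NOISE, seed=0), exact_engine, keep=5)
print(best, best_score, exact_calls)
```

The exact engine is called 5 times instead of 101. If the predictor's error is bounded by some ε, the confirmed winner is guaranteed to score within 2ε of the true best candidate, which is the sense in which a usable approximate signal is enough to make progress.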

Why This Is the Moat

This is why I keep coming back to integration as the moat, and I mean that in a hard, structural sense, not as a slogan.

Anyone can wrap a frontier model. The weights are a few API calls away, and they get better for everyone at the same time. If your agent is "LLM plus RTL," you have no durable advantage. You ride the model curve, the team across the street rides it too, and on the day a better model ships, you both get better together. There is nothing to defend.

What is not commoditizing is the other side of the loop. SimAI's coverage correlation is built on a simulation engine, a coverage database, and years of regression data. Jasper's structural intelligence is built on world-class formal engines and decades of analysis technology. And the fast prediction models that keep the agent's loop quick (a coverage predictor, a convergence predictor) are built on those same engines and the same accumulated data. None of this is scrapable, promptable, or reproducible by a model, however large. It is the ground truth of the design, and it lives inside the EDA tools.

So the moat is not the agent and it is not the model. It is the depth of integration into the EDA stack and the richness and speed of the feedback that the stack can expose. And it compounds: better engines and better prediction models produce richer, faster feedback; that feedback produces more reliable, higher-quality agents; more reliable agents drive more usage; more usage produces more data; and more data makes the engines and the predictors better still. An organization that owns both the engines and the agents and wires them together at depth sits on that flywheel. A pure-play wrapper or a generic coding agent pointed at RTL does not, and no amount of model progress changes that, because model progress is the one thing everyone shares.

Figure 5: The integration flywheel: better engines and predictors yield richer, faster feedback, which yields more reliable, higher-quality agents, more usage, and more data, which improves the engines again.

What This Means for EDA Tools

This reframes what EDA tools have to become. They were built, for good historical reasons, around a human in the loop: GUIs, waveform viewers, and reports meant to be read by an engineer at 11pm. The next generation has to be built for an agent in the loop. That means exposing not just verdicts but diagnostics and structure as first-class, machine-readable feedback: coverage correlation, cones of influence, FSM and FIFO extraction, proof state, and root-cause signals. These are the things a human debugs from, handed to an agent as data. It also means exposing feedback that is fast, not just rich: fast prediction models (for coverage impact, for convergence likelihood) that answer in seconds what an engine answers in days, so the agent can iterate against a surrogate and fall back to the golden engine only to confirm. And it means the agents have to be built to consume more and more of it, closing tighter loops as the tools open up deeper and faster signals.

Those two roadmaps are the same roadmap. Tools expose richer feedback, agents close tighter loops around it, and the results get more reliable. That co-design, rather than a bigger model or a cleverer prompt, is the future of chip design tools.
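
To picture what "machine-readable feedback" could look like from the agent's side, here is a purely hypothetical payload for a formal run, serialized as JSON. No existing tool emits this schema; the field names, the verdict strings, and the `next_action` policy are all invented to show the difference between handing an agent a verdict and handing it structure it can act on.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class ProofFeedback:
    """Hypothetical feedback record: a verdict plus the structure behind it."""
    property_name: str
    verdict: str                      # e.g. "proven", "cex", "bounded", "inconclusive"
    proven_depth: Optional[int] = None
    cone_of_influence: List[str] = field(default_factory=list)
    fsm_reachable_states: Dict[str, List[str]] = field(default_factory=dict)
    suspected_blowup: List[str] = field(default_factory=list)


def to_agent_payload(feedback):
    """Serialize the record so an agent can consume it as data, not as a report."""
    return json.dumps(asdict(feedback), sort_keys=True)


def next_action(payload):
    """Toy policy: with structure available, the agent can choose a lever instead of guessing."""
    fb = json.loads(payload)
    if fb["verdict"] in ("proven", "cex"):
        return "done"
    if fb["suspected_blowup"]:
        return "abstract:" + fb["suspected_blowup"][0]
    return "add_helper"


payload = to_agent_payload(ProofFeedback(
    property_name="p_grant_onehot",
    verdict="bounded",
    proven_depth=24,
    cone_of_influence=["grant", "arb_state", "req"],
    fsm_reachable_states={"arb_state": ["IDLE", "REQ", "GRANT"]},
    suspected_blowup=["u_fifo.mem"],
))
print(next_action(payload))  # prints "abstract:u_fifo.mem"
```

With only the `verdict` field, the same policy could do no better than retry; the extra fields are what let it pick a specific abstraction target.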

The Bottom Line

The teams that win the agent era in EDA will not be the ones with the cleverest prompt or the biggest model. Everyone gets the same models. The winners will be the ones who close the tightest, fastest feedback loop between the reasoning engine and the ground truth of the design, and get the reliability, consistency, and quality that come with it. That ground truth, and the fast approximations that make it usable in a loop, live inside the EDA tools. Deep integration is not a feature. It is the moat, and it is the future.

Learn more about Cadence's Design for AI and AI for Design strategy.