Close the Loop: When LLMs Work and When You Need More

There's a simple question that tells you whether you can trust an LLM for a task:

Can you close the loop?

If you can verify the output — through tests, experiments, iteration, or just looking at it — you're fine. Use the model. Ship fast. Iterate.

If you can't, you're in dangerous territory. That's when you need to think harder about what the model actually knows versus what it's pattern-matching.

This isn't a new insight. It connects to Pearl's causal hierarchy, the Lucas Critique in economics, and recent work on performative prediction in ML. The contribution here is organizing it around a simple question practitioners can ask.


The Loop Hierarchy

Not all feedback loops are created equal. Here's a rough taxonomy:

Tier 1: Tight loop, fast feedback

Use LLMs freely.

The pattern: errors are cheap and caught quickly. The LLM can be wrong 30% of the time and still be massively useful because you're catching and fixing those errors in real time.

This is where most LLM productivity gains live today. And it's why "vibe coding" works — not because the model is always right, but because you can tell when it's wrong.
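To make that concrete, here's a minimal sketch of a tier-1 loop, with `llm_generate` as a hypothetical stand-in for whatever model call you actually use:

```python
import subprocess


def llm_generate(prompt: str) -> str:
    """Hypothetical stand-in for your model call (API, local model, etc.)."""
    raise NotImplementedError


def generate_until_tests_pass(prompt: str, max_attempts: int = 5) -> str | None:
    """Tier-1 loop: the model drafts, the test suite judges."""
    feedback = ""
    for _ in range(max_attempts):
        code = llm_generate(prompt + feedback)
        # Write the draft where the test suite expects to find it.
        with open("candidate.py", "w") as f:
            f.write(code)
        result = subprocess.run(["pytest", "tests/"],
                                capture_output=True, text=True)
        if result.returncode == 0:
            return code  # loop closed: the output is verified, ship it
        # A wrong draft is cheap -- the failure becomes the next prompt.
        feedback = "\n\nThe tests failed:\n" + result.stdout[-2000:] + "\nFix it."
    return None  # the loop wouldn't close; hand off to a human
```

The verification step does the heavy lifting here, which is exactly the point: the quality bar is set by the tests, not by the model.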

Tier 2: Loop exists, but slower and noisier

Use LLMs with monitoring.

The pattern: feedback exists but it's noisy, delayed, or partially observable. You can learn and improve, but you need to be careful about what you're learning. Easy to optimize for the metric while missing the point.

This is Goodhart's Law territory: "When a measure becomes a target, it ceases to be a good measure." Your feedback loop exists, but it might be teaching you the wrong lessons.
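One lightweight guard, sketched below with assumed names (`proxy_scores` from your automated metric, `audit_scores` from a slower trusted check such as human review of sampled outputs, on the same scale): watch for the cheap metric drifting away from the trusted one.

```python
import statistics


def proxy_has_drifted(proxy_scores: list[float],
                      audit_scores: list[float],
                      threshold: float = 0.15) -> bool:
    """Goodhart check: compare the always-on proxy metric against a slow,
    trusted audit signal. A growing gap means the system is improving the
    measure without improving the thing the measure stood for."""
    gap = statistics.mean(proxy_scores) - statistics.mean(audit_scores)
    return abs(gap) > threshold


# Proxy says quality is up; the audit disagrees. Time to investigate.
print(proxy_has_drifted([0.91, 0.93, 0.95], [0.72, 0.70, 0.71]))  # True
```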

Tier 3: Loop exists, but confounded

You need causal reasoning to interpret your feedback.

The pattern: you have feedback, but it will actively mislead you if you take it at face value. The loop is lying to you because correlation isn't causation, and the world is full of confounders.

This is where you need the econometrics toolbox — or at least awareness that your intuitive read of the data might be backwards. Perdomo et al.'s work on performative prediction formalizes exactly this problem: when your predictions influence the outcomes you're measuring, standard ML assumptions break down.
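A toy simulation shows how a confounded loop lies. Here the true effect of the treatment is +1.0, but severity drives both who gets treated and the outcome, so the naive read of the feedback has the wrong sign (made-up numbers, in the spirit of the classic sicker-patients-get-treated-more setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Confounder: severity drives BOTH treatment assignment and the outcome.
severity = rng.normal(0, 1, n)
treated = rng.random(n) < 1 / (1 + np.exp(-2 * severity))  # sicker -> more treated
outcome = 1.0 * treated - 2.0 * severity + rng.normal(0, 1, n)  # true effect: +1.0

naive = outcome[treated].mean() - outcome[~treated].mean()
print(f"naive estimate: {naive:+.2f}")  # strongly negative -- the loop is lying

# Adjust for the confounder (here, by linear regression) to recover the truth.
X = np.column_stack([np.ones(n), treated, severity])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"adjusted estimate: {beta[1]:+.2f}")  # close to the true +1.0
```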

Tier 4: No loop at all

You need causal reasoning to make the decision.

The pattern: there is no feedback loop. The counterfactual is unobservable. You have to reason about what would happen under different choices, because you can't measure it.

This is the hardest case, and it's where most high-stakes decisions live. It's also the top of Judea Pearl's "ladder of causation" — the level of counterfactual reasoning that requires imagining worlds that never existed.
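Here's the same point as toy code: every unit has two potential outcomes, and whatever you choose, reality only ever shows you one of them (the numbers are made up; the missingness is the point):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Potential outcomes: what WOULD happen to each unit under each choice.
y_if_treated = rng.normal(10, 2, n)
y_if_control = rng.normal(8, 2, n)
choice = rng.integers(0, 2, n).astype(bool)

# Reality reveals exactly one column per unit; the other is the counterfactual.
observed = np.where(choice, y_if_treated, y_if_control)
for i in range(n):
    arm = "treated" if choice[i] else "control"
    print(f"unit {i}: observed ({arm}) = {observed[i]:.1f}, counterfactual = ???")
```

No amount of model scale fills in that missing column; you have to reason your way to it.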


Where Causality Fits In

Here's the insight: causality is what you need when you can't close the loop cleanly.

If you can run a proper randomized experiment, you're closing the loop on causality directly. The experiment is the causal reasoning — you don't need to think about confounders because randomization handles them.
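Revisiting the confounded simulation from the tier-3 section makes this vivid: replace the severity-driven assignment with a coin flip, and the naive difference becomes an unbiased estimate of the true effect, no adjustment required.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

severity = rng.normal(0, 1, n)
treated = rng.random(n) < 0.5  # randomized: independent of severity by design
outcome = 1.0 * treated - 2.0 * severity + rng.normal(0, 1, n)  # true effect: +1.0

print(f"{outcome[treated].mean() - outcome[~treated].mean():+.2f}")  # ~ +1.00
```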

The entire field of causal inference exists for situations where you can't run the experiment: instrumental variables, difference-in-differences, regression discontinuity, matching on observables, synthetic controls.

These are all ways of extracting causal signal when the loop is confounded or missing.
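As one example from that toolbox, here's a minimal difference-in-differences sketch with made-up numbers: two groups start at different levels and share a common trend, and subtracting the control group's change strips out everything except the causal effect of the policy.

```python
import numpy as np

rng = np.random.default_rng(3)
level_treated, level_control = 5.0, 3.0  # different baselines: naive comparison biased
trend, effect = 1.0, 0.7                 # shared drift; true policy effect is +0.7

pre_t  = level_treated + rng.normal(0, 0.1, 1000)
post_t = level_treated + trend + effect + rng.normal(0, 0.1, 1000)
pre_c  = level_control + rng.normal(0, 0.1, 1000)
post_c = level_control + trend + rng.normal(0, 0.1, 1000)

did = (post_t.mean() - pre_t.mean()) - (post_c.mean() - pre_c.mean())
print(f"diff-in-diff estimate: {did:+.2f}")  # ~ +0.70
```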

Robert Lucas made this point in economics fifty years ago: models trained on historical correlations fail when policy changes the underlying system. The correlations you learned were equilibrium outcomes, not laws of nature. Change the incentives, and the equilibrium shifts.


The Practical Upshot

When someone asks "can I use an LLM for this?", the real question is:

How tight is my feedback loop, and how clean is the signal?

Loop quality      What to do
----------------  ---------------------------------------------
Tight and fast    Ship. Iterate. Let the LLM cook.
Slow but clean    Use with monitoring. Build evaluation.
Confounded        You need causal reasoning to learn correctly.
Missing           You need causal reasoning to decide at all.

Most LLM wins today are in the top row. That's fine — there's a lot of value there.

The danger is when people drift down the table without noticing. When they take LLM outputs that feel authoritative and use them for tier 3 or tier 4 decisions. When the loop is broken but the confidence is high.


The LLM's Role Changes By Tier

This isn't about "LLMs good" or "LLMs bad." It's about what role they should play:

Tier 1: LLM as doer. It writes the code, drafts the text, generates the options. You verify and ship.

Tier 2: LLM as assistant with monitoring. It handles the task, but you're watching metrics and catching drift.

Tier 3: LLM as analyst, not oracle. It can help you think about confounders, generate hypotheses, explain results. But the causal conclusion comes from proper methods, not from asking the model "what caused X?"

Tier 4: LLM as thought partner. It helps you structure the decision, surface considerations, stress-test assumptions. But the reasoning is yours, informed by whatever causal evidence you can gather.

The mistake is using a tier-1 workflow for a tier-4 problem. That's when people get burned.


The Bottom Line

The question isn't "is this LLM smart enough?"

The question is: "Can I close the loop?"

If yes: ship fast, iterate, trust the process.

If no: slow down. Figure out what kind of feedback you have, how confounded it is, and what you need to reason correctly about cause and effect.

The models keep getting better. But the loop problem doesn't go away. No matter how good GPT-N gets, you still can't observe counterfactuals, and you still have to think about what would have happened when you're making decisions that change the world.

That's not a limitation of AI. It's a feature of reality.


This post is a practical coda to my series on transformers and causal inference. The short version: LLMs can learn some causal structure, but it's fragile, and for high-stakes decisions you need more than pattern matching. This post is about knowing when you're in that territory.


References:

Lucas, R. E. (1976). Econometric policy evaluation: A critique. Carnegie-Rochester Conference Series on Public Policy, 1, 19–46.

Pearl, J., & Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.

Perdomo, J. C., Zrnic, T., Mendler-Dünner, C., & Hardt, M. (2020). Performative prediction. Proceedings of the 37th International Conference on Machine Learning (ICML).

Strathern, M. (1997). "Improving ratings": Audit in the British university system. European Review, 5(3), 305–321. (Source of the common phrasing of Goodhart's Law.)

Tags: AI, LLMs, Causality, Decision Making, Feedback Loops, Machine Learning