
The prompt-action gap: why your agent does the right thing for the wrong reason

2026.03.27 · agents · alignment · engineering · 3 min read

Every agent framework has the same blind spot. Your agent completes the task — the email gets sent, the data gets updated, the report gets generated. You look at the output and think: great, it works.

But if you look at the reasoning trace — the chain of thought between prompt and action — you'll often find the agent arrived at the right answer for entirely the wrong reason.

The Gap

I call this the prompt-action gap. It's the distance between what your prompt intended and what the agent actually reasoned about before taking action.

Here's a simple example. You prompt an agent to "find the highest-priority support ticket and draft a response." The agent:

  1. Queries the ticket system
  2. Sorts by creation date (not priority)
  3. Picks the newest ticket
  4. Writes a reasonable response

The output looks correct because the newest ticket happened to also be the highest priority. But the agent's reasoning was wrong — it sorted by date, not priority. Next time, when those don't align, it'll respond to the wrong ticket.
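The divergence is easy to see in code. Here's a minimal sketch — the ticket shape and priority scale are invented for illustration:

```javascript
// Two tickets where recency and priority disagree.
const tickets = [
  { id: 1, priority: 3, createdAt: "2026-03-20" }, // highest priority, older
  { id: 2, priority: 1, createdAt: "2026-03-26" }, // newest, low priority
];

// What the agent actually did: sort by recency.
const byDate = [...tickets].sort(
  (a, b) => new Date(b.createdAt) - new Date(a.createdAt)
);

// What the prompt intended: sort by priority.
const byPriority = [...tickets].sort((a, b) => b.priority - a.priority);

console.log(byDate[0].id); // 2 — the wrong ticket
console.log(byPriority[0].id); // 1 — the right one
```

On real data where the newest ticket also happens to be the highest priority, both sorts return the same ticket — which is exactly why the gap is invisible in outcomes.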

Why It Matters at Scale

When you're running one agent, you can spot-check. When you're running twenty, you can't. The prompt-action gap is invisible in outcomes until it isn't.

At ButterGrow, we've found three categories of gap:

Semantic gaps — the agent interprets a term differently than you intended. "High-value leads" means different things to your sales team and to an LLM trained on general text.

Ordering gaps — the agent does the right steps in the wrong order, which happens to work in most cases but fails in edge cases (like our priority/date example).

Scope gaps — the agent goes beyond or falls short of the intended action boundary. It updates the CRM and sends a Slack notification when you only asked for the CRM update.
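Scope gaps are the easiest of the three to check mechanically: diff the actions the agent executed against the set the prompt authorized. A hypothetical sketch — the action names and trace shape are invented, not a real API:

```javascript
// Actions the prompt authorized.
const allowedActions = new Set(["crm.update"]);

// Return any executed action that falls outside the authorized set.
function findScopeGaps(executedActions) {
  return executedActions.filter((action) => !allowedActions.has(action));
}

const executed = ["crm.update", "slack.notify"];
console.log(findScopeGaps(executed)); // ["slack.notify"]
```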

How to Close It

You can't fully close the prompt-action gap. But you can measure it:

  1. Log reasoning traces, not just outcomes
  2. Compare intent to execution at the step level
  3. Test with adversarial cases where the "right answer for wrong reason" would fail
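Step 2 can be as simple as logging each step's stated intent next to what actually ran, then diffing the two. A sketch under assumed data — the trace shape here is an illustration, not a real framework API:

```javascript
// One entry per agent step: what the plan said vs. what executed.
const trace = [
  { intent: "sort tickets by priority", executed: "sort tickets by created_at" },
  { intent: "draft response", executed: "draft response" },
];

// Flag every step where execution diverged from intent.
const gaps = trace
  .map((step, i) => ({ step: i, ...step }))
  .filter((s) => s.intent !== s.executed);

console.log(gaps.length); // 1 — step 0 diverged from intent
```

Even this naive string comparison surfaces the priority/date bug above before an adversarial case ever hits production.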

This is fundamentally what we built AgentScore to do. But even without a dedicated tool, you can start by adding assertions to your agent loops:

import assert from "node:assert";

// After each agent step, verify the reasoning
assert(
  agent.lastReasoning.includes("priority"),
  "Agent should reason about priority, not just recency"
);

It's crude, but it catches the most obvious gaps.


The prompt-action gap is the agent equivalent of a test that passes for the wrong reason. You wouldn't ship code with coincidentally passing tests. Don't ship agents with coincidentally correct behavior.