a builder's codex

A separate grader agent in its own context window closes the output verification loop at production scale

By Anthropic · AI safety research company and Claude developer · 2026-05-06 · talk · Code with Claude

Tier A · TL;DR
A separate grader agent in its own context window closes the output verification loop at production scale

Claim

Deploy a separate grader agent in its own context window to evaluate whether output meets a defined success rubric. The grader runs independently of the generator and sees only the output, never the generator's reasoning chain.
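
A minimal sketch of this split, assuming the Anthropic Python SDK; the model name, prompt wording, and rubric handling are illustrative assumptions, not the announced product's API. The grader call builds a fresh message list, so the only state shared with the generator is the output string itself.

```python
# Minimal generator/grader split, assuming the Anthropic Python SDK.
# MODEL and the prompt wording are placeholders, not a product API.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-5"     # placeholder model name

def generate(task: str) -> str:
    """Generator: produces the output in its own context window."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=2048,
        messages=[{"role": "user", "content": task}],
    )
    return response.content[0].text

def grade(output: str, rubric: str) -> str:
    """Grader: a fresh context window that sees only the output and the
    rubric, never the generator's prompt or reasoning chain."""
    prompt = (
        "Evaluate the OUTPUT against the RUBRIC. Reply PASS or FAIL on the "
        "first line, then one sentence per rubric item.\n\n"
        f"RUBRIC:\n{rubric}\n\nOUTPUT:\n{output}"
    )
    response = client.messages.create(
        model=MODEL,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```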

Mechanism

When a grader shares context with the generator, it inherits the generator's blind spots. A separate context window forces independent evaluation against the rubric: the rubric, not the generator, determines pass or fail. This closes the feedback loop. The generator produces, the grader measures, and the delta drives the next attempt, so manual review no longer scales linearly with volume.
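
One way the loop might be wired, reusing the hypothetical generate and grade helpers from the sketch above; the PASS-prefix convention and the retry budget are assumptions. The grader's verdict, not the generator's self-assessment, decides whether another attempt happens.

```python
# Hedged sketch of the closed loop; reuses generate() and grade() from the
# sketch above. The PASS-prefix parsing and retry budget are illustrative.
def generate_until_pass(task: str, rubric: str, max_attempts: int = 3):
    feedback = ""
    for _ in range(max_attempts):
        # The generator may see the grader's feedback from the last round,
        # but the grader never sees the generator's context.
        prompt = task if not feedback else f"{task}\n\nFix these issues:\n{feedback}"
        output = generate(prompt)
        verdict = grade(output, rubric)
        if verdict.strip().upper().startswith("PASS"):
            return output, True
        feedback = verdict  # the delta that drives the next attempt
    return output, False  # surface the failure instead of hiding it
```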

Conditions

Holds when: the task has a defined, measurable success rubric. Output format is consistent enough for the grader to evaluate. The grader can access necessary ground truth or reference material.

Fails when: success criteria are vague or subjective. The grader and generator share overlapping context that smuggles in confirmation bias. The rubric itself is wrong.
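
To make "defined, measurable" concrete, here is one hypothetical rubric for a docx-generation task like the one cited under Evidence below; every item is checkable from the output alone, which is what lets the grader work without the generator's context. The items and thresholds are invented for illustration.

```python
# A hypothetical rubric: each item is verifiable from the output alone.
DOCX_RUBRIC = """\
1. The file opens without errors in a .docx parser.
2. Every section named in the brief appears as a heading.
3. No heading level is skipped (H1 to H2 to H3).
4. Every table has a header row.
5. Total length is between 800 and 1,200 words.
"""
```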

Evidence

Announced at Code with Claude, May 6, 2026, as part of Claude Managed Agents public beta. Internal testing showed +8.4% improvement on docx file generation and +10.1% on pptx file generation after Outcomes was added.

"Agents do their best work when they know what 'good' looks like."

Counter-evidence

For tasks without a verifiable rubric, adding a grader adds latency and cost with no quality signal. The grader itself can be miscalibrated if the rubric is underspecified.
