Claim
When Claude processes multiple interview transcripts in a single batch prompt, fabrication rates increase significantly. Reliable extraction requires a dedicated skill run per transcript at roughly three minutes per file, not a bulk-processing approach.
Mechanism
A batch prompt that averages across multiple transcripts loses the per-file context and constraints that keep outputs grounded. The model fills inter-transcript gaps with plausible-sounding inference. A dedicated skill run per transcript imposes tighter constraints and a smaller active context, shrinking the inferential surface where fabrication occurs. The per-transcript model also makes spot-checking tractable: verify one file and you know the method holds across the batch.
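A minimal sketch of the per-transcript pattern, assuming the Anthropic Python SDK, plain-text transcripts in a single directory, and an illustrative prompt; the prompt wording, model name, and extract_per_transcript helper are assumptions for illustration, not the specific skill Pierri describes.

```python
import anthropic
from pathlib import Path

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Illustrative grounding prompt (assumed wording, not the original skill)
EXTRACTION_PROMPT = """\
You are extracting findings from ONE customer interview transcript.
Rules:
- Only report statements that appear in the transcript below.
- Quote the customer verbatim; never paraphrase a quote.
- If something cannot be answered from this transcript, say "not present".

Transcript:
{transcript}
"""

def extract_per_transcript(transcript_dir: str, model: str = "claude-sonnet-4-5") -> dict[str, str]:
    """Run one grounded extraction call per transcript file, never a combined batch prompt."""
    results = {}
    for path in sorted(Path(transcript_dir).glob("*.txt")):
        message = client.messages.create(
            model=model,  # model name is a placeholder; substitute whichever model the workflow uses
            max_tokens=2000,
            messages=[{
                "role": "user",
                "content": EXTRACTION_PROMPT.format(transcript=path.read_text()),
            }],
        )
        results[path.name] = message.content[0].text
    return results
```

Because each call sees exactly one file, a fabricated quote can only have come from that file, which is what makes the spot-check described above cheap.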
Conditions
Holds when: transcripts contain heterogeneous content from different customers, contexts, or time periods.
Fails when: transcripts are highly structured and templated (e.g., support tickets with a fixed schema), which reduces inferential ambiguity; also strained at scale, since roughly three minutes per transcript may not be operationally feasible for programs with 100+ interviews.
Evidence
Pierri found that batch-processing customer transcripts with Claude produced fabrications during a live research workflow. Reliable output required a dedicated per-transcript skill at roughly three minutes per file.
"AI needs A LOT of help to do a good job (way more than the average person realizes)." — Anthony Pierri, LinkedIn, May 2026
Signals
- Quotes attributed to specific customers in batch output cannot be verified in source transcripts
- Per-transcript skill runs produce extractions that survive spot-check verification (a verbatim-match check, sketched after this list, makes the spot-check fast)
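One way to automate the quote check, as a sketch: it assumes extracted quotes are meant to be verbatim, normalizes case and whitespace before matching, and flags anything that does not appear in the source transcript. Paraphrased-but-faithful quotes would still be flagged and need manual review.

```python
import re

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def unverified_quotes(quotes: list[str], transcript: str) -> list[str]:
    """Return quotes that do NOT appear verbatim in the transcript (suspected fabrications)."""
    haystack = normalize(transcript)
    return [q for q in quotes if normalize(q) not in haystack]
```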
Counter-evidence
Batch processing with structured output formats and explicit grounding instructions can reduce fabrication rates. Three minutes per transcript may not scale for large research programs. Smaller context windows in cheaper models may narrow the gap.
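If batch processing is unavoidable, the counter-evidence points at structured output plus explicit grounding instructions. A hypothetical sketch of what such instructions might look like (assumed wording, untested here, and no substitute for per-transcript verification):

```python
# Illustrative grounding instructions to append to a batch prompt
BATCH_GROUNDING_INSTRUCTIONS = """\
For every finding, return a JSON object with:
  - "transcript_id": the filename the finding came from
  - "quote": a verbatim quote from that transcript supporting the finding
  - "confidence": "stated" if the customer said it directly, "inferred" otherwise
Do not report a finding you cannot support with a verbatim quote.
"""
```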