When to use

A positioning claim or value-prop line before you open a positioning-update review.
A landing-page hero variant before you launch an experiment or A/B test on it.
Ad copy or a campaign headline before it goes to a paid channel.
A cold-outbound opener or a cancellation-save framing before it goes to a real inbox.
Two or more competing variants where you want a ranked read before committing test budget.

When NOT to use

Pricing variants. A synthetic panel cannot predict willingness-to-pay or elasticity.
Novel product behavior with no analog in the personas' world. The model fills the gap with invention. Running AI on multiple inputs simultaneously without structured validation checkpoints produces fabricated output, not analysis errors
Anything where measured behavior is cheap and fast to get directly. A live five-second test or a real first-meeting call beats a synthetic one when you can run it this week. Test positioning in a live sales pitch, marketing stories are unfalsified theory until then

How to use

Scoring rubric · Wynter + Gartner, 7 dimensions

The Synthetic Audience Test

Threshold layer

Clarity · gate dimensionif clarity < 3, stop scoring; the buyer never reaches the other dimensions

Relevancedoes this speak to a real situation this buyer is actually in?

Directional only. Use before filing a gate packet on any positioning or LP variant. Never a gate verdict on its own.

Value layer

Valuedoes the buyer believe this outcome is worth pursuing?

Differentiationdoes this feel distinct from alternatives the buyer already knows?

Credibilitydoes the claim feel plausible given the source and the proof provided?

Intent layer

Urgencydoes this feel like a now problem, not a someday problem?

Friction (inverse)lower is better; every unnecessary word, qualification, or caveat is friction

Trusted-5 panel: sales and CS roles closest to buyer voice. Not internal marketers.

01 Build the panel from verbatim customer language, not from internal assumptions.

The single point of failure in synthetic testing is the persona. If the persona is built from your team's idea of the buyer, the panel is an echo chamber. It tells you your message lands with people who already think like you. The whole value of an external test evaporates.

Build each persona from voice-of-customer evidence: JTBD switch-interview transcripts, G2 and Reddit threads, support tickets, won and lost call notes. JTBD interviews are the strongest single source because they surface the buyer's actual vocabulary, the words they use before they ever heard your category name. JTBD interviews surface the customer's actual language and the switch trigger

"JTBD interviews help you define a customer's language — what they actually mean by 'easy' — the root cause of why they switched, and how their story connects, all of which directly impact your product marketing."

· Bob Moesta, Intercom podcast, 2020 · “JTBD interviews surface the…”

When you pull buyer phrases into the persona, keep them verbatim. Do not let an AI summarizer smooth them into clean themes. The exact phrase is the recognition trigger. The clean theme is just a description, and a description does not surface a sameness problem. AI synthesis smooths over verbatim buyer phrases: mine Reddit and review sites directly, mirror the words, do not paraphrase

"Resonance comes from mirroring, not paraphrase."

· Nicolas Cole, Build and Launch a VOC Landing Page, 2026-05-07 · “AI synthesis smooths over…”

A practical guard when you use AI to lift those phrases: "verbatim" is too vague an instruction. Lock the constraints. Copy one uninterrupted customer turn, no merging, no cleanup, preserve the hedges, and label the source. Otherwise the model stitches sentences and trims hedges, and the persona looks sourced while being partly fabricated. AI quotes are only as good as the instruction that produced them.

02 Pick the panel to be buyers, and pick the trusted reviewers to be closest to the buyer voice.

The synthetic audience is buyers, not marketers. The whole point is to hear the message the way someone who watched five competitor demos this morning hears it, not the way the team that wrote it hears it. Internal reviewers share your vocabulary, so they cannot detect sameness. Only an external read can.

When a human reviews the synthetic output, keep that reviewer close to the real buyer voice too. Sales and customer-success roles sit on calls every week and hear the language of confusion in real time. They are a better trusted-reviewer pool than the internal marketing team. Frontline customer contact is the PMM substrate

Default the panel to all personas. Narrow it only when the variant is persona-specific (a vertical landing page, a role-targeted opener). A message that wins one persona and confuses another is a finding, not a failure, and you only see it if both are on the panel.

03 Anchor the variant against what the buyer already reads or hears.

A message tested in isolation scores against nothing. The buyer is always comparing: against the competitor claim they read yesterday, against the homepage that is live now, against doing nothing. Give the panel that anchor.

For a positioning or LP test, the anchor is the current live copy plus the nearest competitor claim. The panel's job is not "is this good" but "is this clearer, more different, more worth acting on than what the buyer already has in front of them." Without an anchor, you measure fluency. With one, you measure contrast, which is the thing that actually moves a buyer.

Include the status quo as an implicit anchor on every test. When 40 to 60% of buyers who pass on a category make no decision at all, the message is competing with inertia more than with a named rival. A variant that beats the competitor claim but never makes the cost of doing nothing feel real will still lose the deals that go to no-decision. Frame the variant so the panel can react to "why act now," not only "why you." No-decision-as-competitor vs. battle-card-driven competitive workflows

04 Score on the seven dimensions, and stop the moment clarity fails.

Score each variant 1 to 5 per dimension per persona. The rubric is the union of the Wynter message layers and the Gartner B2B message-testing criteria. There is one hard rule: if clarity scores below 3, do not score the later dimensions. The buyer never reaches them. A message the buyer does not understand cannot be differentiated, credible, or urgent, so a score on those is noise.

Dimension	The buyer's read	Direction
Clarity	"I get what this is."	Higher is better. Gate: below 3, stop.
Relevance	"This is for someone like me."	Higher is better.
Value	"I want the result it promises."	Higher is better.
Differentiation	"I see how this is not the same as the others."	Higher is better.
Credibility	"I believe the claim. It reads as evidence, not sales."	Higher is better.
Urgency	"I should act now, not later."	Higher is better.
Friction	"Here is my doubt or objection."	Inverse. Lower is better.

Differentiation is the dimension most messages quietly fail. Buyers default to seeing comparable features across competing products. That perceived sameness is the binding constraint, and an internal review cannot catch it because internal reviewers already know why you are different. Differentiation requires three checks, different, better, matters

"Buyer skepticism and the perception of 'sameness' will limit differentiation efforts if product marketers do not test their differentiators with external audiences." (p.1)

· Gartner, Market Guide for B2B Message Testing Solutions, 2025-07-07 · “Buyers see "sameness", test…”

The synthetic panel is the cheap, early version of that external test. It is not a substitute for it. Sameness is the default in B2B, and features get copied inside months, so the message is where the contest is actually won. In B2B, sameness is the default, you cannot win on features competitors will copy in 6 months

"Sameness is the default in B2B; you cannot win on features alone because competitors will copy them, so the companies that win compete on messaging clarity, differentiation, and brand."

· Peep Laja, sameness is the default; compete on messaging, 2026-03-03 · “In B2B, sameness is…”

05 Capture the one-read takeaway and the objections verbatim. They matter more than the numbers.

For each persona, record the single sentence that persona would say the message means, in its own words. This one-read takeaway is the highest-signal output of the test. If the buyer's restatement does not match what you intended, the message has a comprehension gap no score will show you.

Then list the objections verbatim and the comprehension gaps as plain bullets. A score of 2 on credibility is a flag. The objection "this sounds like every other tool I looked at" is a fix. Treat the verbatim text as the deliverable and the numbers as the index into it.

For a homepage hero specifically, score the read against the five-second trinity: did the persona name the use case, the alternative it replaces, and the result, from the hero alone? If any of the three is missing from the one-read takeaway, the hero is leading with vision, not capability, and it will under-convert. B2B homepages must communicate use case, alternative, and result in five seconds

06 Read the emotional layer on high-stakes variants, not only the rational one.

The rubric scores the rational read. On a large-contract message, that is half the picture. At high ACV the buyer is computing personal-career survival math under the rational language. The message that addresses "will this make me the hero or the person who got it wrong in the all-hands" wins on signal the panel's rational score alone will miss. B2B buying is more emotional than the rational-buyer myth says, large-contract decisions carry personal-career stakes

"If I go sign a contract for a million dollars with a company that puts cake all over my face in front of my boss, my job is gone. That is real emotion."

· Mark Storin, on Dave Gerhardt's Exit Five Ep. 349, 2026-04-23 · “B2B buying is more…”

When a variant targets a high-ACV buyer, add an explicit prompt to the panel: what does this person have to defend internally, and does the message give them cover? At small contract sizes the rational layer is the whole story and this step adds noise. Calibrate the emotional read to the deal size.

07 For an outbound opener, score relevance against the persona, trigger, and value, not against polish.

A cold opener does not fail on grammar. It fails because the prospect cannot tell why they, specifically, got it. Score the opener on whether it names the persona's situation, ties to a why-now trigger, and offers a specific outcome. Those three are the relevance test. Persona + Trigger + Value = Relevant Message. Personalize ~25%, no more.

"Persona + Trigger + Value = Relevant Message."

· John Barrows, Messaging Equation and personalization tiers, 2026-03-03 · “Persona + Trigger +…”

Over-personalization is its own failure mode and the panel will surface it. A message that burns the whole opening on a researched detail but never lands the value reads as effort without relevance. The signal you want is a specific hook into a productized value line, not a custom paragraph the prospect cannot connect to themselves.

08 Record the predicted winner before you see the live result, then calibrate.

Before any live data lands, write down the panel's predicted winner and a confidence number. Lock it. This is what makes the synthetic test a learning loop instead of a one-off read. When the live result comes in, score the panel's prediction against the actual winner. Track that per-persona over many bets.

Personas that keep predicting the wrong winner get retired or rebuilt from fresher voice-of-customer data. Personas that calibrate well earn more trust on the next variant. The panel is only as good as its track record, and you do not know the track record until you have logged forecasts against outcomes. This is the verification loop that separates a system that learns from one that just generates reads. Verification, not execution, is the irreplaceable human job

09 Keep a human checkpoint on every read. The model fabricates when the input is thin.

The fastest way to turn this play into garbage is to run the panel autonomously across thin or contradictory inputs and ship the output. Language models fill gaps. When a persona is underspecified or a variant has no analog in the persona's world, the model generates a plausible objection that no real buyer would raise. Running AI on multiple inputs simultaneously without structured validation checkpoints produces fabricated output, not analysis errors

"AI needs A LOT of help to do a good job (way more than the average person realizes)."

· Anthony Pierri, AI needs A LOT of help to do a good job, 2026-05-10 · “Running AI on multiple…”

The checkpoint is editorial, the same one Cole names for assembled copy: does the panel output sound like a real buyer or like an AI approximation of one? Anything that reads as approximation fails the test before it informs a decision. Customer language is the source material for conversion copy; the LLM is the assembly layer, not the author

"The world's best conversion copywriters don't write copy. They steal it."

· Nicolas Cole, Build and Launch a VOC Landing Page, 2026-05-07 · “Customer language is the…”

The squeeze underneath all of this: AI made the production of variants and reads nearly free, but it did not make the judgment cheaper. Which persona to trust, which objection matters, which read to act on, those calls did not compress. An operator who keeps generating more reads without sharpening that judgment gets more output and worse decisions. Building costs collapsed; judgement didn't, the squeeze is on positioning, not production

"Judgment is the part that doesn't compress."

· Kevin Indig, Growth Memo, 2026-05-04 · “Building costs collapsed; judgement…”

10 Hand the read to the next step. It goes in the packet. It is never the verdict.

The synthetic read is an input to the live test, not a replacement for it. Run it after voice and claim checks pass, so you are testing a message that is on-brand and truthful, not a draft. Run it after the CRO review on landing-page variants. Then the read goes into the experiment packet or the positioning brief as evidence.

Mark it clearly as directional. Downstream readers must not tag a synthetic read as a gate decision. The pattern that makes this safe: the substrate runs the read, the human runs the call. When you connect the panel to live analytics, one AI session can pull the funnel drop-off and synthesize why the message lost, which closes the loop between the synthetic prediction and the measured result. Data shows what; the why lives in qualitative context. Merge them in one AI conversation.

"Data shows what is happening, never why — the why lives in qualitative context, and AI tools that merge both are transformative."

· Else van der Berg, Synthesizing qual/quant and strategy with Claude Code + PostHog MCP · “Data shows what; the…”

Check your work

Every persona traces to verbatim voice-of-customer evidence, not internal assumption.
Buyer phrases in the persona are mirrored, not paraphrased into clean themes.
The panel is buyers; the trusted human reviewers are sales or CS, not internal marketers.
The variant has an anchor: current live copy, a competitor claim, or the status quo.
Clarity below 3 stops the score for that persona. Later dimensions are not filled in.
The one-read takeaway and objections are captured verbatim, not summarized into a number.
High-ACV variants got an explicit emotional-stakes read, not just the rational score.
A predicted winner and confidence were recorded before any live result.
A human checkpoint passed on the output before it informed any decision.
The read is labeled directional and lives inside the packet, not as the gate verdict.

What goes wrong

Reading the panel as a verdict. A 1-to-5 score is not permission to ship. The panel ranks direction; the gate is a separate human call on live evidence. Verification, not execution, is the irreplaceable human job
Running synthetic when live is cheap. If a five-second test or a first-meeting call is available this week, run that. Live behavior beats a predicted read every time. Test positioning in a live sales pitch, marketing stories are unfalsified theory until then
Personas from internal assumption. Built from the team's idea of the buyer, the panel becomes an echo chamber and drifts from the real buyer. Sell to the buyer's mindset, not to product features
Marketers as the trusted reviewers. Internal reviewers share the vendor's vocabulary, so they cannot hear sameness. Use sales and CS, who sit closest to the buyer voice. Frontline customer contact is the PMM substrate
Skipping the clarity cascade. Scoring differentiation and urgency when clarity already failed. The buyer never reached those dimensions, so the scores are invented. Differentiation requires three checks, different, better, matters
Panel size too low. Synthetic panels rank-invert on narrow high-intent product pages at low size. Treat low-N reads as weak signal and weight live calibration harder.
Testing against nothing. No anchor means you measure fluency, not contrast. Buyers always compare; give the panel the comparison.
Status-quo blindness. A message that beats the competitor but never makes inaction feel costly loses the no-decision deals, which are most of them. No-decision-as-competitor vs. battle-card-driven competitive workflows
Autonomous AI on thin inputs. Run the panel unattended across underspecified personas and the model fabricates objections no real buyer would raise. Keep the editorial checkpoint. Running AI on multiple inputs simultaneously without structured validation checkpoints produces fabricated output, not analysis errors
More reads, worse calls. Generating variant after variant without sharpening which read to trust loads execution onto a cheaper substrate while the judgment layer rots. Building costs collapsed; judgement didn't, the squeeze is on positioning, not production

What you get

A persona panel, each persona traced to verbatim voice-of-customer evidence.
A scored read per variant per persona across the seven dimensions, with the clarity cascade enforced.
The one-read takeaway and verbatim objections per persona, the highest-signal output.
A comprehension-gap list: where the message broke and the specific words that broke it.
An emotional-stakes read on high-ACV variants, separate from the rational score.
A predicted winner with a confidence number, recorded before any live result.
A calibration log: predicted winner versus live winner, tracked per persona over time.
A directional read, labeled as such, packaged inside the experiment or positioning packet.

Pressure-test a message on a synthetic buyer panel before you ship