
From Vinny's Courtroom to Editor's Desk

Courtroom comedy to AI multi-agent review: using adversarial thinking to expose hidden weaknesses in documents.

The Pattern I Borrowed

In My Cousin Vinny, the prosecution thinks they have an open-and-shut case. Two boys, wrong place, wrong time: all the evidence points to them.

But Vinny doesn’t argue. He finds the gaps.

“How do you know the tire marks were from that car?” — untested assumption.

“How could these grits cook in five minutes?” — impossible claim.

And when he needs evidence, Mona Lisa Vito shows up with tire mark analysis and positraction differentials. Not opinions — verified data.

The prosecution wasn’t wrong because they were malicious. They were wrong because no one had actually tested their assumptions.

I saw this pattern: structured adversarial review surfaces hidden weaknesses.


Matching the Pattern

Later, I was writing audit documents. SOX compliance work. High stakes — errors could mean regulatory findings.

I could feel the blind spots. The sections that sounded good but weren’t. The claims that needed verification. The arguments that would get picked apart if the auditors pushed back.

I needed a mock trial.

So I built one:

| Courtroom | My Cousin Vinny | Audit Review |
| --- | --- | --- |
| Prosecutor | District Attorney | Challenger — finds gaps, bias risks, weak arguments |
| Defense Attorney | Vinny | Judge — balanced assessment, weighs strengths against issues |
| Expert Witness | Mona Lisa Vito | Fact-Checker — verifies claims against sources |
| Verdict | Jury decision | Synthesis — human-readable summary, P0/P1/P2 priorities |

Same pattern. Different domain.

Over dozens of audit documents, this became my standard process. The pattern held.


Evolution: From Manual to Multi-Agent

At first, I ran it manually. Played each role myself. Switched hats.

“Okay, now I’m the challenger. What’s weak here?”

It worked. But it was slow. And I couldn’t fully separate the perspectives — my challenger knew too much about my judge.

Then I realized: this is what AI agents are for.

Different agents. Different roles. Different sessions. No shared conversation context. Each role sees only the original document, not the other agents’ responses. True adversarial review.

But not perfect. AI agents share training data. They may have the same blind spots, the same biases, the same gaps in reasoning. Independence from context doesn’t guarantee independence of perspective. A human still needs to judge the judges.

For more complex cases — where the challenger needs to see the judge’s assessment — you can chain them. But for most work, independence is the point. You want blind spots surfaced, not consensus built.

The workflow:

  1. Original text → Document to review
  2. Judge → Balanced assessment, scores 1-10, highlights strengths
  3. Challenger → Aggressive critique, finds gaps, bias risks, untested assumptions
  4. Fact-Checker → Verifies claims against external sources
  5. Synthesis → Human-readable summary, issues ranked P0/P1/P2
  6. Revise → Fix issues, then loop back to Judge for re-review

The loop matters. Fix, re-review, repeat until P0/P1 issues are resolved. P2s go to backlog.
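
Here's a minimal sketch of that loop in Python. It assumes a generic `call_llm(system_prompt, content)` helper that stands in for whatever LLM client you use; the function name, the condensed prompts, and the naive P0/P1 string check are all mine, not a fixed API. The one hard requirement: every call starts a fresh session with no shared history.

```python
# A minimal sketch of the review loop. `call_llm` is a placeholder for
# whatever LLM client you use; the one hard requirement is that every
# call starts a FRESH session, so no agent sees another agent's output.

ROLES = {
    "judge": (
        "You are a Judge. Your role is balanced assessment. "
        "Score the document 1-10, list the top 3 strengths, "
        "identify issues ranked P0/P1/P2, and give a one-sentence summary."
    ),
    "challenger": (
        "You are a Challenger. Your role is aggressive critique. "
        "Find gaps in reasoning, surface bias risks, and challenge "
        "untested assumptions."
    ),
    "fact_checker": (
        "You are a Fact-Checker. Your role is verification. For each "
        "factual claim, search for primary sources and rate it: "
        "Verified / Partially / Unverified / Contradicted."
    ),
}


def call_llm(system_prompt: str, content: str) -> str:
    """Placeholder: one independent LLM call with no conversation history."""
    raise NotImplementedError("wire up your LLM client here")


def review_round(document: str) -> dict[str, str]:
    # Each role sees ONLY the original document, never the other reviews.
    return {role: call_llm(prompt, document) for role, prompt in ROLES.items()}


def synthesize(reviews: dict[str, str]) -> str:
    # Synthesis is the one step that sees everything.
    combined = "\n\n".join(f"[{role}]\n{text}" for role, text in reviews.items())
    return call_llm(
        "Merge these reviews into a human-readable summary. "
        "Rank every issue P0 / P1 / P2.",
        combined,
    )


def review_loop(document: str, revise, max_rounds: int = 3) -> str:
    """`revise` is the human step: (document, synthesis) -> revised document."""
    for _ in range(max_rounds):
        synthesis = synthesize(review_round(document))
        if "P0" not in synthesis and "P1" not in synthesis:
            break  # remaining P2s go to the backlog
        document = revise(document, synthesis)
    return document
```

Swapping the dict comprehension for sequential calls, each receiving the earlier reviews, gives you the chained variant for complex cases; the default keeps the roles blind to each other on purpose.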


Example: Before and After

Original text:

Studies show remote workers are 47% more productive than office workers. Without commute time and office distractions, people get more done in less time.

Companies save an average of $11,000 per employee per year on real estate and facilities.

Some leaders still resist. They cite “culture” and “collaboration.” But what they really mean is control.

Judge evaluation (7/10):

  • ✅ Strong thesis, clear rhetoric
  • ✅ Concrete metrics anchor arguments
  • ✅ Effective re-frame of opposition
  • ⚠️ P0: Productivity claim (47%) and cost savings ($11,000) lack citations
  • ⚠️ P1: Dismisses legitimate collaboration challenges as purely about control

Challenger critique:

“This piece is advocacy masquerading as analysis. It cherry-picks favorable statistics, ignores substantial evidence of remote work’s challenges, and dismisses legitimate concerns as bad faith. The ‘Resistance’ section is the most intellectually dishonest — it’s an ad hominem attack on leaders’ motives without evidence.

You cite ‘47% more productive’ — what study? What roles? Measured how? And what about the studies showing no difference or negative impact?”

Fact-Checker verification:

| Claim | Groundedness | Rating | Source |
| --- | --- | --- | --- |
| "47% more productive" | 🔗 Chain-linked | ⚠️ Partially | Prodoscore (2020) — limited methodology, self-reported |
| "$11,000 savings" | 🔗 Chain-linked | ⚠️ Partially | Global Workplace Analytics — estimate based on assumptions |
| "Leaders resist for control" | 📰 Reported only | ⚠️ Ungrounded | Anecdotal, no primary source |

Groundedness ratings:

  • 🔒 Grounded — Primary source (court filing, official record)
  • 🔗 Chain-linked — Secondary source with clear provenance
  • 📰 Reported only — News coverage without primary source link
  • ⚠️ Ungrounded — Originates from interested party or anonymous source

A claim can be “verified as reported” but still “ungrounded” — the source itself may be weak.
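
One way to make that two-axis distinction explicit is to model verification and groundedness as separate fields. A sketch, using my own names (the published skill may structure this differently):

```python
from dataclasses import dataclass
from enum import Enum


class Groundedness(Enum):
    """How solid the claim's ultimate source is."""
    GROUNDED = "🔒 Grounded"          # primary source (court filing, official record)
    CHAIN_LINKED = "🔗 Chain-linked"  # secondary source with clear provenance
    REPORTED_ONLY = "📰 Reported only"
    UNGROUNDED = "⚠️ Ungrounded"


class Verification(Enum):
    """Whether the claim matches what the source actually says."""
    VERIFIED = "Verified"
    PARTIALLY = "Partially"
    UNVERIFIED = "Unverified"
    CONTRADICTED = "Contradicted"


@dataclass
class Claim:
    text: str
    verification: Verification  # accurate to its source?
    groundedness: Groundedness  # is the source itself any good?


# "Verified as reported" yet still ungrounded: the quote is faithful to
# the source, but the source is an interested party, not a primary record.
example = Claim(
    text="Leaders resist remote work to keep control",
    verification=Verification.VERIFIED,
    groundedness=Groundedness.UNGROUNDED,
)
```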

Synthesis (P0/P1/P2):

  • P0: Productivity claim needs proper citation and context
  • P0: Cost savings figure is estimate, not fact
  • P1: Dismissive framing of resistance ignores valid concerns

Revised text:

According to a 2015 Stanford study run at a 16,000-employee company, remote employees showed a 13% productivity increase — not 47%, which comes from a Prodoscore analysis with different methodology and scope. Results vary significantly by role and industry.

Companies save an average of $11,000 per employee per year on real estate, according to Global Workplace Analytics — though savings depend heavily on location and lease structures.

Some leaders resist remote work for legitimate reasons: onboarding challenges, mentorship gaps, and reduced spontaneous collaboration. These aren’t just “control” concerns — they’re documented trade-offs that remote-first companies must address intentionally.


Behind the Scenes

The Roles

Judge (Balanced Review)

You are a Judge. Your role is balanced assessment.
- Score the document 1-10
- List top 3 strengths
- Identify issues ranked P0/P1/P2
- Provide one-sentence summary

Challenger (Devil’s Advocate)

You are a Challenger. Your role is aggressive critique.
- Find gaps in reasoning
- Surface bias risks
- Challenge untested assumptions
- Be uncomfortable. The author should feel exposed.

Fact-Checker (Source Verification)

You are a Fact-Checker. Your role is verification.
- For each factual claim, search for primary sources
- Rate: Verified / Partially / Unverified / Contradicted
- Flag fabricated claims or AI-suspect content

When It Works — And When It Doesn’t

This approach isn’t universal.

Works well for:

  • Factual claims that can be verified
  • Arguments with hidden bias risks
  • Documents with a clear thesis
  • Content that benefits from “fresh eyes”

Works less well for:

  • Highly specialized domains (agents lack expertise) — Workaround: feed relevant research into the agent's context before review, as sketched after this list. The agent still provides structured critique, just with better grounding.
  • Creative work where subjectivity dominates
  • Documents requiring shared context across roles
  • Time-critical reviews (multi-agent takes longer)
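
For that specialized-domain workaround, the change is small: prepend trusted domain material to what the agent sees. A sketch reusing the hypothetical `call_llm` and `ROLES` from earlier; both file paths are made up:

```python
# Brief the agent before review: vetted domain notes go in front of the
# document so the critique is grounded in real reference material.
document = open("draft.md").read()          # the draft under review (hypothetical path)
grounding = open("research_notes.md").read()  # vetted sources (hypothetical path)

briefed = (
    "Reference material (trusted, for grounding):\n"
    f"{grounding}\n\n"
    "Document under review:\n"
    f"{document}"
)
critique = call_llm(ROLES["challenger"], briefed)
```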

What can go wrong:

  • Over-aggressive Challenger kills good ideas — the critique exists to surface weaknesses, not to destroy. You still decide what to keep.
  • Bloating with counter-arguments — not every critique needs a response. At some point, gains diminish. A document isn't stronger because it anticipates every objection — it's just longer.
  • Voice dilution — a simple blog post doesn't need the same rigor as technical documentation. Over-review waters down the original voice. The human reviewer tunes the intensity: SOX audit? Full treatment. Personal essay? Light touch.
  • Shared training biases — AI agents trained on similar data may share blind spots. They’re better than single-pass review, not magic.
  • Fact-Checker hallucinations — sometimes the verifier gets it wrong. Verify the verifier for critical claims.

The human in the loop matters. After synthesis, you decide: accept, reject, or iterate. The model surfaces issues; you own the judgment.


The Insight

The pattern wasn’t bound to its origin.

A courtroom comedy about a rookie lawyer → editorial review workflow.

Same shape. Different story.

Vinny’s cross-examination gave me the mental model. AI agents made it practical.


The Takeaway

When you see a pattern working, ask:

  1. What’s the structure? (Judge, Challenger, Fact-Checker)
  2. Where else does it fit? (Editorial review, code review, hiring decisions)
  3. How can I evolve it? (Manual → multi-agent, single pass → specialized modes)

Pattern recognition isn’t just seeing. It’s borrowing, adapting, and improving.


What’s Next

The same pattern applies to code review — but with different roles. Security Probe for vulnerabilities. Edge Hunter for boundary conditions. Reference Checker for dependency claims.

Different domain. Same structure.

See: Code Review Multi-Agent Pattern for the next application.


Try it yourself: The full skill — prompts, synthesis templates, and groundedness ratings — is available on GitHub. Works with any LLM.


A courtroom comedy, a cross-examination, an AI prompt. Same pattern. Different lens.