Most of what passes for AI prompt content on the internet has the same flaw. It teaches you how to make the model agree with you, faster. Better email. Better deck. Better strategy memo. The model says yes, you say thanks, you ship.
The problem is that "yes, faster" is exactly what you do not need from AI in the moments where AI is most valuable. The moments where AI is most valuable are the moments when you are about to do something expensive and slightly wrong, and a smart, well briefed peer would tell you so before you committed. That peer is rare. That peer is busy. And that peer, increasingly, is not in the room because the room is your kitchen table at 11pm and you are alone with a half written Notion doc and a sense that something is off but you do not know what.
The fix is not better prompts in the "make AI agree with me" tradition. The fix is the opposite. The fix is prompts engineered to make the model push back, find the hole, surface the assumption you forgot you were making, calibrate your confidence down to where it should actually sit. This is the CRITIC Framework. Six pillars, fifty prompts, one premise: an AI that only agrees with you is the most expensive yes man you will ever hire.
This guide is LLM agnostic. The prompts work in ChatGPT, Claude, Gemini, Mistral, and any model your company has approved. Use them when you are about to ship something that matters.
What is the CRITIC Framework
A method for adversarial AI prompting. Rather than asking the model to help you build the case for your position, the framework asks the model to systematically attack it. Each pillar attacks the position from a different angle.
- C: Challenge Premises. Question whether the question is even the right one.
- R: Risk Surfacing. Run pre mortems. Find what breaks before reality does.
- I: Interrogate Assumptions. Surface the things you forgot you were assuming.
- T: Test Logic. Stress test the argument for holes, gaps, and circular reasoning.
- I: Invert Conclusions. Steelman the opposite of what you believe.
- C: Calibrate Confidence. Force the model to put numbers on uncertainty.
Six pillars. The acronym is the framework. The framework is the practice. Run the prompts when stakes are real and ego is high. Run them especially when the model has been agreeing with you for an hour and you are starting to feel like a genius.
How to use this guide
Three principles. First, an adversarial prompt is only useful if you actually read the answer. The temptation, when AI tells you you are wrong, is to argue back, soften the prompt, and re run it until the model agrees. That defeats the entire point. Read the pushback. Sit with it. Second, the model can only attack what you give it. Paste the actual decision, the actual draft, the actual data. Generic input gets generic objections. Third, ask for the strongest version of the counter argument, not a reasonable one. "Give me the most ruthless steelman a hostile partner would deliver" gets you something usable. Where the prompt says [paste], be specific.
C: Challenge Premises
C1: The Premise Audit
Audit the premise of the decision below. List the five most important
assumptions baked into the question itself. For each, tell me what
would have to be true for the assumption to hold and what would
happen to my work if it turned out false. End by rewriting the
question I should actually be asking.
Decision: [paste]. Context: [paste]
C2: The Wrong Question Detector
I am about to spend significant time and money on the question below.
Steelman the case that I am asking the wrong question. Give me three
credible reframings a smarter operator might use. For each, name the
work I would do under that frame that I would not do under mine, and
vice versa. Question: [paste]
C3: The Five Hidden Whys
Run a Five Whys on the goal itself, not the problem. Do not let me
get away with "because it is good for the business." Push until you
reach a specific person's specific outcome or a stated company value.
If you cannot get there in five rounds, tell me the goal is not
actually serving anything I can name. Goal: [paste]
C4: The "Compared To What" Audit
Before I commit to the option below, force me to compare it to the
three best alternatives I have not seriously considered, plus doing
nothing for six months. Score each against my criteria. Tell me which
criterion has been doing the most work to favour my choice.
Option: [paste]. Criteria: [paste]
C5: The Frame Substitution
Generate three completely different frames a sharp outside operator
might bring to my problem. For each new frame, write the problem
statement it produces, the solution set it opens, and what it costs
me to adopt versus my current frame. Do not be diplomatic. If one is
genuinely better, say so. Current frame: [paste]
C6: The Lazy Default Hunter
Identify the parts of my plan where I defaulted to the industry
standard answer or the answer my background biases me toward. For
each lazy default, propose the version a first principles operator
who had never seen this industry would write. Be specific.
Plan: [paste]. Background and industry: [paste]
C7: The "Argue the Opposite" Drill
Write the strongest possible essay arguing the exact opposite of my
position. Use the same data. The essay should be persuasive enough
that a smart third party would not immediately know which side I
endorse. End with the one piece of evidence that would tip a
reasonable person, and tell me whether it exists.
Position: [paste]. Data: [paste]
C8: The Stakeholder Question Translation
Translate the question my [boss / board / cofounder / customer]
asked. Tell me the three other questions they might actually be
asking, the political or emotional context underneath each, and how
my answer should change if any is the real question.
Question they asked: [paste]. Recent context: [paste]
R: Risk Surfacing
R1: The Pre Mortem
Fast forward 12 months. The plan below has failed badly. Write the
autopsy. Sections: what the failure looked like, the five most likely
causes ranked by probability, the warning signs at month one and
three, and the one decision I should make this week to reduce the
top two causes. Write the version that hurts. Plan: [paste]
R2: The Black Swan Generator
Generate ten low probability but high impact events that would
invalidate my strategy. Not the boring ones. For each, estimate
probability over 24 months, impact on my plan, and the cheapest
quarterly mitigation. Flag the two where the cheap mitigation is
disproportionately worth doing. Strategy: [paste]
R3: The Concentration Risk Hunt
Identify concentrations in my [business / portfolio / team]:
customer, revenue, supplier, talent, geographic, regulatory,
technical. Score each on likelihood and magnitude. Tell me the two
I am underweighting and the one I am overweighting. Recommend the
most efficient diversification move. Structure: [paste]
R4: The Unintended Consequence Map
Map the second and third order consequences of the change below. For
each direct effect, what shifts in the surrounding system. For each
shift, what shifts next. Flag any chain that ends in something the
original change was supposed to fix.
Change: [paste]. System context: [paste]
R5: The Single Point of Failure Audit
Identify every SPOF in my operation: people, systems, accounts,
vendors, integrations, keys, contracts, knowledge. Score on what
breaks if it fails today and recovery time. Recommend the three
worth removing this quarter and the one that is fine to live with.
Operation: [paste]
R6: The Slow Erosion Detector
Identify slow erosions that could degrade my metric or capability
over 12 to 24 months without setting off any single alarm. Each
small enough to dismiss on any given day but cumulatively material.
For each, tell me the specific thing I would look for in three months
to know it has started. Metric: [paste]
R7: The Worst Case Conversation
The decision below has a worst credible outcome of [paste]. Walk me
through the conversation I would have with [board / cofounder / spouse
/ future self] in that scenario. Their first question, my first answer,
where it gets uncomfortable, what I wish I had thought about today
that I did not. Do not soften it. Decision: [paste]
R8: The "How Would We Know" Question
For the bet below and the reasons I believe it will work: for each
reason, write the specific signal I would expect in 30, 60, and 90
days if correct, and the signal if wrong. Flag any reason where the
wrong signal and the correct signal look the same in my current
data. Those are the dangerous ones. Bet: [paste]
I: Interrogate Assumptions
I1: The Assumption Inventory
List every assumption my plan relies on. Group into stated (written
down), operating (treated as fact but never written down), and
inherited (things my industry or mentor told me are true). For each,
mark it challenged or unchallenged. Highlight the three unchallenged
assumptions that would do the most damage if wrong. Plan: [paste]
I2: The "Why Do I Believe This" Drill
The single claim at the centre of my thinking is [paste]. Walk me
back through why I believe it. What evidence do I have. Where did it
come from. How recent. How representative. Has anything changed since
the evidence formed. Could I source equally strong evidence for the
opposite today. Verdict: evidence based or identity based.
I3: The Outsider Naive Question
Play an outsider with no industry context and no charity. Ask the
twenty most basic, naive, irritating questions they would ask in the
first five minutes. Questions that make experts roll their eyes. For
each, write the honest answer I would have to give. Flag the three I
struggle to answer clearly. Plan: [paste]
I4: The Survivorship Bias Audit
The evidence supporting my claim is [paste]. Audit it for survivorship
bias. What set of similar cases, projects, or experiments do I not
see in this evidence base. Why might they be missing. How would my
conclusion change if I could see them. End with the cheap research
move that would partially correct for the bias.
I5: The Reverse Causation Hunt
I believe [cause] leads to [effect]. Generate the three strongest
cases that causation runs the other way, or that both are driven by
an unseen third factor. For each, tell me what evidence would
distinguish your version from mine. Verdict on how strong my causal
claim actually is once you consider alternatives.
I6: The Definition Audit
List every important word in my document that has more than one
common definition. For each, tell me which I am using, which the
reader is most likely to default to, and where the ambiguity could
cause a different conclusion than I intend. Rewrite the three
highest stakes sentences to remove ambiguity. Document: [paste]
I7: The Generalisation Test
I am applying [pattern] to my situation because it worked in [other
context]. Test the generalisation. What was true about the original
context that I am quietly assuming is also true about mine. Tell me
which conditions hold, which are unclear, which are demonstrably
absent. Verdict: apply, adapt, or abandon.
I8: The Counterfactual Run
Run the counterfactual on the success I am crediting to [cause]. If
that cause had not existed, what is the realistic range of outcomes I
might have seen anyway. Use base rates. Recalibrate how much I should
attribute to the cause versus baseline, luck, or contemporaneous
factors. End with a percentage estimate.
T: Test Logic
T1: The Argument Map
Map my argument. Sections: the claim, supporting sub claims, evidence
for each, and the unstated links between them. For each link, tell me
whether it is deductive, inductive, or a vibe link that sounds
connected but is not. Highlight every vibe link. Those are where the
argument breaks. Argument: [paste]
T2: The Circular Reasoning Detector
Find every place in my argument where a conclusion is being used as
evidence for itself, possibly with different wording. Quote the loop.
For each, rewrite to break the circle, either by sourcing fresh
evidence or by acknowledging the circularity and downgrading the
claim. Argument: [paste]
T3: The Evidence Quality Grader
Grade each piece of evidence supporting my position on type (data,
anecdote, expert opinion, intuition, authority), recency, sample
size, and how easy it would be to find equally credible evidence for
the opposite. Output as a table sorted by quality. End with the piece
doing the most work and whether it deserves that weight.
Position: [paste]. Evidence: [paste]
T4: The Strawman Audit
Steelman the opposing position I described as a strawman. Write the
strongest version using evidence I have not engaged with. Then rewrite
my original argument to defeat the steelman, not the strawman. If my
argument cannot defeat the steelman, say so directly.
Opposing view as I described it: [paste]. My argument: [paste]
T5: The Selection Bias Audit
Audit the sample behind my data. Where did it come from. Who or what
is missing. What systematic process produced the dataset, and what
could that process filter out. Verdict on whether my conclusion is
supported by the data I have or by the data I happen to have.
Data: [paste]
T6: The Argument Inversion Test
Apply the same logical structure of my argument to a different domain.
If my argument is "we should do X because A, B, C," try the same A,
B, C on a clearly absurd X. If the structure produces absurd
conclusions in adjacent domains, the structure is suspect, not the
domain. Tell me how my argument fares. Argument: [paste]
T7: The "What Would Change My Mind" Question
I currently believe [paste] with high confidence. Generate the three
specific pieces of evidence that would change my mind. They must be
concrete, observable, findable. Tell me whether each currently exists,
partially exists, or has not been investigated. Flag any where I have
actively avoided looking.
T8: The Logical Fallacy Sweep
Sweep my argument for ad hominem, appeal to authority, appeal to
popularity, false dichotomy, sunk cost, anchoring, motte and bailey,
equivocation, and any others. For each, quote the section and rewrite
preserving the underlying claim without the fallacy. If the claim
cannot survive the rewrite, say so. Argument: [paste]
I: Invert Conclusions
I9: The Best Case for Stopping
I plan to continue investing in [paste]. Make the best possible case
for stopping completely, not slowing down. Use the strongest evidence.
Be specific about what we do with freed resources and what we lose.
End with the one condition that would have to be true for the stop
case to be right, and whether it holds.
I10: The Best Case for Doing Nothing
Status quo: [paste]. I am pushing for [change]. Make the best case
for doing absolutely nothing. Why might inertia actually be correct
here. What costs am I underweighting on the change side. What boring
benefits of the current state am I dismissing. Verdict on whether the
change is worth the disruption.
I11: The Best Case for the Competitor
I believe [competitor] is wrong about [topic] and we are right.
Steelman their position. Write the strongest version of why they
might actually be right and we might be missing something. Reference
their team, capital, customer access, history. End with the one fact
about them I am most likely dismissing.
I12: The Reverse Roadmap
Build the inverted roadmap: the exact opposite of my plan, item by
item. For each inverted item, write the case for why a smart
competitor might pursue it. Tell me which inverted items genuinely
deserve consideration and which are correctly excluded.
Roadmap: [paste]
I13: The Anti Hero Persona
Define the anti persona: the customer who looks similar on the
surface but for whom my product is actually wrong. Be specific. Their
context, pain, alternatives, the moment they discover my product does
not fit. End with the warning signs in my current funnel that I am
accidentally selling to the anti persona. Target persona: [paste]
I14: The "Do the Opposite" Half Day
For the next half day, I am going to deliberately do the opposite of
my instincts on [topic]. Help me plan it. What would the opposite
schedule look like, what would I say instead, what would I notice
that my usual approach hides. End with the experiment to compare the
two approaches honestly.
I15: The Inverted Pitch
Write the inverted pitch: same product, same data, but framed as if
I were trying to talk a prospect out of buying. Lead with the cases
where it is the wrong fit. Tell me which of these "do not buy" cases
I am encountering in real sales more often than I am admitting.
Pitch: [paste]
I16: The Hostile Inheritance Test
Imagine someone I respect inherits my [strategy / business] tomorrow.
No loyalty to anything I built, no political cost for change. Walk
through what they cut in week one, what they double down on, what
they leave alone. Be honest. Where would they call my decisions
sentimental rather than strategic.
I17: The Opposite Headline
The headline I want for [project / launch / year] is [paste]. Write
the opposite headline that would be written if everything went wrong.
Tell me which of the two is currently more probable given my actual
execution, and what specific work this quarter would shift probability
toward the one I want.
C: Calibrate Confidence
C9: The Calibration Audit
For each significant claim in my plan, give me the percentage
probability you would assign that the claim is true, given the
evidence I have provided. Tell me where my plan implicitly treats a
60 percent claim as if it were a 95 percent claim. Those are the
joints where the plan is most likely to crack. Plan: [paste]
C10: The Confidence Interval Forcing Function
I expect [prediction] by [date]. Force me to state a confidence
interval. 10th percentile (much worse), 50th (expected), 90th (much
better). Tell me whether my current plan would survive the 10th
percentile outcome. If not, the plan is overfitted to the expected
case.
C11: The Track Record Check
I am claiming I have a reasonable read on [domain]. Audit the claim.
What are my last five predictions in this domain. How did they turn
out. What does my hit rate suggest about the weight I should put on my
current prediction of [paste]. Adjust my stated confidence accordingly.
Recent predictions and outcomes: [paste]
C12: The Reference Class Forecast
Build a reference class forecast for my estimate, not an inside view.
Find the closest five comparable projects, their actual outcomes, the
distribution. Tell me where my inside view sits in that distribution.
End with the realistic outside view estimate.
Project: [paste]. Current inside view: [paste]
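To sanity check what the model hands back for this prompt, the placement of an inside view within a reference class can be worked out by hand. The following is a minimal Python sketch, not a forecasting method: the five durations and the inside view estimate are hypothetical placeholders, and using the median as the outside view is an assumption made here for simplicity.

```python
from statistics import median

# Hypothetical reference class: actual durations, in months, of five
# comparable projects. Replace with real comparables.
reference_months = [9, 14, 11, 20, 16]
inside_view_months = 8  # hypothetical inside view estimate

def share_at_or_below(values, estimate):
    """Fraction of reference outcomes that came in at or below the estimate."""
    return sum(v <= estimate for v in values) / len(values)

share = share_at_or_below(reference_months, inside_view_months)
outside_view = median(reference_months)

print(f"Comparables finishing within {inside_view_months} months: {share:.0%}")
print(f"Outside view (median of reference class): {outside_view} months")
```

With these made up numbers, none of the comparables finished as fast as the inside view, and the median sits at 14 months. Five data points is a thin distribution, so treat the output as a prompt for a harder conversation rather than a precise estimate.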
C13: The Overconfidence Diagnostic
Highlight every word in my plan that signals more confidence than
evidence supports. Will, clearly, obviously, definitely, undoubtedly,
always, never, simply. Suggest the calibrated alternative for each.
Tell me whether my resource decisions match the original confidence
language or the calibrated one. Document: [paste]
C14: The Decoupled Decision
Decompose the decision below from one bet into the sequence of
smaller bets it actually contains. Estimate the probability of each
and the conditional probabilities that link them. Multiply through to
get the joint probability of the whole working as planned. Compare to
my felt confidence. Decision: [paste]
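The multiplication this prompt asks for is easy to verify yourself. Here is a minimal Python sketch, assuming the decision decomposes into a simple chain in which each probability is already conditional on every earlier step having worked. The step names and numbers are hypothetical.

```python
from math import prod

# Hypothetical chain of smaller bets. Each probability is conditional on
# every earlier step having worked; the numbers are illustrative only.
steps = [
    ("pilot customers sign", 0.80),
    ("pilot hits target metric", 0.70),
    ("pilot converts to paid", 0.60),
    ("paid cohort renews", 0.75),
]

def joint_probability(chain):
    """Multiply the conditional probabilities along the chain."""
    return prod(p for _, p in chain)

joint = joint_probability(steps)
print(f"Joint probability of the whole plan: {joint:.1%}")
```

Four steps that each feel likely multiply out to roughly one in four. That gap between felt confidence and the calculated figure is what the prompt is designed to expose.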
C15: The "What Does Smart Money Think" Question
On the question of [paste], what is the position of the smartest
people I respect who have publicly weighed in? If they disagree with
me, what do they know that I do not? If they agree, are they
downstream of me intellectually (same sources, mentors, priors)? Is
the agreement independent confirmation or echo chamber social proof?
C16: The Time Inconsistency Check
Imagine my future self three years from now, having lived through the
consequences of my decision today, writing a note back to me. What
does the note say. Where does future me think I was right to act
decisively. Where does future me think I should have paused, sought a
second opinion, or delayed. The one piece of advice future me would
underline. Decision: [paste]
C17: The Asymmetry Audit
Audit the payoff asymmetry of my decision. Upside if right. Downside
if wrong. Multiply each by my honest probability of that outcome. Is
this a positive EV bet, or am I collecting small gains while exposed
to a large tail loss. Verdict on whether it is worth making at my
current probability, and what my probability of being right would
have to drop to before the bet stops making sense.
Decision: [paste]
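The expected value arithmetic in this prompt can be checked in a few lines. This is a minimal Python sketch, assuming a simplified bet with exactly two outcomes (right or wrong) and a downside entered as a positive loss. The figures are hypothetical, and the sketch ignores anything a pure EV view misses, such as a loss large enough to end the business.

```python
def expected_value(p_right, upside, downside):
    """EV of a two-outcome bet; downside is entered as a positive loss."""
    return p_right * upside - (1 - p_right) * downside

def break_even_probability(upside, downside):
    """Probability of being right at which EV is exactly zero.

    Solves p * upside - (1 - p) * downside = 0 for p.
    """
    return downside / (upside + downside)

# Hypothetical figures: gain 300k if right, lose 100k if wrong.
upside, downside = 300_000, 100_000
p_right = 0.40

ev = expected_value(p_right, upside, downside)
p_star = break_even_probability(upside, downside)

print(f"EV at p={p_right:.0%}: {ev:,.0f}")
print(f"Bet stops making sense below p={p_star:.0%}")
```

With these numbers the bet is worth about 60,000 in expectation at a 40 percent chance of being right, and it stops being positive EV once that chance falls below 25 percent.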
The CRITIC Framework in one image
C          R          I            T      I            C
Challenge  Risk       Interrogate  Test   Invert       Calibrate
Premises   Surfacing  Assumptions  Logic  Conclusions  Confidence
The order matters. Premise comes first because if the premise is wrong, every downstream check is wasted. Calibrate comes last because once you have stress tested everything else, the remaining job is to size your commitment to the calibrated confidence, not the original conviction.
How to combine CRITIC with your model of choice
ChatGPT: o series reasoning models are the right call for Test Logic and Calibrate Confidence. They trace through the chain step by step. 4 series models work for Challenge Premises and Invert Conclusions where the bottleneck is generating alternative framings.
Claude: works well in a Project with the memo, data, or plan loaded into project knowledge. Claude tends to be the most honest when asked to push back, and the least likely to retreat into "both views have merit" mush. Push it hard. Tell it you want the version a senior partner would deliver in private, not a public review.
Gemini: deep research mode is useful for Risk Surfacing and Reference Class, where value comes from pulling external comparables rather than internal reasoning. Watch for context drift on the longest adversarial prompts.
Cross model habit: run the same CRITIC prompt across two models. Where they agree, the critique is robust. Where they diverge, the divergence itself is information about how shaky the underlying position is.
Frequently Asked Questions
Why would I want AI to argue with me instead of help me?
Because the value of AI as a thinking partner lies in the moments when it tells you, briefly, something you do not want to hear before you commit. AI that only agrees with you is a faster version of your own thinking, which means you compound your own blind spots faster. AI that pushes back is the cheapest second opinion you will ever access, available at 11pm when no second opinion otherwise exists.
Does adversarial prompting actually change AI behaviour or just the tone?
It changes behaviour, not just tone, when the prompt is structured correctly. Asking "play devil's advocate" produces hedged, polite objections. Asking the model to write the strongest essay against your position, to estimate the probability you are wrong, or to surface specific assumptions, produces material critique. Specificity unlocks substance.
What is the best AI model for adversarial prompting?
No single best. Claude tends to be the most honest about pushing back without retreating into false balance. ChatGPT o series models are strongest at finding logical holes and circular reasoning. Gemini is strongest at pulling external reference cases. Most users of the CRITIC Framework run the same prompt across two models and read both outputs side by side.
Can I use these prompts on documents with confidential data?
Check the data retention policy of your model tier. Enterprise tiers of ChatGPT, Claude, and Gemini offer data retention controls that make them usable with internal company data. Consumer tiers usually do not. For high stakes adversarial prompting on confidential material, use an enterprise tier or anonymise before pasting.
How is the CRITIC Framework different from "act as a devil's advocate"?
"Act as a devil's advocate" is a tone instruction. The model takes on the costume and produces a hedged, balanced objection. The CRITIC Framework is a set of specific attacks: assumption surfacing, logical hole testing, calibration forcing, inversion. Each attack produces a different type of insight. You pick the attack that fits the stage of thinking you are in.
Is there a paid version of these prompts?
Yes. The fifty prompts above are the free version, in compact form. The CRITIC Pro Pack includes 150 expanded adversarial prompts with example outputs, ready to load Claude Projects and custom GPTs configured to push back automatically, and role specific adversarial sets for founders, product managers, operators, and investors. The Pro Pack is on the PromptLeadz Pro Collection at $29.
Where to go next
The CRITIC Framework sits alongside the role and function playbooks in the PromptLeadz Free Vault. Founders pair CRITIC with the FOUNDER Framework. Product teams pair it with 7P. Operations teams pair it with OPS7. For the conversations across the table, pair it with HARDER. CRITIC is the second pass. The role frameworks are the first.
The thing to internalise is that the model is not your enemy when it pushes back. The model is the friend who tells you the truth at 11pm before you press send on the email you should not send, or before you commit the capital you should not commit. Train it to do that job well. Then listen.
PromptLeadz publishes battle tested AI prompt packs for founders, product, sales, marketing, operations, HR, finance, customer success, adversarial thinking, and hard conversations. All prompts are LLM agnostic. Pricing is in USD.