Not Even Wrong
The Problem With P(doom)
John Graunt was a merchant who lived in 17th century London. In 1662, in a city familiar with plague, Graunt consulted parish death records to figure out how long Londoners could expect to live. He could tell you that roughly 36 out of every 100 children born in the city would die before their sixth birthday. This probability could be checked by anyone willing to do the same sums using the same sources.
Three and a half centuries later, AI researchers are calculating a morbid probability of their own. They wonder about thinking machines seeing off the human race, and refer to the likelihood of that possibility as P(doom). The term began life as an inside joke in rationalist spaces in the closing years of the 2000s, but shot to prominence after researchers signed an open letter about the threat posed by AI development in 2023. In the AI world, everyone has their number. Eliezer Yudkowsky puts it at roughly 95–100 per cent. Geoffrey Hinton sits around 10–20 per cent, while Marc Andreessen reports a bullish 0 per cent (a figure that is hard to defend, because you cannot rule out something that has never happened and whose conditions have never been tested). The mean estimate among surveyed AI researchers is about 14 per cent.
We have some who believe that AI is almost certain to wipe out humanity and others who ascribe a 0 per cent probability to the same outcome. This is curious for a supposedly hard-headed prediction made by some of the world’s most credentialed scientists. Imagine a roomful of cardiologists who, given the same scans and the same patient history, exposed to the same training and the same clinical standards, disagreed on whether the probability of heart failure was 0.1 per cent or 99 per cent.
You would not conclude that this was a hard problem on which reasonable people differed. In fact, you would probably conclude that at least some of our medics may need to visit neurology down the hall. With P(doom), the estimates from AI researchers range from 0 per cent to 99 per cent (and, for some, a few more nines). When someone tells you they assign a 15 per cent probability to AI wiping out humanity, it is worth asking what kind of claim they are actually making. It sounds like a statement about the world. It is not.
To be clear: my objection is not to probabilistic reasoning, which is straightforwardly indispensable in domains where estimates can be calibrated against outcomes over time. My problem concerns what we might call proximate and ultimate formulations. Proximate estimates for COVID were things like the hospitalization rate and case severity, while an ultimate estimate was the infection fatality rate. For P(doom), proximate estimates include the conditions under which models exhibit deceptive behavior and the observed failure rates of current alignment techniques. Here, the ultimate estimate is the headline number itself: the probability of extinction.
Someone might know that current models can be made to exhibit deceptive alignment under laboratory conditions or that existing alignment techniques have specific failure modes. But to cram those findings into a container labeled “15% chance of extinction” involves a series of judgments about unprecedented transitions for which there is no model connecting proximate estimates to their ultimate counterpart. Knowing that fine-tuning on specific tasks can induce broader misalignment tells us little about how well alignment solutions will generalize or how geopolitical actors will respond. Compare with COVID, where hospitalization and case-severity data fed epidemiological models that could be tested against actual deaths.
A natural objection is that some people are good at formulating predictions in lots of different domains, so perhaps their assessment of P(doom) ought to carry weight. After all, some superforecasters assign 17 per cent to various events and see those events happen roughly 17 per cent of the time. You can check a forecaster’s guesses across many different predictions, which makes calibration a property of the forecaster’s overall policy rather than of any individual estimate. When a superforecaster says 17 per cent for a ceasefire in some conflict, that number is useful because they also say 17 per cent for hundreds of other things and roughly 17 per cent of those things happen.
Superforecasters earn their calibration (that is, the extent to which their stated confidence lines up with observed outcomes over time) using predictions with short time horizons, a history of similar events to draw on, and a mechanism that allows the forecaster to correct course when needed. For P(doom), there is no source of comparable predictions against which to calibrate. A forecaster might be superbly calibrated on elections and geopolitical high drama, but that tells us nothing about whether their number for an unprecedented event is any good.
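The calibration check described above is simple enough to sketch in code: bucket a forecaster’s predictions by stated probability and compare each bucket’s stated confidence with the observed frequency of the events. The data here is invented for illustration; a well-calibrated forecaster is one whose table’s keys and values roughly agree.

```python
from collections import defaultdict

def calibration_table(forecasts):
    """Group (stated_probability, outcome) pairs into 10% buckets and
    report the observed frequency of the events in each bucket."""
    buckets = defaultdict(list)
    for p, happened in forecasts:
        buckets[round(p, 1)].append(happened)
    return {p: sum(events) / len(events) for p, events in sorted(buckets.items())}

# A well-calibrated forecaster: events tagged 0.2 happen about 20% of the time.
forecasts = ([(0.2, True)] * 2 + [(0.2, False)] * 8 +
             [(0.8, True)] * 8 + [(0.8, False)] * 2)
print(calibration_table(forecasts))  # {0.2: 0.2, 0.8: 0.8}
```

Note what the check requires: many predictions and many settled outcomes. A single number attached to a single unprecedented event gives the table exactly one bucket with zero resolved entries.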
What I Talk About When I Talk About P(doom)
There are two influential interpretations of probability: subjective and objective. Where the former deals with how confident you are in a claim, the latter stresses what actually happens when you test it. Anyone who has spent even a little time thinking about the reliability of P(doom) metrics probably has a sense that the calculations ought to be taken with a pinch of salt. Many in rationalist or rationalist-adjacent circles are well aware of the difference between these two accounts, but lots of people who absorb these views are not.
That goes for those who adopt the terminology to signal their membership of the in-group, and those who hear talk of P(doom) on the news and wonder why one of the world’s most cited researchers thinks there’s a one in five chance that AI will wipe out the human race. When a chief global strategist at one of the largest investment research firms tells CNBC it’s 50/50 whether AI destroys humanity by mid-century, the average viewer — incorrectly but understandably — assumes that figure estimates the likelihood that it will happen.
Every single P(doom) you hear is a subjective probability. It is a measure of the degree of rational belief one holds in a proposition given the available evidence — a property of an agent’s epistemic state rather than of the world. When you say “I think there’s a 50 per cent chance this meeting will be a waste of time,” you’re not drawing on a frequency table of past meetings. You’re expressing how confident you feel given what you know about the agenda and who’s attending. All the framework requires is that your degrees of belief are consistent with one another. Beyond this internal consistency, there is no further requirement that your probabilities correspond to the shape of reality.
If your beliefs are internally coherent, you can multiply your probability for any outcome by the value of that outcome — however you choose to measure it — and arrive at its expected value, a single number that tells you how good or bad a bet looks on average. From there you can derive a utility function and compare any set of actions on a common scale. The framework of cause prioritization, wherein resources are allocated to whichever cause produces the greatest expected good per dollar, inherits this apparatus. This approach is popular with rationalist-adjacent communities, especially Effective Altruism, because it transforms questions of moral life into properties that can be measured and optimized.
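The machinery just described fits in a few lines. This is a minimal sketch with made-up causes and numbers, showing how a subjective probability feeds straight through to a prioritization ranking:

```python
def expected_value(probability, payoff):
    """Expected value: the probability of an outcome times its payoff."""
    return probability * payoff

# Hypothetical causes: (name, subjective probability of success,
# good done per dollar if the intervention succeeds). All numbers invented.
causes = [
    ("bednets", 0.90, 100),
    ("ai_alignment", 0.15, 1_000_000),  # the P(doom)-style subjective input
]
ranked = sorted(causes, key=lambda c: expected_value(c[1], c[2]), reverse=True)
print([name for name, p, v in ranked])  # ['ai_alignment', 'bednets']
```

The ranking is entirely driven by the subjective inputs: change the 0.15 to 0.00005 and the order flips, with nothing in the framework to say which input was right.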
Objective accounts of probability take a different approach. Here probability is what actually happens in the world when you repeat something many times. This species of probability is a physical property of systems, one that is empirically testable. If you flip a coin a thousand times, roughly half will land heads, so we can say the probability is that ratio. The same logic applies to complex phenomena, like radioactive decay, where about half the atoms in a given sample will decay within the half-life period (and you can go and check).
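The objective reading can be checked directly, even by simulation. A toy illustration, where 0.5 stands in for the physical bias of a fair coin:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# Simulate 100,000 flips of a fair coin and measure the observed frequency.
flips = [random.random() < 0.5 for _ in range(100_000)]
frequency = sum(flips) / len(flips)
print(frequency)  # prints a value close to 0.5
```

The point is the feedback loop: if the coin were biased, the observed frequency would diverge from the claimed probability, and the claim would be refuted.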
The subjective interpretation is internally coherent and mathematically elegant. But coherence should not be confused with empirical content. My objection to P(doom) flows from the simple fact that subjective probability claims are, by their nature, unfalsifiable. That’s fine in many instances, like casual everyday judgments or betting on the Super Bowl, but not when the stakes are high enough to redirect billions of dollars and make or break government policy. Statistical reasoning about unlikely but potentially devastating scenarios is obviously useful, but only when properly grounded in claims we can falsify.
Your Problem Too
For a probability to tell you about something other than the speaker’s state of mind, it needs a collection of similar events from which a ratio can be drawn. To say that a coin lands heads 50 per cent of the time, we need a set of coin flips to draw from. This is our reference class.
Clearly there is no reference class for human extinction. As far as every AI researcher in the world knows, the event is singular and unprecedented. Traditionally, the reference class problem has been associated with the objective interpretation of probability. This is because objective probability is calculated according to the ratio of how many times something happened out of how many times it might have happened. If the event has never happened and can’t be repeated, then there’s nothing to compute.
But the reference class problem isn’t only relevant for objective accounts of probability. Alan Hájek famously made the case that it affects every interpretation of probability when applied to singular events. As he put it, “the reference class problem is your problem too.” The argument goes something like this: before you can assign a probability to a one-off event, you have to decide what kind of event it is. Is AI-caused extinction a case of “new technology going wrong,” “species-level catastrophe,” or “unprecedented transition in the nature of intelligence”? Each framing suggests a different probability, and there is no principled way to choose between them. The number you arrive at depends on how you describe your target.
The way to tell a useful probability from a useless one is to ask whether the world can correct it.
This is because good explanations resist variation. If Graunt had doubled his estimate of childhood mortality, the parish registers would have corrected him. If an epidemiologist in early 2020 had put the infection fatality rate at 10 per cent rather than 1 per cent, the incoming hospital data would have shown the estimate to be wrong. The explanation is enmeshed with the world in such a way that it cannot be led to wherever you would like it to go. With P(doom), you can swap the causal pathway from deceptive alignment to resource competition and inflate the number from 5 per cent to 7.5 per cent. The estimate can contort to accommodate any figure and any causal chain because the only constraint it faces is internal coherence.
In the context of P(doom), defenders might say something like “We don’t need a reference class because we’re expressing a degree of personal belief.” This is perfectly coherent on the subjective view, but it is also the move that severs the claim from its empirical content. If your probability doesn’t need to correspond to any feature of the world, then it can’t be wrong about the world. And a claim that can’t be wrong about the world tells you nothing about it.
We might call the result a Gettier probability: a credence that is internally justified, and that might even correspond to reality, but that lines up with the world through luck rather than judgement. In epistemology, a Gettier case is a belief that is justified and true, but not true because of its justification. Any P(doom) that happened to be correct would have the same structure. A researcher’s estimate might match whatever the actual risk turns out to be, but the match would be accidental relative to the method.
Gettier probabilities show up everywhere, like when a consulting firm estimates that a given market will grow at 7.3 per cent and lo and behold they are correct. The same is true for a geopolitical risk score that looks precise but cannot be updated by any observation short of the catastrophe it purports to predict. P(doom) is distinctive only in that the stakes are high enough that the probability escapes containment and ends up on the morning news.
Suppose someone’s P(doom) is 30 per cent. What outcome would show this was wrong? If AI goes well, they get to enjoy the benefits of the 70 per cent figure. If it goes catastrophically — and anyone is still alive to update their priors — the 30 per cent gets paid off. This is one of the more frustrating features of the subjective framework, which allows pretty much anyone with an idea about AI risk to have their tokens and eat them.
Bayesian probability demands that credences be precise. When estimates range from 0% to 99% with no mechanism to adjudicate, the framework is not being applied so much as invoked. P(doom) manages the impressive trick of satisfying neither the Bayesian standard (precise credences that can be adjudicated between) nor the falsifiability standard (empirical claims that can be proved wrong).
I’m not arguing that we ought to stop reasoning about novel events. Subjective probability is useful because it helps us think under uncertainty without good data. Some of the best calls in recent AI history were made by researchers who formed strong priors about scaling laws and bet on them before the evidence was in. Those bets were vindicated because they were testable. Scaling laws either held or they didn’t; capability thresholds either arrived on schedule or did not. My concern is that P(doom) borrows the confidence of claims like these without sharing their vulnerability to refutation.
I Am Sure You Are Very Sure
Forget, for a moment, whether P(doom) can be tested. Consider instead whether it is the kind of thing that can be meaningfully quantified in the first place. In 1921, the economist Frank Knight drew a distinction between risk and uncertainty. Risk is what you face when you can calculate the odds, as with a ball spinning on a roulette wheel. Uncertainty is what you face when you can’t, even in principle, because the situation is too poorly understood to yield a number.
Knight argued that the two are different in kind.
The Bayesian tradition’s response to Knight was to argue that subjective probability does away with the distinction: if you can always express a degree of belief, then there is no situation that resists quantification. P(doom) is a textbook application of this move, but the dissolution works only if the resulting number can be tested via real-world feedback. For P(doom), the Bayesian move converts Knightian uncertainty into a figure that has the form of risk without any of the mechanisms that make risk estimates trustworthy.
The same process, where uncertainty becomes risk, happens wherever a decision-making framework demands a numerical input and none is available. Firms routinely assign precise probabilities to market scenarios that are genuinely unprecedented. They don’t have any evidence that warrants a specific number, but they do have a spreadsheet with cells that need filling. The defense that such figures are “rough heuristics” does not help. The problem is that they are treated as casual shorthand or as inputs into value calculations, depending on whichever is most convenient at the time. This is why the “shmrobability” move doesn’t fix our problem. It concedes we’re not really talking about probability, while trying to preserve its practical content. Either the number is a rough heuristic and we should stop plugging it into expected value calculations, or it is a serious input to policy and we should hold it to the standards that implies.
If P(doom) is a subjective credence, there is no principled way to adjudicate between two people who disagree. A P(doom) of 95 per cent and one of 50 per cent are both internally coherent insofar as neither violates any rule of the subjective framework. You can be confident in your P(doom) and I can be confident in mine, and there is no observation or outcome that could settle which of us is right. Attempts to resolve this from within the Bayesian model, from Aumann’s agreement theorem onwards, all founder on conditions that do not hold (like shared starting points, perfect rationality, or common knowledge of each other’s beliefs).
The problem compounds when these estimates enter expected value calculations, where even a vanishingly small probability multiplied by the stakes of human extinction produces numbers large enough to dominate decisions. The rationalist and rationalist-adjacent communities recognize this dynamic as Pascal’s mugging, while groups like GiveWell have cautioned against taking such estimates at face value.
The reader should not take from this essay that AI risk is unworthy of attention. Rather, my point is that P(doom) is the wrong epistemic instrument for thinking about it. The alternative is to formulate falsifiable conjectures about the world and then try to refute them, which is exactly what AI safety researchers do in their actual work. They propose that particular training regimes will produce deceptive behavior and they predict that models beyond a certain capability threshold will resist correction. They hypothesize that misalignment in one domain will generalize unpredictably to others, and they can test this by measuring the transfer of learned strategies across task distributions. They ask whether models that pass safety evaluations in deployment conditions will behave differently when those conditions change. Each of these lines of research produces claims that can fail, and this vulnerability is what makes them worth paying attention to.
Individual claims about deceptive behavior or capability thresholds do not by themselves tell you how much to spend on alignment versus pandemic preparedness, yet real institutions have treated P(doom) as though it could. The effective altruist community has directed hundreds of millions of dollars toward AI existential risk reduction. Open Philanthropy, the field’s largest funder, used subjective probability frameworks to weigh AI risk against global health, biosecurity, and farm animal welfare. Whether or not that assessment was correct, the reasoning behind it is only as good as the numbers it relies on.
A figure that ranges from 0 per cent to 99 per cent depending on who you ask, with no way to adjudicate between them, is about as useful as no number at all. Aggregating these estimates might seem to help inasmuch as a mean of 14 per cent looks more sober than bouncing between extremes. Aggregate or individual, the problem is that neither incurs a cost for being wrong. A bookmaker’s odds are constrained by the market just as an insurer’s premiums are shaped by the frequency and nature of claims made. In both cases, the issuer’s assessment improves over time because the issuer pays when it is wrong. There is no market to punish a mispriced P(doom) and no settlement date on which the estimate is tested.
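One standard way to make a forecaster pay for being wrong is a proper scoring rule such as the Brier score, which is precisely the settlement mechanism P(doom) lacks. A minimal sketch, with invented forecasts:

```python
def brier_score(forecasts):
    """Mean squared error between stated probabilities and settled outcomes
    (0.0 is perfect; lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# Two forecasters score the same three settled events (1 = happened, 0 = did not).
careful = brier_score([(0.8, 1), (0.3, 0), (0.6, 1)])
reckless = brier_score([(0.99, 0), (0.01, 1), (0.5, 1)])  # confidently wrong twice

print(careful < reckless)  # True: the overconfident forecaster pays
```

The score can only be computed once the outcomes arrive. A prediction with no settlement date can never be scored at all, which is the essay’s complaint in one line.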
We might say something like: “Even if P(doom) is imprecise, it’s still better than no number because we need to allocate resources somehow.” But this assumes that an internally coherent number outperforms no number at all when you have to act. We can live with imprecision; what we cannot live with is an unfalsifiable headline figure steamrolling the falsifiable claims that could actually inform the decision.
Governments fund pandemic preparedness efforts without a big round number for the probability of the next pandemic. Institutions allocate resources under genuine uncertainty all the time by doing things like funding a portfolio of approaches, setting thresholds for observable harms, identifying the cheapest reversible interventions, demanding observable milestones before scaling commitment, and building the capacity to pivot as new evidence arrives.
The physicist Wolfgang Pauli was famously unforgiving of bad theory. When a colleague asked him to assess a young physicist’s paper, his verdict was that it was “not even wrong.” He meant that its central claim was so removed from evidence that it could not be proved false. P(doom) is not even wrong. Graunt’s mortality tables may have been technically crude, but they were accountable. They could be checked against next year’s parish records, and if they were wrong, the records would show it.