Occam's Razor: Definition and Why the Simplest Explanation Tends to Be Right

Definition

Occam's razor is the principle of parsimony: when two competing hypotheses account for the same evidence equally well, the simpler one should be preferred. Attributed to the 14th-century philosopher William of Ockham, it guides theory selection in science, medicine, and everyday reasoning by favouring the hypothesis that requires the fewest unsupported assumptions.

The principle takes two distinct forms: ontological parsimony (prefer fewer kinds of entities) and quantitative parsimony (prefer fewer instances of any one kind).

How it works

At its core, Occam's razor holds that explanatory entities must not be multiplied beyond necessity: given competing hypotheses with equal predictive power, prefer the one requiring the fewest independent assumptions ¹. This is not a claim that reality is simple; it is a claim about evidential warrant. Complexity that does not improve predictive accuracy is complexity that lacks justification.

Bayesian model comparison provides formal mathematical grounding for the razor. A model with more free parameters can accommodate a wider range of possible data sets, so it must spread its predictive probability more thinly and assigns less to any specific observed outcome; the marginal likelihood penalises this excess flexibility automatically, a phenomenon known as the Bayesian Occam's razor ¹ ². Human cognition mirrors this pattern. In preregistered behavioural experiments, participants systematically favoured lower-complexity explanations when the competing explanations fitted the observed data equally well, with preferences matching formal Bayesian model-selection predictions ².
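A minimal Python sketch of this penalty, under assumptions that are not in the text above: ten coin flips, a zero-parameter model (a fair coin) and a one-parameter model (unknown bias with a uniform prior). The function names and the coin-flip setting are illustrative only.

```python
from fractions import Fraction
from math import factorial


def evidence_fair(n: int) -> Fraction:
    """Marginal likelihood of one specific sequence of n flips under the
    zero-parameter model: the coin is fair (p = 1/2)."""
    return Fraction(1, 2) ** n


def evidence_flexible(heads: int, n: int) -> Fraction:
    """Marginal likelihood of the same sequence under the one-parameter model:
    unknown bias p with a uniform prior on [0, 1].

    Integrating p**heads * (1 - p)**(n - heads) over p gives
    heads! * (n - heads)! / (n + 1)!.
    """
    return Fraction(factorial(heads) * factorial(n - heads), factorial(n + 1))


def bayes_factor(heads: int, n: int) -> float:
    """Evidence ratio, simple model over flexible model (> 1 favours simple)."""
    return float(evidence_fair(n) / evidence_flexible(heads, n))


if __name__ == "__main__":
    for heads in (5, 9):
        print(f"{heads}/10 heads: Bayes factor = {bayes_factor(heads, 10):.3f}")
```

With 5 heads in 10 flips the ratio is about 2.7 in favour of the fair-coin model, even though the flexible model fits the data just as well at its best parameter value. With 9 heads the ratio drops to about 0.11: the data now justify the extra parameter.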

The razor appears in at least two distinct forms. Ontological parsimony demands fewer kinds of entities; quantitative parsimony demands fewer instances of any one kind ⁴. These forms do not always point in the same direction. In clinical medicine, parsimony underlies the preference for a single unifying diagnosis over a conjunction of independent conditions, though this must be balanced against the recognition that patients frequently present with multiple concurrent problems ¹.
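The tension in the clinical case can be made concrete with a little arithmetic. The sketch below uses hypothetical prevalences (not drawn from any study) and assumes that the two common conditions are independent and that either explanation would account for the findings equally well.

```python
from math import prod
from typing import Sequence


def joint_probability(prevalences: Sequence[float]) -> float:
    """Probability that several independent conditions are all present."""
    return prod(prevalences)


# Hypothetical numbers chosen only for illustration.
rare_unifying_diagnosis = 0.0001  # one rare disease explaining both findings
two_common_conditions = joint_probability([0.05, 0.05])  # two unrelated common ones

print(f"single rare diagnosis: {rare_unifying_diagnosis:.4f}")
print(f"two common conditions: {two_common_conditions:.4f}")
```

Under these made-up numbers the two-condition explanation is 25 times more probable a priori than the single rare diagnosis, which illustrates why the preference for one diagnosis has to be weighed against base rates.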

Example

A senior analyst reviews two models explaining a drop in customer retention. The first posits a single shift in competitive pricing; the second adds three concurrent factors: seasonal variation, a minor product update, and demographic drift. Both models fit the available data equally well. Under Occam's razor, the single-factor model is the better starting point: it makes one prediction, that retention recovers once the pricing gap closes, and the next period's data can test that prediction with far less ambiguity.

When multiple explanations fit the same facts, the simpler one generates predictions that are more falsifiable and therefore more useful for guiding action.

Why it matters

Parsimony preferences are adaptive when data are sparse. Overfitting, which encodes accidental features of a sample as if they were genuine causal structure, degrades a model's ability to predict new data. Occam's razor functions as a cognitive and statistical safeguard against this failure mode: in both human reasoning and machine learning, its violation predicts poor generalisation ². For practitioners who make decisions under uncertainty, the principle is not a licence to oversimplify but a prompt to justify every assumption that exceeds what the evidence supports.
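The overfitting failure mode can be shown in a small, dependency-free Python sketch. Everything here is an assumption made for illustration: the underlying relationship `truth`, the eight training points, and a fixed alternating perturbation that stands in for noise so the run is reproducible. A two-parameter line is compared with a Lagrange polynomial that passes through every training point.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line: two parameters (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolant(xs, ys):
    """Lagrange polynomial through every point: as many parameters as data."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model's predictions."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def truth(x):
    return 2.0 * x  # hypothetical underlying relationship


train_x = [float(i) for i in range(8)]
# Fixed alternating perturbation (+0.3, -0.3, ...) stands in for noise.
train_y = [truth(x) + (0.3 if i % 2 == 0 else -0.3) for i, x in enumerate(train_x)]
# Held-out points halfway between the training points, scored against the truth.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [truth(x) for x in test_x]

line = fit_line(train_x, train_y)
interpolant = fit_interpolant(train_x, train_y)

for name, model in (("line", line), ("interpolant", interpolant)):
    print(f"{name:12s} train MSE = {mse(model, train_x, train_y):.3f}  "
          f"test MSE = {mse(model, test_x, test_y):.3f}")
```

The interpolant reaches zero training error because it reproduces the perturbation exactly, yet its error on the held-out points is far larger than the line's. The alternating perturbation is a deliberately unfavourable choice; with random noise the size of the gap would vary from sample to sample.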

Applied without discipline, the razor misleads. Domingos demonstrated empirically that simpler models do not consistently outperform more complex ones in predictive accuracy across knowledge-discovery tasks; parsimony is better understood as a heuristic for comprehensibility than as a universal guarantee of correctness ³. The scope of parsimony also varies by domain: the same evidence base can support conflicting parsimony arguments depending on which type of causal structure is being minimised ⁵. The razor is best treated as a burden-of-proof principle, not as a proof.

Is Occam's razor a proven scientific law or just a guideline?

Occam's razor is a methodological heuristic, not a proven law. It offers no guarantee that the simplest hypothesis is true. Its value lies in burden-of-proof allocation: complexity beyond what the evidence demands carries an implicit cost, and the razor makes that cost explicit.

How does Occam's razor relate to Bayesian statistics and model selection?

In Bayesian model comparison, the razor emerges as a mathematical consequence rather than an assumption. Models with more free parameters are automatically penalised through the marginal likelihood, because they spread probability mass across a larger space of possible outcomes and so assign less to the outcome actually observed. Jefferys and Berger formalised this as the Bayesian Occam's razor.

Does simpler always mean more accurate? When does Occam's razor fail?

Not always. Domingos showed that simpler models do not reliably outperform complex ones in predictive accuracy across knowledge-discovery tasks. In complex systems, parsimony can pull in conflicting directions depending on what quantity is being minimised. The razor selects the better starting hypothesis, not necessarily the true one.

How should Occam's razor be applied in everyday decisions?

Treat it as a burden-of-proof principle: any assumption beyond what the available evidence supports carries a cost that requires justification. When diagnosing a problem or evaluating a plan, prefer the explanation that requires the fewest additional assumptions, then gather more data before adding complexity.