“Entities should not be multiplied beyond necessity” is a slogan. It is not an argument. The real content of Occam’s razor shows up when you formalise it, and the formalisation is a prior probability distribution over hypotheses.
Solomonoff induction does this explicitly: assign every computable hypothesis h a prior proportional to 2^(−K(h)), where K(h) is its Kolmogorov complexity, the length of the shortest program that produces the hypothesis's predictions. Simpler hypotheses (shorter programs) get more prior mass. Given data, Bayes’ rule does the updating.
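A toy sketch of the shape of this, not real Solomonoff induction (which is uncomputable): approximate K(h) by a hand-assigned description length in bits over a two-hypothesis space. The hypotheses, biases, and bit lengths below are all invented for illustration.

```python
# Complexity prior: weight each hypothesis by 2 ** (-description length).
# Both hypotheses and their bit lengths are made up for this sketch.
hypotheses = {
    "fair (p=0.5)":   {"p": 0.5, "bits": 2},  # short "program"
    "biased (p=0.9)": {"p": 0.9, "bits": 6},  # longer "program"
}

prior = {h: 2.0 ** -v["bits"] for h, v in hypotheses.items()}
z = sum(prior.values())
prior = {h: w / z for h, w in prior.items()}

def update(prior, flips):
    """Bayes' rule: P(h | D) is proportional to P(D | h) * P(h)."""
    post = {}
    for h, w in prior.items():
        p = hypotheses[h]["p"]
        likelihood = 1.0
        for flip in flips:  # 1 = heads, 0 = tails
            likelihood *= p if flip else 1.0 - p
        post[h] = w * likelihood
    z = sum(post.values())
    return {h: w / z for h, w in post.items()}

post = update(prior, [1] * 8)  # eight heads in a row
```

The prior starts heavily on the fair coin (the shorter program), but eight straight heads is enough for the posterior to favour the biased one.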
This is beautiful because it dissolves a class of philosophical debates. “Should we prefer simple theories?” becomes a question about whether a specific prior fits the evidence we actually observe, a question we can in principle test.
It also explains why the razor sometimes fails. If the generative process is genuinely complex, the data eventually beats the prior, and we end up believing the complicated thing. Early in evidence gathering we over-weight simplicity; with enough data, reality asserts itself.
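This crossover can be sketched in log-odds form: the complex hypothesis pays a fixed prior penalty for its longer description, and the log-likelihood ratio grows with data until it swamps that penalty. The bit lengths and the 0.9 bias here are invented for illustration.

```python
import math

def log_posterior_odds(n_heads, n_tails):
    """log2 posterior odds of a 'complex' biased coin (p=0.9) over a
    'simple' fair coin (p=0.5): a fixed prior penalty from the longer
    description, plus a log-likelihood ratio that grows with data."""
    prior_gap_bits = 6 - 2  # the complex hypothesis costs 4 more bits up front
    ll_ratio = (n_heads * math.log2(0.9 / 0.5)
                + n_tails * math.log2(0.1 / 0.5))
    return ll_ratio - prior_gap_bits

early = log_posterior_odds(3, 0)   # negative: simplicity still wins
late = log_posterior_odds(10, 0)   # positive: the data has beaten the prior
```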
why i care
Making an unstated Occam prior explicit forces you to notice that you have one. Most scientific disputes that masquerade as “one theory is more elegant” are really disputes about whose prior is doing more work.
Minimum description length (MDL) is the computable version of the same idea: prefer the hypothesis that minimises the bits needed to describe the model plus the bits needed to describe the data under the model. The two converge in the infinite-data limit.
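A minimal sketch of the two-part code, with model bit costs and coin biases invented for illustration: total length is bits to name the model plus the Shannon code length of the data under it, and MDL picks the minimum.

```python
import math

def description_length(flips, p, model_bits):
    """Two-part MDL code length: bits to name the model, plus the
    Shannon code length of the data under it (-log2 likelihood)."""
    heads = sum(flips)
    tails = len(flips) - heads
    data_bits = -(heads * math.log2(p) + tails * math.log2(1.0 - p))
    return model_bits + data_bits

flips = [1, 0, 1, 1, 0, 1, 0, 1]  # 5 heads, 3 tails
fair = description_length(flips, 0.5, model_bits=2)
biased = description_length(flips, 0.9, model_bits=6)
best = "fair" if fair < biased else "biased"  # best compressor wins
```

On this roughly even-handed data the fair coin compresses better, so MDL, like the razor, sides with the simpler model.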