
Minimum description length

growing planted 2026-04-02 tended 2026-04-14 189 words #math #ml

Minimum description length (MDL) is a practical approximation of kolmogorov-complexity, trading exactness for computability. The principle: the best model for some data is the one that, together with the data encoded using the model, gives the shortest total description.

Written more carefully, for model $M$ and data $D$:

$$\text{MDL}(M, D) = L(M) + L(D \mid M)$$

where $L(\cdot)$ is the code length in bits. Minimising this over models trades off fit ($L(D \mid M)$ small) against model complexity ($L(M)$ small). It is equivalent, asymptotically, to Bayesian model selection with a universal prior.
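As a concrete instance of the trade-off, here is a minimal sketch (the data, helper name, and the $\tfrac{1}{2}\log_2 n$ parameter cost are illustrative assumptions, not from this note) comparing a zero-parameter fair-coin model against a fitted Bernoulli model:

```python
import math

def code_length_bits(data, p):
    """Shannon code length of a binary sequence under Bernoulli(p), in bits."""
    return -sum(math.log2(p if x == 1 else 1 - p) for x in data)

data = [1] * 9 + [0]  # 9 heads, 1 tail
n = len(data)

# Model A: fair coin. No free parameters, so L(M) is ~0.
mdl_fair = 0 + code_length_bits(data, 0.5)  # exactly 10 bits

# Model B: fitted p. One parameter, costing ~(1/2) log2 n bits to encode.
p_hat = sum(data) / n
mdl_fitted = 0.5 * math.log2(n) + code_length_bits(data, p_hat)
```

The fitted model pays under two bits for its parameter but saves several bits on the data, so the two-part code prefers it.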

why it’s useful

Unlike Kolmogorov complexity, MDL actually picks a model family up front — say, polynomials up to degree $k$ — and then optimises. This makes it tractable. The two-part code ($L(M) + L(D \mid M)$) maps cleanly onto common ideas:

  • parameters vs. residuals in regression
  • depth vs. training error in decision trees
  • complexity penalty in AIC/BIC, which are first-order approximations
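The polynomial case can be sketched as a two-part code, assuming Gaussian coding of the residuals and the standard $\tfrac{d}{2}\log_2 n$ parametric cost (the function name and data are made up for illustration):

```python
import math
import numpy as np

def two_part_code_bits(x, y, degree):
    """Approximate two-part code length for a degree-d polynomial fit.
    L(M): (d+1)/2 * log2 n bits (the usual parametric-cost approximation);
    L(D|M): Gaussian code length of the residuals."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = max(resid.var(), 1e-12)  # guard against zero residual variance
    l_model = (degree + 1) / 2 * math.log2(n)
    l_data = n / 2 * math.log2(2 * math.pi * math.e * sigma2)
    return l_model + l_data

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2 * x**2 - x + rng.normal(scale=0.1, size=50)  # true degree is 2

best = min(range(6), key=lambda d: two_part_code_bits(x, y, d))
```

Underfitting (degree 0 or 1) inflates $L(D \mid M)$ through large residuals; each extra degree inflates $L(M)$ by about $\tfrac{1}{2}\log_2 n$ bits, which the noise-fitting gains rarely repay.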

It also connects to Shannon entropy: $L(D \mid M)$ is, at its cleanest, the cross-entropy of the model against the data.
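In code terms (the sequence and model $q$ are illustrative, not from this note), the Shannon code length of data under a categorical model is exactly the summed negative log-probability, i.e. $n$ times the cross-entropy in bits:

```python
import math

def data_code_length(seq, q):
    """L(D | M) in bits for a categorical model q over the symbols of seq."""
    return -sum(math.log2(q[s]) for s in seq)

seq = "aabac"
q = {"a": 0.5, "b": 0.25, "c": 0.25}
bits = data_code_length(seq, q)  # 3*1 + 2*2 = 7 bits
```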

See also occams-razor-as-a-prior — MDL is Occam’s razor, made precise.
