
Minimum description length

growing planted 2026-04-02 tended 2026-04-14 189 words #math #ml

Minimum description length (MDL) is a practical approximation of kolmogorov-complexity, trading exactness for computability. The principle: the best model for some data is the one that, together with the data encoded using the model, gives the shortest total description.

Written more carefully, for model $M$ and data $D$:

$$\text{MDL}(M, D) = L(M) + L(D \mid M)$$

where $L(\cdot)$ is the code length in bits. Minimising this over models trades off fit ($L(D \mid M)$ small) against model complexity ($L(M)$ small). It is equivalent, asymptotically, to Bayesian model selection with a universal prior.
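As a concrete instance of the trade-off, here is a minimal sketch (the data, helper name, and the $\tfrac{1}{2}\log_2 n$ parameter cost are illustrative assumptions, not from this note) comparing a zero-parameter fair-coin model against a fitted Bernoulli model:

```python
import math

def code_length_bits(data, p):
    """Shannon code length of a binary sequence under Bernoulli(p), in bits."""
    return -sum(math.log2(p if x == 1 else 1 - p) for x in data)

data = [1] * 9 + [0]  # 9 heads, 1 tail
n = len(data)

# Model A: fair coin. No free parameters, so L(M) is ~0.
mdl_fair = 0 + code_length_bits(data, 0.5)  # exactly 10 bits

# Model B: fitted p. One parameter, costing ~(1/2) log2 n bits to encode.
p_hat = sum(data) / n
mdl_fitted = 0.5 * math.log2(n) + code_length_bits(data, p_hat)
```

The fitted model pays under two bits for its parameter but saves several bits on the data, so the two-part code prefers it.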

why it’s useful

Unlike Kolmogorov complexity, MDL actually picks a model family up front — say, polynomials up to degree $k$ — and then optimises. This makes it tractable. The two-part code ($L(M) + L(D \mid M)$) maps cleanly onto common ideas:

  • parameters vs. residuals in regression
  • depth vs. training error in decision trees
  • complexity penalty in AIC/BIC, which are first-order approximations
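The polynomial case can be sketched as a two-part code, assuming Gaussian coding of the residuals and the standard $\tfrac{d}{2}\log_2 n$ parametric cost (the function name and data are made up for illustration):

```python
import math
import numpy as np

def two_part_code_bits(x, y, degree):
    """Approximate two-part code length for a degree-d polynomial fit.
    L(M): (d+1)/2 * log2 n bits (the usual parametric-cost approximation);
    L(D|M): Gaussian code length of the residuals."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    sigma2 = max(resid.var(), 1e-12)  # guard against zero residual variance
    l_model = (degree + 1) / 2 * math.log2(n)
    l_data = n / 2 * math.log2(2 * math.pi * math.e * sigma2)
    return l_model + l_data

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2 * x**2 - x + rng.normal(scale=0.1, size=50)  # true degree is 2

best = min(range(6), key=lambda d: two_part_code_bits(x, y, d))
```

Underfitting (degree 0 or 1) inflates $L(D \mid M)$ through large residuals; each extra degree inflates $L(M)$ by about $\tfrac{1}{2}\log_2 n$ bits, which the noise-fitting gains rarely repay.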

It also connects to Shannon entropy: $L(D \mid M)$ is, at its cleanest, the cross-entropy of the model against the data.
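In code terms (the sequence and model $q$ are illustrative, not from this note), the Shannon code length of data under a categorical model is exactly the summed negative log-probability, i.e. $n$ times the cross-entropy in bits:

```python
import math

def data_code_length(seq, q):
    """L(D | M) in bits for a categorical model q over the symbols of seq."""
    return -sum(math.log2(q[s]) for s in seq)

seq = "aabac"
q = {"a": 0.5, "b": 0.25, "c": 0.25}
bits = data_code_length(seq, q)  # 3*1 + 2*2 = 7 bits
```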

See also occams-razor-as-a-prior — MDL is Occam’s razor, made precise.
