Minimum description length (MDL) is a practical approximation of kolmogorov-complexity, trading exactness for computability. The principle: the best model for some data is the one that, together with the data encoded using the model, gives the shortest total description.
Written more carefully, for model $M$ and data $D$:

$$L(M, D) = L(M) + L(D \mid M)$$

where $L(\cdot)$ is the code length in bits. Minimising this over models trades off fit ($L(D \mid M)$ small) against model complexity ($L(M)$ small). Asymptotically, it is equivalent to Bayesian model selection with a universal prior.
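A toy sketch of that trade-off, with assumed conventions: a crude fixed cost of 32 bits per polynomial coefficient for $L(M)$, and Gaussian coding of the residuals for $L(D \mid M)$. The function name and bit costs are illustrative choices, not part of the principle:

```python
import numpy as np

def description_length(x, y, degree, bits_per_param=32):
    """Two-part code length L(M) + L(D|M) for a polynomial fit.

    L(M): each coefficient stored at a fixed precision (an assumption).
    L(D|M): residuals coded under a fitted Gaussian, i.e. their
    negative log-likelihood in bits (Shannon code length).
    Differential-entropy code lengths can go negative for tiny sigma;
    that is fine when only comparing models.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    l_model = bits_per_param * (degree + 1)
    sigma = residuals.std() + 1e-9
    # average NLL of N(0, sigma^2) at the fitted sigma, in bits per point
    l_data = 0.5 * np.log2(2 * np.pi * np.e * sigma**2) * len(x)
    return l_model + l_data

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 50)
y = 2 * x + 0.1 * rng.standard_normal(50)  # truly linear data

scores = {d: description_length(x, y, d) for d in range(6)}
best = min(scores, key=scores.get)  # the linear model should win
```

Underfitting (degree 0) pays in residual bits; overfitting (degree 5) pays in parameter bits; the true degree minimises the total.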
why it’s useful
Unlike Kolmogorov complexity, MDL fixes a model family up front — say, polynomials up to degree $k$ — and then optimises within it. This makes it tractable. The two-part code $L(M) + L(D \mid M)$ maps cleanly onto common ideas:
- parameters vs. residuals in regression
- depth vs. training error in decision trees
- complexity penalty in AIC/BIC, which can be read as first-order approximations of the MDL criterion
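For the AIC/BIC bullet, the standard asymptotic form can be written directly as a code length in bits (a sketch; the function name and the nats-to-bits conversion are my conventions):

```python
import math

def mdl_bits(neg_log_lik_nats, k, n):
    """Asymptotic two-part code length for a k-parameter model fit
    to n data points: L(D|M) = NLL of the data under the fitted
    model, L(M) ~ (k/2) ln n for the estimated parameters.
    Dividing by ln 2 converts nats to bits.
    """
    return (neg_log_lik_nats + 0.5 * k * math.log(n)) / math.log(2)
```

BIC is $2 \cdot (\text{NLL} + \tfrac{k}{2} \ln n)$ in nats, i.e. $2 \ln 2$ times this quantity, which is why minimising BIC and minimising this code length pick the same model.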
It also connects to Shannon entropy: $L(D \mid M)$ is, at its cleanest, the cross-entropy of the model against the data.
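Concretely (a minimal sketch; the toy alphabet and helper are illustrative): a Shannon code for model $q$ spends $-\log_2 q(x)$ bits per symbol, so the per-symbol average over the data is the cross-entropy of $q$ against the empirical distribution, minimised when $q$ matches it:

```python
import math
from collections import Counter

def codelength_bits(data, model_probs):
    """Total code length of data under model q: -sum log2 q(x).
    Divided by len(data), this is the cross-entropy between the
    empirical distribution of the data and q.
    """
    return -sum(math.log2(model_probs[x]) for x in data)

data = list("aabbbbcc")
empirical = {s: c / len(data) for s, c in Counter(data).items()}
uniform = {s: 1 / 3 for s in "abc"}
# the empirical (matching) model codes the data more cheaply
```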
See also occams-razor-as-a-prior — MDL is Occam’s razor, made precise.