The Kolmogorov complexity of a string is the length of the shortest program that outputs it. It is a formalisation of the idea that "aaaaaaaaaa" is less complex than "48ka2nxqpw", even though both are ten characters long.
Call it . Then:
where is a fixed universal Turing machine and is the length in bits of program . The choice of changes by at most a constant — the invariance theorem, and it is what makes the definition useful despite appearing arbitrary.
why this matters
- is uncomputable. You cannot write a function that returns it for arbitrary . (Proof: halting-problem reduction.)
- It gives a rigorous definition of randomness — a string is random iff .
- Almost every result in algorithmic information theory descends from it.
See also minimum-description-length for the practical approximation, and occams-razor-as-a-prior for the philosophical consequence.
open in my head
I still don’t have a clean intuition for why is uncomputable while many weaker compressions (LZ77, gzip) are tractable. The gap feels philosophically important but I can’t articulate it yet.