The average length of a Code is bounded below by the entropy of the random variable that models your data.
Source coding theorem
The average length of a Code is bounded below by the Entropy of the distribution over source letters.
Note: in the above theorem, the code is a code for individual source letters (a symbol code).
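In symbols (a sketch of the statement, writing $\ell(x)$ for the codeword length of letter $x$ and $D$ for the size of the code alphabet):

$$
L = \mathbb{E}[\ell(X)] = \sum_x p(x)\,\ell(x) \;\ge\; H_D(X) = -\sum_x p(x)\log_D p(x),
$$

with equality if and only if $p(x) = D^{-\ell(x)}$ for every $x$.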
This bound can be obtained using calculus (Lagrange multipliers) to minimize the expected word length of a prefix code subject to the Kraft inequality. What if the code is not prefix-free? The Kraft inequality still holds for any uniquely decodable code (McMillan's inequality), so the same lower bound applies.
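Treating the lengths $\ell_i$ as real numbers and ignoring the integer constraint, the Lagrangian calculation is roughly:

$$
J = \sum_i p_i\,\ell_i + \lambda\Big(\sum_i D^{-\ell_i} - 1\Big),
\qquad
\frac{\partial J}{\partial \ell_i} = p_i - \lambda\,D^{-\ell_i}\ln D = 0
\;\Rightarrow\;
D^{-\ell_i} = \frac{p_i}{\lambda \ln D}.
$$

Enforcing $\sum_i D^{-\ell_i} = 1$ gives $\lambda = 1/\ln D$, hence $\ell_i^{*} = -\log_D p_i$ and $L^{*} = \sum_i p_i\,\ell_i^{*} = H_D(X)$.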
This gives non-integer lengths in general. But one can show that the actual optimal lengths will be close to these in a particular sense: find the D-adic distribution that is closest (in the relative entropy sense) to the distribution of X; that distribution provides the set of codeword lengths. (See page 112 of the Cover & Thomas book.)
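The relative-entropy connection can be made explicit (a sketch, writing $r_i = D^{-\ell_i}/\sum_j D^{-\ell_j}$ for the D-adic distribution induced by a choice of lengths $\ell_i$):

$$
L - H_D(X)
= \sum_i p_i\,\ell_i + \sum_i p_i \log_D p_i
= \sum_i p_i \log_D \frac{p_i}{r_i} - \log_D\Big(\sum_j D^{-\ell_j}\Big)
= D(p\,\|\,r) - \log_D\Big(\sum_j D^{-\ell_j}\Big) \;\ge\; 0,
$$

since relative entropy is nonnegative and the Kraft sum is at most 1. Minimizing $L$ therefore amounts to choosing the D-adic $r$ closest to $p$ in relative entropy.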
An optimal code in this sense is the Huffman code. The Shannon-Fano code is close to optimal (its expected length exceeds the entropy by less than one bit per letter) but is not optimal in general.
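A minimal Huffman sketch in Python (illustrative only; the alphabet and probabilities are made up for the example):

```python
import heapq
import math

def huffman_code(probs):
    """Build a binary Huffman code for {symbol: probability}.
    Returns {symbol: codeword string}."""
    if len(probs) == 1:                      # degenerate one-letter source
        return {s: "0" for s in probs}
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)      # two least probable subtrees
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

p = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}       # made-up distribution
code = huffman_code(p)
L = sum(p[s] * len(w) for s, w in code.items())    # expected length
H = -sum(q * math.log2(q) for q in p.values())     # entropy in bits
print(code, L, round(H, 3))
```

For this made-up distribution the Huffman lengths are (1, 2, 3, 3), so $L = 1.9$ bits per letter against $H(X) \approx 1.846$ bits, while the rounded lengths $\lceil\log_2 1/p(x)\rceil = (2, 2, 3, 4)$ would give 2.4 bits.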
When we don't know the distribution of the source, we can use Universal source coding (a small LZ78-style sketch follows the pointers below).
See Data compression
See this video
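As one concrete example of a universal scheme (chosen here for illustration; not necessarily the one the linked note or video uses), a minimal LZ78-style parser in Python:

```python
def lz78_parse(s):
    """LZ78 parsing: split s into phrases, each a previously seen
    phrase extended by one new symbol. Returns (index, symbol) pairs."""
    dictionary = {"": 0}            # phrase -> index; 0 is the empty phrase
    output, phrase = [], ""
    for ch in s:
        if phrase + ch in dictionary:
            phrase += ch            # keep extending the current match
        else:
            output.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)
            phrase = ""
    if phrase:                      # flush a trailing, already-seen phrase
        output.append((dictionary[phrase[:-1]], phrase[-1]))
    return output

print(lz78_parse("ababab"))
# [(0, 'a'), (0, 'b'), (1, 'b'), (1, 'b')]
```

Each pair costs roughly $\log_2(\text{current dictionary size})$ bits plus one raw symbol; for a stationary ergodic source the per-letter rate of Lempel-Ziv schemes approaches the entropy rate even though the source distribution is never estimated explicitly.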
This is proved by constructing a code that indexes the elements of the Typical set (for which roughly $n(H(X)+\epsilon)$ bits suffice, for block length $n$ and entropy $H(X)$ per letter), and a separate index for the atypical set. See 3.23 on ElsofInfoTheory.
If $n$ is large enough, then the expected codeword length per letter, $\mathbb{E}\big[\tfrac{1}{n}\ell(X^n)\big]$, can be made smaller than $H(X)+\epsilon$.
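A sketch of the bit counting behind this (block length $n$, source alphabet $\mathcal{X}$, typical set $A_\epsilon^{(n)}$):

$$
|A_\epsilon^{(n)}| \le 2^{\,n(H(X)+\epsilon)}
\quad\Rightarrow\quad
\ell(x^n) \le n(H(X)+\epsilon) + 2 \ \text{bits for } x^n \in A_\epsilon^{(n)},
$$

where the $+2$ covers rounding up and a one-bit flag marking the sequence as typical. Any atypical sequence can be indexed with at most $n\log_2|\mathcal{X}| + 2$ bits. Since $\Pr\{A_\epsilon^{(n)}\} \to 1$, averaging the two cases gives $\mathbb{E}\big[\tfrac{1}{n}\ell(X^n)\big] \le H(X)+\epsilon'$ for $n$ large enough.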