Code redundancy

cosmos 26th September 2017 at 1:10pm

In Data compression, particularly Universal source coding, we define the redundancy of a code with Codeword lengths $l(x)$, and implied probability $q(x) = 2^{-l(x)}$ (see Source coding theorem), as the difference between the expected length of the code (under the true Information source $X$ distribution $p$) and the lower limit for the expected length:

$$R(p,q) = E_p[l(X)] - E_p\left[\log{\frac{1}{p(X)}}\right]$$

$$= \sum_x p(x) \left( l(x) - \log{\frac{1}{p(x)}}\right)$$

$$= \sum_x p(x) \left(\log{\frac{1}{q(x)}} - \log{\frac{1}{p(x)}}\right)$$

$$= \sum_x p(x) \log{\frac{p(x)}{q(x)}}$$

$$= D(p\|q)$$

where $q(x) = 2^{-l(x)}$ is the distribution that corresponds to the codeword lengths $l(x)$, and $D(\cdot \| \cdot)$ is the Relative entropy.
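A quick numerical check of this identity, using a hypothetical 4-symbol source and a fixed-length (and hence suboptimal) code; the symbols, probabilities, and lengths here are made up for illustration:

```python
import math

# True source distribution p(x) for a hypothetical 4-symbol source.
p = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

# A suboptimal fixed-length code: every codeword has length 2.
l = {'a': 2, 'b': 2, 'c': 2, 'd': 2}

# Implied distribution q(x) = 2^{-l(x)} (here q sums to 1, i.e. Kraft holds with equality).
q = {x: 2.0 ** -lx for x, lx in l.items()}

expected_length = sum(p[x] * l[x] for x in p)                  # E_p[l(X)] = 2.0
entropy = sum(p[x] * math.log2(1.0 / p[x]) for x in p)         # H(p) = 1.75
redundancy = expected_length - entropy                         # R(p, q) = 0.25

# D(p||q), computed directly from the last line of the derivation.
kl = sum(p[x] * math.log2(p[x] / q[x]) for x in p)

print(redundancy, kl)  # both equal 0.25 bits
```

Here the redundancy of 0.25 bits is exactly the relative entropy $D(p\|q)$, since the fixed-length code treats the source as uniform while $p$ is not.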