Universal source coding

cosmos 22nd August 2018 at 4:12pm
Data compression

Methods for data compression, which asymptotically achieve optimality without knowing the Information source distribution a priori, and instead work for general classes of distributions asymptotically (hence the name universal). They are almost always Online algorithms, because in a non-online settings we may be able to just compute the source distribution before compressing.

Lempel-Ziv algorithms achieve the entropy limit asymptotically for Stationary ergodic process, as well as for Finite-state sources (finite stationary Markov chains)

See chapter 13 of Cover&Thomas

Universal source coding of binary sources (see ...)

book


A good data source/data compression code has little Code redundancy (ideally 00 asymptotically, as the message length goes to infinity). Given a family of source distributions which the information source may follow in a given situation {pθ}\{p_\theta\}, we define the Minimax redundancy as

R=minqmaxpθR(pθ,q)=minqmaxpθD(pθq)R^* = \min_q \max_{p_\theta} R(p_\theta,q) = \min_q \max_{p_\theta} D(p_\theta || q)

where RR is the Code redundancy. This can be interpreted as a Game

This minimax redundancy is achieved by a distribution qq that is at the "center" of the information ball containing the distributions pθp_\theta, that is, the distribution qq whose maximum distance from any of the distributions pθp_\theta is minimized.

One can construct a Communication channel {θ, p θ (x), X } with the rows of the transition matrix equal to the different pθp_\theta’s, the possible distributions of the source. We will show that the minimax redundancy RR^* is equal to the capacity of this channel, and the corresponding optimal coding distribution qq^* is the output distribution of this channel induced by the capacity-achieving input distribution! (Theorem by Gallager and Ryabko).


"In the original paper of Ziv and Lempel [603], the authors described the basic LZ77 algorithm and proved that it compressed any string as well as any finite-state compressor acting on that string."

So LZ is an optimal compressor over the set of finite state compressors, just like UTMs are over the set of all computable compressors!

Upper Bounds on the Probability of Sequences Emitted by Finite-State Sources and on the Redundancy of the Lempel- Ziv Algorithm

Universal prediction