Theory of lossy data compression
The description of an arbitrary real number requires an infinite number of bits, so a finite representation of a continuous random variable can never be perfect. How well can we do? To frame the question appropriately, it is necessary to define the "goodness" of a representation of a source. This is accomplished by defining a distortion measure, which is a measure of distance between the random variable and its representation. The basic problem in rate distortion theory can then be stated as follows: Given a source distribution and a distortion measure, what is the minimum expected distortion achievable at a particular rate?
Consider the problem of representing a single sample from the source. Let the random variable to be represented be X, and let the representation of X be denoted X̂(X). If we are given R bits to represent X, the function X̂ can take on 2^R values. The problem is to find the optimum set of values for X̂ (called the Reconstruction points or code points) and the regions associated with each value of X̂. Optimum means minimizing the expected distortion with respect to some Distortion function, which for real values can be the Squared error (written out below).
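Formally, with R bits the single-sample problem is to choose the reconstruction function minimizing expected distortion; in the squared-error case:

$$\min_{\hat{X}:\,|\mathrm{range}(\hat{X})| \le 2^R} \mathbb{E}\big[d(X, \hat{X}(X))\big], \qquad d(x, \hat{x}) = (x - \hat{x})^2.$$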
The Lloyd algorithm finds such a quantizer by alternating two steps: assign each source value to its nearest reconstruction point, then move each reconstruction point to the centroid (conditional mean) of its region. It will converge to a Local minimum of the distortion, not necessarily the global one. A sketch follows below.
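A minimal sketch of the Lloyd algorithm for a scalar quantizer under squared-error distortion, assuming empirical samples stand in for the source distribution; the function name, initialization, and iteration count are illustrative choices, not from the notes.

```python
import numpy as np

def lloyd(samples, R, iters=100):
    # Initialize the 2^R reconstruction points spread over the sample range.
    points = np.linspace(samples.min(), samples.max(), 2 ** R)
    for _ in range(iters):
        # Step 1: assign each sample to its nearest reconstruction point.
        idx = np.argmin(np.abs(samples[:, None] - points[None, :]), axis=1)
        # Step 2: move each point to the centroid (mean) of its region.
        for k in range(len(points)):
            if np.any(idx == k):
                points[k] = samples[idx == k].mean()
    # Final assignment and the resulting empirical squared-error distortion.
    idx = np.argmin(np.abs(samples[:, None] - points[None, :]), axis=1)
    return points, np.mean((samples - points[idx]) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)   # i.i.d. N(0, 1) source samples
points, D = lloyd(x, R=2)      # 4-level quantizer (R = 2 bits)
print(points, D)               # roughly [-1.51, -0.45, 0.45, 1.51], D ~ 0.12
```

For a unit-variance Gaussian this recovers (approximately) the classical Lloyd-Max 4-level quantizer.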
Instead of quantizing a single random variable, let us assume that we are given a sequence of n i.i.d. random variables drawn according to a Gaussian distribution. These random variables are to be represented using nR bits. Since the source is i.i.d., the symbols are independent, and it may appear that the representation of each element is an independent problem to be treated separately. But this is not true, as the results of rate distortion theory will show. We will represent the entire sequence by a single index taking 2^{nR} values. This treatment of entire sequences at once achieves a lower distortion for the same rate than quantizing the individual samples independently.
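The quantitative statement behind this claim, for a Gaussian N(0, σ²) source with squared-error distortion, is the Gaussian rate distortion function (a standard result of the theory):

$$R(D) = \begin{cases} \frac{1}{2}\log_2 \frac{\sigma^2}{D}, & 0 \le D \le \sigma^2,\\ 0, & D > \sigma^2, \end{cases} \qquad\text{equivalently}\qquad D(R) = \sigma^2\, 2^{-2R}.$$

At R = 2 bits per sample this gives D = σ²/16 ≈ 0.0625σ², roughly half the ≈ 0.12σ² achieved by the best scalar quantizer above; the gap is the gain from coding whole sequences.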
By combining rate distortion theory with the theory of minimal sufficient statistics, we get the Information bottleneck principle.
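Concretely, in the information bottleneck formulation (Tishby, Pereira and Bialek), X is compressed into a representation T that stays informative about a relevance variable Y, with the rate-distortion-style trade-off controlled by a Lagrange multiplier β:

$$\min_{p(t \mid x)} \; I(X; T) - \beta\, I(T; Y),$$

subject to the Markov chain Y → X → T (T is computed from X alone).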
https://www.wikiwand.com/en/Rate%E2%80%93distortion_theory
Rate distortion theory also has connections to unsupervised learning (see Autoencoders, for instance; a minimal sketch follows below).
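A minimal sketch of that connection, assuming PyTorch (the architecture and layer sizes are arbitrary illustrative choices): the width of the bottleneck plays the role of the rate, and the reconstruction MSE is the squared-error distortion.

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, dim=784, bottleneck=32):
        super().__init__()
        # Encoder compresses to a low-dimensional code (the "rate" constraint).
        self.encoder = nn.Sequential(
            nn.Linear(dim, 128), nn.ReLU(), nn.Linear(128, bottleneck))
        # Decoder reconstructs from the code (the "reproduction").
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
x = torch.randn(64, 784)                    # stand-in data batch
loss = nn.functional.mse_loss(model(x), x)  # squared-error distortion
loss.backward()                             # train by minimizing distortion
```

The analogy is loose: a continuous bottleneck does not literally bound the rate the way 2^R code points do, but shrinking the bottleneck trades reconstruction distortion against representation size in the same spirit.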