Kolmogorov complexity

cosmos 28th November 2017 at 11:47pm
Complexity

aka algorithmic complexity, although that term may refer to some generalizations of Kolmogorov complexity too, I think

One of the main kinds of Descriptional complexity, based on the minimum size of a program (interpreted by a Turing machine) that produces (describes) the object. The Kolmogorov complexity of an object is the minimal number of bits required to effectively describe the object.

Algorithmic information theory is an alternative formalization of the notion of information to that of standard Shannon information theory. It defines information as the size of a compressed representation of an object. In principle, a compressed representation depends on the function one uses to decode the representation. However, it turns out that if one uses a universal Turing machine (UTM) as the decoder, where representations correspond to input programs that produce the object of interest, then these representations are additively optimal. This means that if we take the shortest such program for the UTM, any other computable decoder function can only improve the compression by at most an additive constant which is independent of the program [8]. The size of this shortest program is called the Kolmogorov complexity of the object, and represents how much information is needed to describe the object.
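In symbols (a minimal formalization, writing $l(p)$ for the length of program $p$, consistent with the bounds listed further down):

$$K_U(x) = \min\{\, l(p) : U(p) = x \,\},$$

and additive optimality says that for every computable decoder $f$ there is a constant $c_f$, independent of $x$, with

$$K_U(x) \leq K_f(x) + c_f \quad \text{for all } x.$$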

Kolmogorov complexity is central in Algorithmic information theory. See more on that page.

Math 574, Lesson 4-3: Kolmogorov Complexity (another video intro)

This is based on describing the information content of a discrete object, such as a binary string $x$, in terms of the length of the shortest program that generates $x$ on a universal Turing machine (UTM). This measure is called the Kolmogorov-Chaitin complexity, or simply Kolmogorov complexity, $K(x)$ of $x$.

AIT differs fundamentally from Shannon information theory: the latter is a theory about distributions, whereas the former is a theory about the information content of individual objects. Descriptional complexity similarly differs from the notions of complexity in Complex systems.

Lecture notes on descriptional complexity and randomness

Calculating Kolmogorov Complexity from the Output Frequency Distributions of Small Turing Machines. See Coding theorem method

Invariance theorem

See Algorithmic information theory

Based on defining an optimal compressor, which turns out to be realizable by a "reasonable" Universal Turing machine, if we restrict ourselves to computable functions (if we consider all functions, there are no optimal compressors). This is a compressor that can describe any string by a description that is never longer than that produced by any other compressor by more than an additive constant.

This defines a hierarchy. The bottom of the hierarchy (the optimal compressors) is not unique, as there are infinitely many additively equivalent UTMs. Therefore, KC is well defined, but only up to an additive constant! This is the content of the invariance theorem.

This is proved by showing that any UTM can simulate any other UTM at the cost of only a fixed number of extra bits (analogous to a Compiler). This number of bits itself depends on the particular enumeration of Turing machines used, but the invariance theorem can be shown to still hold as long as the enumerations are computable (analogous to the notion of acceptable numberings in von Mises' definition of Randomness).
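A sketch of the simulation argument (glossing over how the two parts of the input are delimited): if $s_V$ is a fixed $U$-program that makes $U$ simulate another UTM $V$, and $p$ is a shortest $V$-program for $x$, then $U(s_V p) = V(p) = x$, so

$$K_U(x) \leq l(s_V) + l(p) = K_V(x) + c_{UV},$$

where $c_{UV} = l(s_V)$ depends on $U$ and $V$ but not on $x$. Swapping the roles of $U$ and $V$ gives $|K_U(x) - K_V(x)| \leq c$ for a constant $c$.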

Two-part codes

One can equivalently define KC via minimum descriptions of the form "description of a Turing machine" + "program that, when read by that Turing machine, produces x". It can be shown that any minimal program can be written in this way. This is just a reinterpretation of KC.
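Schematically, with $\langle T \rangle$ a self-delimiting encoding of Turing machine $T$:

$$C(x) \leq \min\{\, l(\langle T \rangle) + l(p) : T(p) = x \,\} + O(1),$$

where the $O(1)$ pays for the fixed universal machine that first reads $\langle T \rangle$ and then runs $T$ on the remaining input $p$.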

Bounds

$C(x) \leq l(x) + c$ (a program can simply print $x$ verbatim) and $C(x|y) \leq C(x) + c$ (a program may ignore the auxiliary input $y$)

$C(x,y) \leq C(x) + C(y) + 2\log(\min(C(x),C(y))) + O(1)$, but also $C(x,y) \geq C(x) + C(y) + \log n - c$ for some strings $x, y$ of length $n$ and some constant $c$
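The $2\log$ term pays for marking where the first program ends. A sketch: if $p$ and $q$ are shortest programs for $x$ and $y$ with $l(p) \leq l(q)$, prefix $p$ with a self-delimiting encoding $\bar{p}$ of its length (e.g. the binary expansion of $l(p)$ with every bit doubled, followed by $01$, followed by $p$), so that a fixed machine can uniquely parse $\bar{p}q$ back into $p$ and $q$. Then

$$l(\bar{p}q) = l(p) + l(q) + 2\log l(p) + O(1) = C(x) + C(y) + 2\log\min(C(x),C(y)) + O(1).$$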

$C(x) < C(y) + C(x|y) + O(\log C(x))$ (chain rule)

$C(x^*|x) \leq \log l(x)$, where $x^*$ is the first shortest program producing $x$. Note that even though $x \mapsto x^*$ is a function, it is not a computable function, and so it actually requires extra information to computably produce $x^*$ given $x$! Nice

$C(x) \leq \log|A| + c$, when $x \in A$ and $A$ is a finite computable set (a program can specify $x$ by its index in an enumeration of $A$).

There are at least $m(1-2^{-c})+1$ elements $x \in A$ with $C(x) \geq \log m - c$, where $m = |A|$.
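This follows from counting programs: there are only $2^k - 1$ binary programs of length less than $k$, and each produces at most one string, so taking $k = \log m - c$,

$$\#\{x \in A : C(x) < \log m - c\} \leq 2^{\log m - c} - 1 = m2^{-c} - 1,$$

leaving at least $m - (m2^{-c} - 1) = m(1-2^{-c}) + 1$ elements of $A$ with $C(x) \geq \log m - c$. The same counting underlies the paucity theorems below: simple strings are rare.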

C is nonmonotonic on prefixes, that is, we can have $C(y) > C(x)$ where $y$ is a proper prefix of $x$. This can happen because the length of $x$ is some special number while the length of $y$ is a very complex number; the complexity of a part can turn out to be bigger than the complexity of the whole. $C(x|l(x))$ turns out to also be nonmonotonic on prefixes.

Computability of KC

KC is not computable, but it is upper semicomputable, meaning it can be approximated from above, for example via lossless compression algorithms. See Computability theory, and Li, M.; Vitányi, P., "An Introduction to Kolmogorov Complexity and Its Applications", for formal proofs.

Computable approximations to KC

Levin complexity

Data compression rate
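Since KC is only approximable from above, any lossless compressor yields a computable upper bound, valid up to the additive constant for a program running the matching decompressor. A minimal sketch in Python, using zlib as the compressor (the function name kc_upper_bound_bits is just illustrative):

```python
import os
import zlib

def kc_upper_bound_bits(x: bytes) -> int:
    """Computable upper bound on K(x), in bits, via zlib.

    Any lossless compressor gives an upper bound on Kolmogorov
    complexity, up to the additive constant for a program that
    runs the corresponding decompressor.
    """
    return 8 * len(zlib.compress(x, level=9))

# A highly regular string compresses far below its raw length...
regular = b"01" * 5000            # 10000 bytes
# ...while (pseudo-)random bytes barely compress at all.
random_ = os.urandom(10000)       # 10000 bytes

print(kc_upper_bound_bits(regular))   # small: the pattern is simple
print(kc_upper_bound_bits(random_))   # close to the full 80000 bits
```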

Paucity theorems

Paucity: simple (highly compressible) strings are rare among all possible strings

The frequent paucity of trivial strings


Kolmogorov complexity is not subadditive, meaning that there are $x$ and $y$ such that $C(x,y) > C(x) + C(y)$ (cf. the lower bound under Bounds: extra bits are needed to mark where the description of $x$ ends and that of $y$ begins).


A Computable Measure of Algorithmic Probability by Finite Approximations

See MMathPhys oral presentation

Deficiencies of KC

from here


Kolmogorov Complexity and Solovay Functions

On the Gaussianity of Kolmogorov Complexity of Mixing Sequences

http://www.hutter1.net/kolmold.txt

http://www.hutter1.net/ait.htm


A use in AI here