Algorithmic information theory is an alternative formalization of the notion of information to that of standard Shannon information theory. It defines information as the size of a compressed representation of an object. In principle, a compressed representation depends on the function one uses to decode the representation. However, it turns out that if one uses a universal Turing machine (UTM) as the decoder, where representations correspond to input programs that produce the object of interest, then these representations are additively optimal. This means that if we take the shortest such program for the UTM, any other computable decoder function can improve the compression by at most an additive constant which is independent of the object [8]. The size of such a shortest program is called the Kolmogorov complexity of the object, and represents how much information is needed to describe the object.
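In symbols (standard notation, with $\mathcal{U}$ the reference UTM and $l(p)$ the length of a program $p$): $K_{\mathcal{U}}(x) = \min\{\, l(p) : \mathcal{U}(p) = x \,\}$, and additive optimality means that for any computable decoder $A$ there is a constant $c_A$, independent of $x$, such that $K_{\mathcal{U}}(x) \le K_A(x) + c_A$.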
See Descriptional complexity and MMathPhys oral presentation. See also Information theory, Theory of computation, Complexity theory, and Computational complexity
Good lecture notes for AIT: http://www.cse.iitk.ac.in/users/satyadev/a10/a10.html – http://algoinfo4u.com/AlgoInfo4u.pdf
Algorithmic probability and friends
Good review: Algorithmic Information Theory: Review for Physicists and Natural Scientists
http://algoinfo4u.com/index.html
Introduction to Kolmogorov complexity and its applications
See Elements of information theory by Cover and Thomas (chap 14)
Conditional Kolmogorov complexity
$\langle \cdot,\cdot \rangle$ is the pairing function (see Computability theory). The conditional Kolmogorov complexity is often defined as in Def. 2.0.1, but with the conditioning string taken to be $l(x)$, the length of $x$, giving $K(x \mid l(x))$.
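A minimal statement of the conditional version, in the same notation as above: $K(x \mid y) = \min\{\, l(p) : \mathcal{U}(p, y) = x \,\}$, i.e. the decoder gets $y$ for free; the unconditional complexity is recovered as $K(x) = K(x \mid \epsilon)$ with $\epsilon$ the empty string.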
Universality of Kolmogorov complexity
aka Invariance theorem
For sufficiently long x, the length of the simulation program (the program that makes one universal machine emulate another) can be neglected, and we can discuss Kolmogorov complexity without worrying about the constants or the choice of reference machine.
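Concretely (standard statement, my notation): if $A$ is any other computer and $s_A$ is a program that makes $\mathcal{U}$ simulate $A$, then $K_{\mathcal{U}}(x) \le K_A(x) + l(s_A)$ for all $x$; in particular, for two universal machines $|K_{\mathcal{U}}(x) - K_{\mathcal{U}'}(x)| \le c_{\mathcal{U},\mathcal{U}'}$ with the constant independent of $x$.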
Note in the book on info theory, they use the ceiling function $\lceil \log n \rceil$ for the {number of bits in a binary representation of a number}; however, as mentioned here that fails for powers of 2, so we need to use $\lfloor \log n \rfloor + 1$ (equivalently $\lceil \log (n+1) \rceil$).
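A quick check of the two conventions (plain Python, just to show the off-by-one at powers of 2):

import math

def bits_ceil(n):         # ceil(log2 n): undercounts when n is a power of 2
    return math.ceil(math.log2(n))

def bits_floor_plus1(n):  # floor(log2 n) + 1: actual length of bin(n)
    return math.floor(math.log2(n)) + 1

for n in (7, 8, 9):
    print(n, bin(n), bits_ceil(n), bits_floor_plus1(n))
# 8 = 0b1000 needs 4 bits, but ceil(log2 8) = 3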
Upper bound on Kolmogorov complexity
$K(x \mid l(x)) \le l(x) + c$ and $K(x) \le K(x \mid l(x)) + 2 \log l(x) + c$, where $c$ is a constant independent of $x$.
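The $2 \log l(x)$ overhead comes from encoding the length self-delimitingly; one simple scheme (an illustration, not necessarily the book's exact construction): write each bit of the binary expansion of $l(x)$ twice and terminate with 01, so $l(x) = 5 = 101_2 \mapsto 11\,00\,11\,01$, costing $2\lfloor \log l(x) \rfloor + 4$ bits that can be parsed unambiguously before the raw bits of $x$ follow.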
Lower bounds on Kolmogorov complexity
Kraft inequality
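Two standard facts behind the lower bounds: a counting argument gives $|\{x : K(x) < k\}| < 2^k$, because there are fewer than $2^k$ programs shorter than $k$ bits; and since the halting programs of a prefix-free (self-delimiting) machine form a prefix code, the Kraft inequality gives $\sum_x 2^{-K(x)} \le 1$.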
Relation to entropy
$\frac{1}{n} \sum_{x^n} p(x^n) K(x^n \mid n) \to H(X)$ as $n \to \infty$. See proof in the book (uses Kraft's inequality, Jensen's inequality, and the concavity of the entropy). Therefore the average Kolmogorov complexity per symbol of the string approaches the entropy of the random variable from which the letters of the string are sampled: the compression achieved by the computer goes to the entropy limit.
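The two-sided bound behind this limit, as I remember the statement in the book (for an i.i.d. source over a finite alphabet $\mathcal{X}$, up to the exact constants): $H(X) \le \frac{1}{n} \sum_{x^n} p(x^n) K(x^n \mid n) \le H(X) + \frac{|\mathcal{X}| \log n}{n} + \frac{c}{n}$.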
Theorem 14.4.3 There are an infinite number of integers $n$ such that $K(n) > \log n$.
Theorem 14.5.1 Let $X_1, X_2, \ldots, X_n$ be drawn i.i.d. according to a Bernoulli($\theta$) process. Then $\frac{1}{n} K(X_1 X_2 \cdots X_n \mid n) \to H_0(\theta)$ in probability, where $H_0$ is the binary entropy function.
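A rough numerical illustration (my own sketch, not from the book): $K$ itself is uncomputable, so use zlib as a computable stand-in; a real compressor only upper-bounds the complexity, so its rate sits somewhat above the entropy but tracks it.

import math, random, zlib

theta, n = 0.1, 200_000
x = ''.join('1' if random.random() < theta else '0' for _ in range(n))
rate = 8 * len(zlib.compress(x.encode(), 9)) / n   # compressed bits per source symbol
H0 = -theta * math.log2(theta) - (1 - theta) * math.log2(1 - theta)
print(f"compressor rate ~ {rate:.3f} bits/symbol vs H0(theta) = {H0:.3f}")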
For example, the fraction of sequences of length n that have complexity less than n − 5 is less than 1/32. This motivates the following definition.
Definitions of algorithmic randomness, incompressibility.
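The counting behind the 1/32 example: fewer than $2^{n-5}$ programs have length below $n-5$, so at most $2^{n-5}$ of the $2^n$ strings of length $n$ can satisfy $K(x \mid n) < n - 5$, a fraction below $2^{-5} = 1/32$. A sequence $x_1 x_2 \ldots$ is then called incompressible if $K(x_1 x_2 \cdots x_n \mid n)/n \to 1$.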
Strong law of large numbers for incompressible sequences
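Statement (standard form): if $x_1 x_2 \ldots$ is incompressible, then $\frac{1}{n} \sum_{i=1}^{n} x_i \to \frac{1}{2}$; intuitively, any persistent bias in the relative frequency of ones could be exploited to compress the sequence.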
In general, we can show that if a sequence is incompressible, it will satisfy all computable statistical tests for randomness. (Otherwise, identification of the test that x fails will reduce the descriptive complexity of x, yielding a contradiction.) In this sense, the algorithmic test for randomness is the ultimate test, including within it all other computable tests for randomness.
We now remove the expectation from Theorem 14.3.1
Probability distribution of the outputs of a Turing machine when fed random inputs. Apparently, the distribution is rather robust to the probability distribution of the inputs, which is why it is called "universal".
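For the universal machine fed fair coin flips as a (prefix-free) program, the universal probability of a string $x$ is $P_{\mathcal{U}}(x) = \sum_{p \,:\, \mathcal{U}(p) = x} 2^{-l(p)}$, the probability that a random program outputs $x$.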
Epimenides' liar paradox, Gödel's incompleteness theorem, Halting problem
Related: Berry's paradox and Bechenbach's paradox
Definition: $\Omega = \sum_{p \,:\, \mathcal{U}(p) \text{ halts}} 2^{-l(p)}$, the probability that the universal computer halts when fed fair coin flips as its program.
Properties:
1. $\Omega$ is noncomputable.
2. $\Omega$ is a "philosopher's stone", or an oracle. Knowledge of $\Omega$ to $n$ bits can be used to prove any theorem for which {a proof expressible in fewer than $n$ bits exists} (see the sketch after this list).
3. $\Omega$ is algorithmically random.
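Sketch of why property 2 holds (my paraphrase of the standard argument): dovetail all programs, adding $2^{-l(p)}$ to a running sum each time some program $p$ halts. Once the sum exceeds the truncation $0.\omega_1 \omega_2 \cdots \omega_n$, every program of length at most $n$ that will ever halt has already halted, since a later halt of such a program would push the total past $\Omega$. So $n$ bits of $\Omega$ decide halting for all programs up to $n$ bits long; in particular they decide whether a program that exhaustively searches for a proof of a given theorem ever halts, provided that search program itself is shorter than $n$ bits.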
Theorem 14.8.1. $\Omega$ cannot be compressed by more than a constant; that is, there exists a constant $c$ such that $K(\omega_1 \omega_2 \cdots \omega_n) \ge n - c$ for all $n$.
The universal gambling scheme on a random sequence does asymptotically as well as a scheme that uses prior knowledge of the true distribution!
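Rough sketch of why (my paraphrase, not the book's exact construction): with fair 2-for-1 odds on each bit, a gambler who spreads initial wealth over length-$n$ sequences in proportion to $2^{-K(x)}$ (possible since $\sum_x 2^{-K(x)} \le 1$ by Kraft) ends up with wealth about $S(x) = 2^{\,n - K(x) - c}$ on the realized sequence $x$; for $x$ typical of a source with entropy rate $H$, $K(x) \approx nH$, so the growth exponent $\frac{1}{n} \log S(x) \to 1 - H$, matching proportional betting with the true distribution known.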
The proof of the relation between description complexity and universal probability involves an extension of the {tree construction used for Shannon-Fano-Elias codes for computable probability distributions} to the uncomputable universal probability distribution.
As stated in the proof in the InfoTheory book, "However, there is no effective procedure to find the lowest depth node corresponding to x". This means that the coding they use in the proof is incomputable. However, they show it exists, and that it can be decoded in finite time, giving a valid description of the string.
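The relation in question, in its standard form (for prefix complexity): $K(x) \le \log \frac{1}{P_{\mathcal{U}}(x)} + c$; combined with the easy direction $P_{\mathcal{U}}(x) \ge 2^{-K(x)}$, this gives $K(x) = \log \frac{1}{P_{\mathcal{U}}(x)} + O(1)$.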
See also Sequence spaces
http://www.scholarpedia.org/article/Algorithmic_information_theory
The discovery of algorithmic probability. Seems like a very nice read. Solomonoff's theory of inductive inference
An Introduction to Kolmogorov Complexity and Its Applications (1 cr)
Algorithmic Learning Theory (ALT) 2016
Expanded and improved proof of the relation between description complexity and algorithmic probability
http://www-igm.univ-mlv.fr/~berstel/Articles/2010HandbookCodes.pdf
Some philosophy: https://link.springer.com/chapter/10.1007/978-3-319-08019-2_30