Universal inductive inference

cosmos 22nd August 2018 at 4:12pm
Universal AI

aka Universal prediction

Recent thesis analyzing Solomonoff prediction, and criticizing it. See also the paper Ard showed me about Solomonoff induction being equivalent to just the assumption of computability (see the email I sent him)

Nice paper about the theory of Universal Prediction using computable measures of complexity! (use Kami to see comments, saved on Google Drive)

It is based on performing Levin search over candidate answers, so that simple ones are searched preferentially (a minimal sketch of the scheduling idea is below). However, we should combine this with probabilistic heuristics learned from past experience (an example of Incremental learning).
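A minimal sketch of the Levin-search scheduling idea, under assumptions: candidate "programs" are Python generators that yield once per computation step and finally yield their answer, and the lengths are assigned by hand. Everything here (levin_search, counter) is hypothetical scaffolding; real Levin search enumerates all programs of a reference machine, not a fixed list.

```python
# Sketch of Levin search scheduling: in phase i, every program p with
# l(p) <= i is restarted and given 2**(i - l(p)) steps, so shorter
# (simpler) candidates get exponentially more time.

def levin_search(candidates, is_solution, max_phase=30):
    """candidates: list of (make_gen, length) pairs, where length plays
    the role of the program length l(p)."""
    for phase in range(1, max_phase + 1):
        for make_gen, length in candidates:
            if length > phase:
                continue
            gen = make_gen()                # restart: no state kept across phases
            answer = None
            for _ in range(2 ** (phase - length)):
                try:
                    answer = next(gen)      # one step of this program
                except StopIteration:
                    break
            if answer is not None and is_solution(answer):
                return answer
    return None

# Toy usage: find a nontrivial divisor of 1234567. Each candidate counts up
# in its own step size, yielding None per step and finally its answer.
def counter(step):
    def gen():
        n = step
        while n == 1 or 1234567 % n != 0:
            yield None
            n += step
        yield n
    return gen

candidates = [(counter(1), 2), (counter(127), 5)]  # (program, assumed length)
print(levin_search(candidates, lambda n: 1234567 % n == 0))  # -> 127
```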

Developed by Ray Solomonoff

See also Universal probability, Universal gambling, Induction

Solomonoff's theory of inductive inference

Jeff Eldred Introduces Solomonoff Induction: an introduction to induction and Bayes' theorem. He then goes into Kolmogorov complexity, its relation to Solomonoff induction, and Turing machines.

Principles

See Ray Solomonoff paper read by Marcus Hutter - Algorithmic Probability, Heuristic Programming & AGI. It is based on algorithmic probability (the universal prior).

See section 5.2 in Li & Vitányi's book on Kolmogorov complexity.
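For reference (standard notation, my own summary rather than anything quoted from the above): the central quantity is the algorithmic probability of a string x,

```latex
% Algorithmic (Solomonoff) probability: total weight of all programs p that
% make the universal monotone machine U output something beginning with x.
M(x) \;=\; \sum_{p \,:\, U(p) = x\ast} 2^{-\ell(p)}
```

Prediction then uses the induced conditional M(a | x) = M(xa)/M(x).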

Convergence theorem

Description (vid). There are apparently problems with self-referentiality, when the true probability distribution governing the bit sequence depends on the induction machine itself. There are also problems with reinforcement learning.
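One standard form of the convergence result, hedged (I believe this is Solomonoff's 1978 bound as stated in Li & Vitányi, up to the exact constant): if the bits are drawn from any computable measure \mu, the total expected squared prediction error of M is bounded by the complexity of \mu,

```latex
\sum_{t=1}^{\infty} \mathbb{E}_{\mu}\!\left[ \big( M(0 \mid x_{<t}) - \mu(0 \mid x_{<t}) \big)^{2} \right]
\;\le\; \frac{\ln 2}{2}\, K(\mu)
```

so M's predictions converge to \mu's with \mu-probability 1, and converge faster for simpler environments.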


More notes:

Bayes, Probability theory, and Human learning have similarities.

"Information packing problem", Shannon. See Data compression

Carnap's probability theory. Korzybski-like way of avoiding the problem of verifying probabilistic theories?

Huffman, Minsky, McCarthy.

"An Inductive Inference Machine" (the paper where Solomonoff introduced the idea).

See also Grammar learning.

Practical induction

Universal prediction

Optimal approximations

Main idea: Willis' computable schemes (Resource-bounded algorithmic probability)

In 1968 I was asked to review "Computational Complexity and Probability Constructions", a paper by David Willis. It was the first substantive response I'd seen to my 1964 paper giving the four models. I found his paper to be excellent. Willis avoided the "halting problem" by defining computationally limited Turing machines that had no halting problems. From these machines, he was able to define probabilities in an exact manner. By raising the computational limits on his machines, he approached the behavior of universal machines.

In addition to its value as a rigorous treatment of a subject that I had treated relatively informally, I was able to use Willis' results to prove that these induction methods satisfied my "correctness" criterion. The methods converged surprisingly rapidly to the correct probability values.

"While Willis' work seemed closer to practical realization, Levin's was a model of mathematical elegance. "

Algorithm

These kinds of approximations are used to define practical probability. Our particular algorithm/implementation will depend on:

These methods are optimal approximations. ("By getting as large a value of the sum as we can, we get as close as possible to Algorithmic Probability in the allotted time.")
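A toy, hypothetical illustration of what such an approximation computes. A real construction (Willis, Levin) uses a computationally limited universal machine; here a trivial "machine" stands in, purely to show the shape of the sum:

```python
# Illustrative sketch only: a resource-bounded version of algorithmic
# probability over a toy machine. The toy "machine" prints a program's bits
# cyclically, so every program trivially halts -- the same trick, in
# miniature, that lets Willis define probabilities exactly.

from itertools import product

def toy_run(program: str, n_out: int) -> str:
    """The toy machine: output the program's bits cyclically, n_out bits."""
    return "".join(program[i % len(program)] for i in range(n_out))

def bounded_M(x: str, max_len: int) -> float:
    """Approximate M(x) by summing 2^-l(p) over all programs p of length at
    most max_len whose output begins with x. Raising max_len gets 'as large
    a value of the sum as we can' within the allotted resources."""
    total = 0.0
    for L in range(1, max_len + 1):
        for bits in product("01", repeat=L):
            if toy_run("".join(bits), len(x)) == x:
                total += 2.0 ** (-L)
    return total

# Predict the next bit of "0101" by comparing the two continuations:
m0 = bounded_M("01010", 8)
m1 = bounded_M("01011", 8)
print("P(next bit = 0) ~", m0 / (m0 + m1))  # ~0.78: the periodic pattern wins
```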

Suboptimal approximations

An example is "The VH method", by Van Herdeen, based on the description of Boolean functions that predict binary strings. This is like just taking the first term in the sum that defines algorithmic probability. Thus it is a rather crude approximation.

Other methods:

Minimum message length

Minimum description length. Stochastic complexity.

More.

Computability vs completeness

Algorithmic probability (i.e. universal inductive inference) as the "gold standard" to evaluate the utility of induction methods! (see also Universal AI!)

See Randomness, Complexity

Prediction by Compression

It is not a priori clear that the shortest program dominates in all cases—and in fact it does not. However, we show that in the overwhelming majority of cases the shortest program dominates sufficiently to validate the approach that uses only shortest programs for prediction. (sec 5.2.4 in Li&Vitanyi's book)
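A crude, hedged illustration of prediction by compression, with an off-the-shelf compressor (zlib) standing in for shortest-program length. zlib is far from a universal machine, so this is only a heuristic cousin of the theory, but it shows the idea: predict the continuation with the shorter description.

```python
# Heuristic sketch: use a practical compressor's output length as a rough
# proxy for program length, and predict the symbol whose continuation
# compresses best. Not algorithmic probability itself; illustrative only.
import zlib

def description_length(s: str) -> int:
    return len(zlib.compress(s.encode()))

def predict_next(history: str, alphabet: str = "01") -> str:
    """Pick the symbol a minimizing the 'description length' of history + a,
    i.e. the continuation closest to the shortest-program prediction."""
    return min(alphabet, key=lambda a: description_length(history + a))

print(predict_next("01" * 500))  # "0": continuing the pattern compresses no worse
```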