aka Universal prediction
Recent thesis analyzing and criticizing Solomonoff prediction. See also the paper that Ard showed me about Solomonoff induction being equivalent to just the assumption of computability (see the email I sent him).
Nice paper about the theory of Universal Prediction using computable measures of complexity! (use Kami to see comments, saved on Google Drive)
It is based on performing Levin search over candidate answers, so that simple ones are searched preferentially. However, we should combine this with probabilistic heuristics learned from past experience (an example of Incremental learning).
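A minimal sketch of such a Levin search over bit-string programs (generic Python, not the particular scheme from the sources above; run_program and is_solution are hypothetical placeholders for a step-limited interpreter and an answer checker):

```python
from itertools import product

def levin_search(run_program, is_solution, max_phase=20):
    """Levin search sketch: in phase t, every program p with len(p) <= t
    is run for 2**(t - len(p)) steps, so simpler programs are tried
    earlier and with larger budgets, while total work per phase stays
    near 2**t."""
    for t in range(1, max_phase + 1):
        for length in range(1, t + 1):
            budget = 2 ** (t - length)                 # steps for each program of this length
            for bits in product("01", repeat=length):
                program = "".join(bits)
                answer = run_program(program, budget)  # hypothetical step-limited interpreter
                if answer is not None and is_solution(answer):
                    return program, answer
    return None
```

The learned probabilistic heuristics mentioned above would replace the uniform 2**-len(p) weighting with learned program probabilities, so that programs resembling past solutions get a larger share of each phase's budget.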
Developed by Ray Solomonoff
See also Universal probability, Universal gambling, Induction
Solomonoff's theory of inductive inference
Jeff Eldred Introduces Solomonoff Induction: an introduction to Induction and Bayes' theorem. He then goes into Kolmogorov complexity and its relation to Solomonoff induction and Turing machines.
See the Ray Solomonoff paper read by Marcus Hutter - Algorithmic Probability, Heuristic Programming & AGI. It is based on:
→ See section 5.2 in Li & Vitányi's book on Kolmogorov complexity.
Description (video). There are apparently problems with self-referentiality when the true probability distribution governing the bit sequence depends on the induction machine itself. There are also problems with reinforcement learning.
More notes:
Bayes, Probability theory, and Human learning have similarities.
"Information packing problem", Shannon. See Data compression
Carnap's probability theory. Korzybski-like way of avoiding the problem of verifying probabilistic theories?
Huffman, Minsky, McCarthy.
An inductive inference machine (paper where he introduced the idea).
See also Grammar learning.
Main idea: Willis' computable schemes (Resource bounded algorithmic probability)
In 1968 I was asked to review "Computational Complexity and Probability Constructions", a paper by David Willis. It was the first substantive response I'd seen to my 1964 paper giving the four models. I found his paper to be excellent. Willis avoided the "halting problem" by defining computationally limited Turing machines that had no halting problems. From these machines, he was able to define probabilities in an exact manner. By raising the computational limits on his machines, he approached the behavior of universal machines.
In addition to its value as a rigorous treatment of a subject that I had treated relatively informally, I was able to use Willis' results to prove that these induction methods satisfied my "correctness" criterion. The methods converged surprisingly rapidly to the correct probability values.
"While Willis' work seemed closer to practical realization, Levin's was a model of mathematical elegance. "
Algorithm
These kinds of approximations are used to define practical probability. Our particular algorithm/implementation will depend on:
These methods are optimal approximations. ("By getting as large a value of the sum as we can, we get as close as possible to Algorithmic Probability in the allotted time.")
An example is "The VH method", by Van Herdeen, based on the description of Boolean functions that predict binary strings. This is like just taking the first term in the sum that defines algorithmic probability. Thus it is a rather crude approximation.
Other methods:
Minimum message length
Minimum description length. Stochastic complexity.
More.
See Randomness, Complexity
Prediction by Compression
It is not a priori clear that the shortest program dominates in all cases—and in fact it does not. However, we show that in the overwhelming majority of cases the shortest program dominates sufficiently to validate the approach that uses only shortest programs for prediction. (sec 5.2.4 in Li&Vitanyi's book)
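A quick way to play with this numerically (my own stand-in, not the construction in the book: zlib plays the role of the shortest-program search): predict the continuation whose compressed description is shortest.

```python
import zlib

def predict_next_bit(history: str) -> str:
    """Predict the next bit of a binary string by compressing each candidate
    continuation and preferring the one with the shorter description
    (a crude stand-in for letting the shortest program dominate)."""
    def cost(s: str) -> int:
        return len(zlib.compress(s.encode(), level=9))
    return min("01", key=lambda b: cost(history + b))

print(predict_next_bit("01" * 50))   # likely "0", continuing the alternating pattern
```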