See draft paper → published: Input–output maps are strongly biased towards simple outputs
See MMathPhys oral presentation
Coding theorem connects probability and complexity
AIT differs fundamentally from Shannon information theory: the latter is a theory about distributions, whereas the former is a theory about the information content of individual objects (see Descriptional complexity).
If one assumes that the probability of generating a binary input string of length $l$ is simply $2^{-l}$ (which is true for prefix codes, see Appendix A), then the most likely way to obtain output $x$ by random sampling of inputs is with the shortest program that generates it, a string of length $K(x)$.
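Presumably equation (1) is the standard coding theorem, relating the probability $P(x)$ that a universal prefix machine fed random bits produces $x$ to its Kolmogorov complexity:
$$P(x) = 2^{-K(x) + O(1)} \qquad (1)$$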
Direct application of these results from AIT to many practical systems in science or engineering suffers from a number of well-known problems (e.g. $K(x)$ is uncomputable, the $O(1)$ terms are unknown, and real-world maps are not universal Turing machines).
The way these problems are tackled is described in the next sections.
Coding theorem for computable functions. We begin with a weaker form of the coding theorem, applicable to real-world (computable) functions.
$$P(x) \le 2^{-K(x|f,n) + O(1)} \qquad (2)$$
where $K(x|f,n)$ is the complexity of output $x$, given the map $f$ and the value of $n$ (see M. Li and P. M. B. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, Springer, 2008, and Lecture notes on descriptional complexity and randomness).
We provide a derivation of equation (2) in Appendix B, using standard results from AIT, such as the fact that the complexity of a whole set is often much less than the complexity of individual members of the set (9). Informally, $K(x|f,n)$ can be viewed as the length of computer code required to specify $x$, given that the function $f$ and the value $n$ are already pre-programmed into the computer. Note that equation (2) is just an upper bound; in contrast to the full coding theorem of equation (1), there is no lower bound.
We mainly consider maps that comply with a few simple restrictions:
If instead the map is allowed to contain arbitrary amounts of information, then the map could assign arbitrary probabilities to the outputs, and any coding-theorem-like behaviour would be lost. We discuss this fixed-complexity condition further in Appendix C.
As we show in Appendix E, it turns out that a reasonably broad range of complexities will follow under quite general conditions for fixed-complexity maps.
They use the above conditions to argue that the conditional complexity $K(x|f,n)$ can, in practice, be approximated in terms of the complexity of $x$ alone (see below).
Approximately computing $K(x)$. Although uncomputable, $K(x)$ has been approximated using standard compression algorithms. $\tilde{K}(x)$ is used to denote some real-world (computable) approximation to $K(x)$.
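A minimal sketch of such an approximation, using off-the-shelf compressors as a stand-in for whatever measure $\tilde{K}$ is actually used (the function name and bits-per-byte normalisation are my own choices):

```python
import lzma
import random
import zlib

def K_tilde(x: str) -> float:
    """Crude computable stand-in for K(x) on a binary string x: the length,
    in bits, of the shorter of two standard compressions.  Note that fixed
    compressor overhead dominates for very short strings."""
    data = x.encode("ascii")
    shortest = min(len(zlib.compress(data, 9)), len(lzma.compress(data)))
    return 8.0 * shortest  # bytes -> bits

# Simple strings should compress far better than random-looking ones.
print(K_tilde("01" * 256))                                        # low
print(K_tilde("".join(random.choice("01") for _ in range(512))))  # high
```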
Importance of $O(1)$ terms. Experimental results on applying the coding theorem to short strings suggest that the $O(1)$ terms are not very important. Dropping them and replacing $K(x|f,n)$ by the computable approximation $\tilde{K}(x)$ gives
$$P(x) \le 2^{-(a\tilde{K}(x) + b)} \qquad (3)$$
where the constants $a$ and $b$ depend on the mapping, but not on $x$.
We call this upper bound on the probability simplicity bias: high-probability outputs must be 'simple', while complex outputs must have exponentially lower probabilities. In contrast to the full coding theorem, the lack of a lower bound means that simple outputs may also have low probabilities.
They offer estimates for $a$ and $b$ in Appendix D. In Appendix B, they also argue that the upper bound of equation (3) should be tight for most inputs, but weak for many outputs.
See them!
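Appendix D presumably has its own recipe for $a$ and $b$; as a placeholder, here is one simple way to extract them from sampled data by fitting the upper envelope of $\log_2 P(x)$ against $\tilde{K}(x)$ (my own construction, not necessarily theirs):

```python
import numpy as np

def fit_simplicity_bias(K_vals, P_vals):
    """Estimate a, b in P(x) <= 2^-(a*K~(x) + b) from sampled
    (complexity, probability) pairs by a straight-line fit to the
    upper envelope of log2 P(x) vs K~(x)."""
    K_vals = np.asarray(K_vals, dtype=float)
    logP = np.log2(np.asarray(P_vals, dtype=float))
    # Upper envelope: keep the largest log2 P within each complexity bin.
    bins = np.round(K_vals).astype(int)
    env_K = np.unique(bins)
    env_logP = np.array([logP[bins == k].max() for k in env_K])
    # On the envelope, log2 P(x) ~ -(a*K + b).
    slope, intercept = np.polyfit(env_K, env_logP, 1)
    return -slope, -intercept

# a, b = fit_simplicity_bias(K_estimates, P_estimates)
```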
Discrete RNA sequence to structure mapping
Coarse-grained ordinary differential equation
Coarse-grained stochastic partial differential equation. Black–Scholes equation
Polynomial curves
Random matrix – bias but not simplicity bias
Random walk map
Logistic map (see Nonlinear maps)
Predicting which of two outputs has higher probability
Connection to Chomsky hierarchy (see Formal language), Sloppy systems.
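A self-contained toy sanity check of the behaviour described above (my own construction, only loosely inspired by the random-walk example; the map, output length and LZ-style complexity proxy are all assumptions): sample random inputs, coarse-grain a walk into a sign pattern, and compare the estimated $P(x)$ of the commonest outputs with a crude complexity estimate.

```python
import collections
import random

def lz_complexity(s: str) -> int:
    """Number of phrases in a simple left-to-right LZ78-style parsing,
    used here as a crude computable complexity proxy for short strings."""
    phrases, count, i = set(), 0, 0
    while i < len(s):
        j = i + 1
        while j <= len(s) and s[i:j] in phrases:
            j += 1
        phrases.add(s[i:j])
        count += 1
        i = j
    return count

def walk_map(bits: str, out_len: int = 25) -> str:
    """Toy input-output map: bits are +-1 steps of a walk; the output
    records the sign of the walk at out_len evenly spaced checkpoints."""
    step = len(bits) // out_len
    pos, out = 0, []
    for i, b in enumerate(bits, 1):
        pos += 1 if b == "1" else -1
        if i % step == 0:
            out.append("1" if pos >= 0 else "0")
    return "".join(out)

n_bits, n_samples = 100, 50_000
counts = collections.Counter(
    walk_map("".join(random.choice("01") for _ in range(n_bits)))
    for _ in range(n_samples)
)
# High-probability outputs should tend to have low complexity; the
# converse need not hold, since there is no lower bound.
for x, c in counts.most_common(10):
    print(f"P={c / n_samples:.1e}  K~={lz_complexity(x):2d}  {x}")
```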
Appendix A: AIT
Appendix B: Upper and lower bound for computable maps
Upper bound: following the derivation using a Shannon–Fano code, as in the InfoTheory book.
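Sketch of the standard argument (my reconstruction): given $f$ and $n$, the output distribution $P(\cdot)$ is computable, so a Shannon–Fano(–Elias) prefix code can be built in which output $x$ receives a codeword of length
$$\ell(x) \le -\log_2 P(x) + 2.$$
Since the decoder is computable from $f$ and $n$, this codeword is a valid description of $x$ given $f$ and $n$, hence
$$K(x|f,n) \le -\log_2 P(x) + O(1) \quad\Longrightarrow\quad P(x) \le 2^{-K(x|f,n) + O(1)},$$
which is equation (2).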
Lower bound: not sure. Ask, or read! Page 12.
Also, many outputs must have probability below their upper bounds.
Appendix C: Fixed complexity map
Appendix D: Making predictions for P(x) in computable maps
Appendix E: Estimating the range of complexities of the outputs.
Arguments based on bounding the complexity of an output given the description: map + index of the output. This gives upper bounds for the min and max complexities of roughly $O(1)$ and $\log_2 N_O$ respectively, with $N_O$ the number of outputs (everything up to additive constants).
For the max, we also need a lower bound, and this is given by the well-known fact in AIT that if one has $N_O$ different strings, not all of them can have complexity lower than $\log_2 N_O$, as there are not enough such descriptions. In fact, most of the strings need to have a complexity close to $\log_2 N_O$.
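Spelling out the counting argument (standard AIT): there are at most $\sum_{j<k} 2^j = 2^k - 1$ binary descriptions shorter than $k$ bits, so fewer than $2^k$ of the $N_O$ outputs can have complexity below $k$. Taking $k = \log_2 N_O - c$ shows that at most a fraction $2^{-c}$ of the outputs have complexity below $\log_2 N_O - c$; hence most outputs, and in particular the maximum, have complexity of at least roughly $\log_2 N_O$.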
Appendix F: Approximations to $K(x)$.
Appendix G: Simplicity bias and system size
Appendix H: On the intuitive connection of probability and complexity
Appendix I: Simplicity bias in the logbinomial distribution
Appendix J: Predicting the number of outputs, $N_O$, by fitting the simplicity-bias bound and estimating its parameters from the known details of the system.
Appendix K: Further examples and figures
Continuous systems are sampled and discretized to create the output.
L-systems, Circadian rhythm
Cell cycle
Feed-forward network. Sample networks, measure the complexity of a given network by the entropy of the distribution of outputs.
Logic gate
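A minimal sketch of the 'sample and discretize' step for the continuous systems mentioned at the start of Appendix K (a toy damped oscillator integrated with simple Euler steps; the system and the above/below-mean threshold rule are my own choices, not the paper's setup):

```python
def discretize_trajectory(n_points: int = 64) -> str:
    """Integrate a damped oscillator with Euler steps and emit a binary
    output string: '1' where the sampled value is above its mean,
    '0' where it is below.  Toy illustration of turning a continuous
    trajectory into a discrete output."""
    x, v, dt = 1.0, 0.0, 0.05
    samples = []
    for _ in range(n_points):
        a = -x - 0.1 * v  # restoring force plus damping
        v += a * dt
        x += v * dt
        samples.append(x)
    mean = sum(samples) / len(samples)
    return "".join("1" if s >= mean else "0" for s in samples)

print(discretize_trajectory())
```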
Appendix L: Histograms of complexity