Simplicity and learning

cosmos 27th April 2018 at 6:16pm
Learning Simplicity

See Learning theory, Order and Simplicity bias. The simplicity and structure of real-world signals are often exploited to make the learning problem easier to solve. This can be formalized via Learning theory and PAC-Bayesian learning.

See my paper on Simplicity bias in the parameter-function map

Applications in Inverse problems. For instance, see Convex optimization heuristics for linear inverse problems and Linear inverse problem

Applications in Compressed sensing

Simplicity and neural networks

See Neural network theory, Why does deep and cheap learning work so well?, Deep learning theory, Generalization in deep learning

No Free Lunch versus Occam's Razor in Supervised Learning

Nature often results in functions that are polynomials with several simplifying features:

1. Low polynomial order

For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order $d$.

The Central limit theorem gives rise to Probability distributions corresponding to quadratic Hamiltonians (see def in Neural network theory). Similar results regarding maximum entropy distributions are also mentioned in the paper. Several common operations on images and sound are linear, and thus order-1 polynomials in the input.
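As a worked instance of the quadratic case, here is the one-dimensional Gaussian, assuming the Hamiltonian is defined as the negative log-probability, $H(x) = -\ln p(x)$ (that convention is an assumption here; check it against the definition in Neural network theory). The symbols $\mu$ and $\sigma$ are the usual mean and standard deviation.

```latex
p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
\quad\Longrightarrow\quad
H(x) = -\ln p(x) = \frac{(x-\mu)^2}{2\sigma^2} + \ln\!\left(\sigma\sqrt{2\pi}\right)
```

Under that convention $H$ is a polynomial of order 2 in $x$, with the normalization appearing only as an additive constant.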

2. Locality

Locality in a lattice manifests itself by allowing only nearest-neighbor interactions. In other words, almost all coefficients in the polynomial are forced to vanish, and the total number of non-zero coefficients grows only linearly with $n$.

This can be stated more generally and precisely using the Markov network formalism.
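The linear growth can be checked by counting. The sketch below is only an illustration under stated assumptions: a quadratic polynomial on an open one-dimensional chain of n variables, where the only allowed cross terms are nearest-neighbor products. The function names are hypothetical and not taken from any of the referenced papers.

```python
from math import comb


def dense_coefficient_count(n: int, d: int) -> int:
    """Number of monomials of total degree <= d in n variables."""
    return comb(n + d, d)


def local_quadratic_coefficient_count(n: int) -> int:
    """Coefficients of a quadratic polynomial on an open 1D chain where
    the only cross terms allowed are nearest-neighbor products x_i * x_{i+1}."""
    constant = 1
    linear = n                       # one coefficient per x_i
    squares = n                      # one coefficient per x_i ** 2
    neighbor_pairs = max(n - 1, 0)   # one coefficient per adjacent pair
    return constant + linear + squares + neighbor_pairs


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, dense_coefficient_count(n, 2), local_quadratic_coefficient_count(n))
```

For n = 10 the unrestricted quadratic has 66 coefficients, while the nearest-neighbor version has 30; the first count grows quadratically in n, the second as 3n.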

3. Symmetry

Whenever the Hamiltonian obeys some symmetry (is invariant under some transformation), the number of independent parameters required to describe it is further reduced.
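A minimal sketch of this reduction, assuming translation symmetry on a periodic one-dimensional chain: if every site shares the same field, self-interaction, and nearest-neighbor coupling, the quadratic Hamiltonian needs only three parameters regardless of the chain length. The names `ring_hamiltonian`, `h`, `a`, and `j` are hypothetical choices for this illustration.

```python
def ring_hamiltonian(x, h, a, j):
    """Translation-invariant quadratic Hamiltonian on a periodic chain.

    Every site shares the same field h, self-term a, and nearest-neighbor
    coupling j, so three numbers describe the whole polynomial.
    """
    n = len(x)
    return sum(h * x[i] + a * x[i] ** 2 + j * x[i] * x[(i + 1) % n]
               for i in range(n))


def cyclic_shift(x, k):
    """Shift the configuration x by k sites around the ring."""
    k %= len(x)
    return x[-k:] + x[:-k]


if __name__ == "__main__":
    x = [1, -1, 1, 1, -1]
    base = ring_hamiltonian(x, h=2, a=3, j=5)
    # The value is unchanged by any translation around the ring.
    print(all(ring_hamiltonian(cyclic_shift(x, k), 2, 3, 5) == base
              for k in range(len(x))))
```

Without the symmetry, the same nearest-neighbor model on a ring would need a separate field, self-term, and coupling for each site, that is, 3n parameters instead of 3.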


deep learning

"If f is a truly random function then it is highly unlikely that anyone will ever conceive of its existence and would want to learn it." ~ Li&Vitanyi's book

See also comments in this paper