See Learning theory, Order and Simplicity bias. The simplicity and structure present in real-world signals is often exploited to make the learning problem easier to solve. This can be formalized via Learning theory, PAC-Bayesian learning.
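As a sketch of what such a formalization looks like (this is the standard McAllester/Maurer-style PAC-Bayes bound, stated from memory rather than taken from any note linked above): for a prior $P$ over hypotheses chosen before seeing the data and any posterior $Q$, with probability at least $1-\delta$ over an i.i.d. sample of size $m$,

$$\mathbb{E}_{h\sim Q}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim Q}\big[\hat{L}(h)\big] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{m}}{\delta}}{2m}},$$

so if the prior $P$ concentrates on simple functions and the target really is simple, a good posterior $Q$ has small $\mathrm{KL}(Q\,\|\,P)$ and the bound is tight.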
See my paper on Simplicity bias in the parameter-function map
Applications in Inverse problems. For instance, see Convex optimization heuristics for linear inverse problems and Linear inverse problem
Applications in Compressed sensing
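A minimal sketch of the compressed-sensing idea (the dimensions, the random Gaussian measurement matrix, and the use of scikit-learn's Lasso as the $\ell_1$ heuristic are illustrative choices of mine, not taken from the linked notes): a sparse signal can be recovered from far fewer random linear measurements than its ambient dimension by $\ell_1$-regularized least squares, the convex surrogate for the intractable $\ell_0$ problem.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

n, m, k = 200, 60, 5                          # ambient dimension, measurements, sparsity
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.standard_normal(k)      # a k-sparse signal

A = rng.standard_normal((m, n)) / np.sqrt(m)  # random Gaussian measurement matrix
y = A @ x_true                                # m << n linear measurements

# l1-regularized least squares: the convex heuristic for the l0 (sparsest-solution) problem
lasso = Lasso(alpha=1e-3, fit_intercept=False, max_iter=100_000)
lasso.fit(A, y)
x_hat = lasso.coef_

print("relative recovery error:",
      np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Roughly $m \gtrsim k \log(n/k)$ such measurements suffice for the $\ell_1$ solution to coincide with the sparsest one; this is the sense in which simplicity (here, sparsity) makes the linear inverse problem well posed.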
See Neural network theory, Why does deep and cheap learning work so well?, Deep learning theory, Generalization in deep learning
No Free Lunch versus Occam's Razor in Supervised Learning
The functions that arise in nature are often (well approximated by) polynomials with several simplifying features:
1. Low polynomial order
For reasons that are still not fully understood, our universe can be accurately described by polynomial Hamiltonians of low order.
The Central limit theorem gives rise to Probability distributions corresponding to quadratic Hamiltonians (see def in Neural network theory; a worked example appears after this list). Similar results regarding maximum entropy distributions are also mentioned in the paper. Several common operations on images and sound are linear, and thus order-1 polynomials of the input.
2. Locality
Locality in a lattice manifests itself by allowing only nearest-neighbor interactions. In other words, almost all coefficients in the polynomial are forced to vanish, and the total number of non-zero coefficients grows only linearly with the system size (see the parameter-counting sketch after this list).
This can be stated more generally and precisely using the Markov network formalism
3. Symmetry
Whenever the Hamiltonian obeys some symmetry (is invariant under some transformation), the number of independent parameters required to describe it is further reduced.
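The worked example referenced in point 1 (a sketch using the usual definition $H = -\ln p$, not a derivation copied from the paper): a sum of many independent contributions is approximately Gaussian by the Central limit theorem, and the Hamiltonian of a Gaussian is quadratic:

$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \;\Longrightarrow\; H(x) \equiv -\ln p(x) = \frac{(x-\mu)^2}{2\sigma^2} + \tfrac{1}{2}\ln(2\pi\sigma^2),$$

an order-2 polynomial in $x$; the multivariate case $H(x) = \tfrac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu) + \text{const}$ is likewise quadratic.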
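The parameter-counting sketch referenced in points 2 and 3 (the specific counts are my own illustration for a 1D chain of $n$ variables, not figures from the linked paper): a generic polynomial Hamiltonian of order $d$ in $n$ variables has $\binom{n+d}{d} = O(n^d)$ independent coefficients; restricting to nearest-neighbor (local) interactions on the chain leaves only $O(n)$ of them, e.g.

$$H(x) = \sum_{i=1}^{n-1} J_i\, x_i x_{i+1} + \sum_{i=1}^{n} h_i\, x_i,$$

and further imposing translation symmetry ($J_i = J$, $h_i = h$ for all $i$) reduces this to $O(1)$ independent parameters.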
"If f is a truly random function then it is highly unlikely that anyone will ever conceive of its existence and would want to learn it." ~ Li&Vitanyi's book
See also comments in this paper