A good review by Lenka Zdeborová and Florent Krzakala: Statistical physics of inference: Thresholds and algorithms
The thermodynamics of prediction
A correspondence between thermodynamics and inference (hypothesis annotations)
See lectures on statistical physics of inference:
From information theory to learning via Statistical Physics: Introduction, by Florent Krzakala. Lectures from the Beg Rohu 2018 school.
Phase Transitions in the Coloring of Random Graphs; see Graph coloring.
See also the book on phase transitions in machine learning.
See also the phase transition in the inference problem in this video, From information theory to learning via Statistical Physics by Florent Krzakala – related to the magnetic phase transition!
A phase transition separates the region where a problem is solvable from the region where it is not!
A nice gauge transformation of the Hamiltonian turns the inference problem into a ferromagnetic phase transition calculation (sketched below).
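A sketch of that gauge argument, assuming the standard planted Ising setup (couplings $J_{ij}$ statistically biased toward a planted configuration $s^*$; notation mine):

```latex
% Gauge transformation: \tau_i = \sigma_i s_i^*, \quad K_{ij} = J_{ij} s_i^* s_j^*
H(\sigma) = -\sum_{(ij)} J_{ij}\,\sigma_i \sigma_j
          = -\sum_{(ij)} K_{ij}\,\tau_i \tau_j
% Since J_{ij} is biased toward s_i^* s_j^*, the gauged couplings K_{ij}
% are biased toward +1, i.e. ferromagnetic. The overlap with the planted
% state becomes an ordinary magnetization:
q = \frac{1}{N}\sum_i \sigma_i s_i^* = \frac{1}{N}\sum_i \tau_i = m
% so recovering the planted state is a ferromagnetic ordering transition m > 0.
```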
And a book on the "mathematics of generalization".
Solvable Model of Unsupervised Feature Learning
Statistical physics of learning from examples: a brief introduction
Rigorous Learning Curve Bounds from Statistical Mechanics, see Learning curve
Learning with Boolean Threshold Functions, a Statistical Physics Perspective - Rémi Monasson
Perceptron – connection between VC dimension and capacity (see here). Capacity refers basically to the probability that a random input/output set is realizable by our hypothesis class (see here for the case of hyperplanes; a sketch of Cover's counting result follows). For more complicated architectures this is harder to calculate...
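For hyperplanes the capacity has a closed form, Cover's function-counting theorem: $C(P,N) = 2\sum_{k=0}^{N-1}\binom{P-1}{k}$ of the $2^P$ dichotomies of $P$ points in general position in $\mathbb{R}^N$ are linearly realizable. A minimal sketch of the resulting transition at $\alpha = P/N = 2$ (the fraction is exactly 1/2 at $\alpha = 2$ and sharpens into a step as $N$ grows):

```python
from math import comb

def realizable_fraction(P, N):
    """Cover (1965): C(P, N) = 2 * sum_{k<N} binom(P-1, k) of the 2^P
    dichotomies of P points in general position in R^N are realizable
    by a hyperplane through the origin.  Returns C(P, N) / 2^P."""
    C = 2 * sum(comb(P - 1, k) for k in range(N))
    return C / 2 ** P

# fraction of realizable dichotomies at alpha = 1.0 ... 3.0
for N in (10, 100, 1000):
    fractions = [realizable_fraction(int(a * N), N) for a in (1.0, 1.5, 2.0, 2.5, 3.0)]
    print(f"N={N:4d}: " + "  ".join(f"{f:.4f}" for f in fractions))
```

As $N$ grows the curve approaches a step: below $\alpha = 2$ almost every random dichotomy is realizable, above it almost none is.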
Calculating the critical capacity $\alpha_c$ for $\pm 1$ weight vectors (that is, the number of patterns over the dimensionality of the system at which the probability of a solution existing drops to zero). To do this he has to calculate the probability of a certain number of solutions existing, and assume that in the large-system limit the distribution develops a large peak taking almost all of the probability (large deviation theory, see here). This peak depends on $\alpha$: as long as the peak sits above one solution, a solution exists with high probability; when the peak reaches zero solutions, with high probability there is none. This is the point defining $\alpha_c$. He starts by defining the number of solutions $\mathcal{N}$ for $\pm 1$ weight vectors, then writes an expression for the probability of the number of solutions lying in a given interval. We can find the peak of this quantity using the replica method, assuming replica symmetry. Writing the number of solutions as $\mathcal{N} = e^{N\omega}$, the peak is calculated to occur at an $\omega$ given as a maximum over a complicated expression depending on the typical overlap between solutions. This gives us $\alpha_c$. One can extend this to finite temperature, where approximate solutions are allowed, weighted by how good they are. (A brute-force numerical illustration of the existence probability is sketched below.)
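Not the replica calculation itself, but a brute-force illustration of the same quantity at small $N$: enumerate all $2^N$ binary weight vectors and estimate the probability that at least one stores $P = \alpha N$ random patterns. The Krauth-Mézard replica result puts the threshold at $\alpha_c \approx 0.83$ in the large-$N$ limit; finite-size effects are strong at these sizes, and all names here are my own:

```python
from itertools import product
import numpy as np

def all_pm1_vectors(N):
    """All 2^N vectors in {±1}^N, as rows of an array."""
    return np.array(list(product((-1, 1), repeat=N)))

def solution_exists(X, y, W):
    """True if some row w of W satisfies y^mu * (w · xi^mu) > 0 for all mu."""
    fields = (W @ X.T) * y                 # shape (2^N, P)
    return bool(np.any(np.all(fields > 0, axis=1)))

rng = np.random.default_rng(0)
N = 15                                     # odd, so w · xi is never zero
W = all_pm1_vectors(N)                     # 2^15 = 32768 candidate weight vectors
trials = 50
for alpha in (0.5, 0.7, 0.9, 1.1):
    P = int(round(alpha * N))              # number of random patterns
    hits = sum(
        solution_exists(rng.choice((-1, 1), size=(P, N)),
                        rng.choice((-1, 1), size=P), W)
        for _ in range(trials)
    )
    print(f"alpha={alpha:.1f}  P(solution exists) ≈ {hits}/{trials}")
```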
Adaptation of methods for the Tempotron
See here for how one can use RG to show that fractal-like Committee machines have universal Learning curves.
Statistical mechanics of learning – Universality of optimal learning curve using RG! (Generalization Error in a Self-Similar Committee Machine)
Nonequilibrium analysis of simple neural nets
The large deviations of the whitening process in random constraint satisfaction problems
The statistical mechanics of learning a rule
Unreasonable Effectiveness of Learning Neural Networks: Accessible States and Robust Ensembles
Here we discuss how this phenomenon emerges in learning in large-scale neural networks with low precision synaptic weights. We further show how it is connected to a novel out-of-equilibrium statistical physics measure that suppresses the confounding role of exponentially many deep and isolated configurations (local minima of the error function) and also amplifies the statistical weight of rare but extremely dense regions of minima. We call this measure the Robust Ensemble (RE). Moreover, we show that the RE allows us to derive novel and exceptionally effective algorithms. One of these algorithms is closely related to a recently proposed stochastic learning protocol used in complex deep artificial neural networks [8], implying that the underlying geometrical structure of the RE may provide an explanation for its effectiveness.
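A toy numpy sketch of the replicated-descent idea behind the Robust Ensemble (my own minimal construction in the spirit of the paper's coupled replicas, not the authors' code; all sizes and names are illustrative): several replicas descend the loss while elastically attracted to their common center, biasing the search toward wide, dense regions of minima rather than isolated ones.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: linearly separable labels (stand-in for the error function).
N, P, n_replicas = 50, 100, 5
X = rng.standard_normal((P, N))
y = np.sign(X @ rng.standard_normal(N))

def loss_grad(w):
    """Gradient of the mean squared error of tanh(X w) against the labels."""
    pred = np.tanh(X @ w)
    return X.T @ ((pred - y) * (1.0 - pred ** 2)) / P

W = rng.standard_normal((n_replicas, N)) / N   # small init so tanh is not saturated
center = W.mean(axis=0)
eta, gamma = 0.5, 0.01                         # step size, elastic coupling strength

for step in range(500):
    gamma *= 1.01          # slowly ramp up the coupling ("scoping" in the paper)
    for a in range(n_replicas):
        # each replica follows its own loss gradient plus an elastic pull
        # toward the center of mass of all replicas
        W[a] -= eta * (loss_grad(W[a]) + gamma * (W[a] - center))
    center = W.mean(axis=0)

acc = np.mean(np.sign(X @ center) == y)
print(f"training accuracy of the center: {acc:.2f}")
```

Ramping the coupling up over time gradually forces the replicas to collapse onto the center, so the final center sits inside a region where many nearby configurations have low error.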