Statistical physics and inference

cosmos 10th December 2019 at 3:26pm
Information, computation and physics; Statistical inference

A good review by Lenka Zdeborová, Florent Krzakala: Statistical physics of inference: Thresholds and algorithms

The thermodynamics of prediction

A correspondence between thermodynamics and inference (hypothesis annotations)

See lectures on statistical physics of inference:

From information theory to learning via Statistical Physics: Introduction, by Florent Krzakala. Lectures from the Beg Rohu 2018 school.

Phase transitions in machine learning

Phase Transitions in the Coloring of Random Graphs. See Graph coloring.

See also the book on phase transitions in machine learning.

See also the phase transition in the inference problem in this video, From information theory to learning via Statistical Physics by Florent Krzakala – related to the magnetic Phase transition!

A phase transition describes the transition from a region where a problem is solvable to a region where it is not solvable!

A nice Gauge transformation of the Hamiltonian turns the inference problem into a ferromagnetic phase transition calculation.
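A minimal sketch of the trick, in my own notation (assuming a planted spin model of the kind discussed in the lectures): take a Hamiltonian

$$H(\sigma) = -\sum_{(ij)} J_{ij}\, \sigma_i \sigma_j,$$

where the couplings are correlated with a planted configuration $\sigma^*$, e.g. $J_{ij} = J_0\, \sigma_i^* \sigma_j^* + \text{noise}$. Under the gauge transformation $\sigma_i \to \sigma_i \sigma_i^*$, $J_{ij} \to J_{ij}\, \sigma_i^* \sigma_j^*$, the energy is unchanged, the transformed couplings acquire a ferromagnetic bias $\mathbb{E}[J_{ij}] = J_0 > 0$ independent of $\sigma^*$, and the overlap with the planted configuration becomes an ordinary magnetization $m = \frac{1}{N}\sum_i \sigma_i$. The inference (recovery) transition then becomes a ferromagnetic phase transition.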

See also the book on the "mathematics of generalization".

Solvable Model of Unsupervised Feature Learning

Learning to generalize

Statistical physics of learning from examples: a brief introduction

Rigorous Learning Curve Bounds from Statistical Mechanics, see Learning curve


Learning with Boolean Threshold Functions, a Statistical Physics Perspective – Rémi Monasson

Perceptron. Connection of VC dimension and capacity (see here). Capacity basically refers to the probability that a random input/output set is realizable by our hypothesis class (see here for the case with hyperplanes). For more complicated architectures this is harder to calculate...
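A concrete instance of the hyperplane case is Cover's function-counting theorem (stated from memory, so worth double-checking): for $P$ points in general position in $\mathbb{R}^N$, the fraction of all $2^P$ labellings realizable by a homogeneous hyperplane is

$$C(P, N) = 2^{1-P} \sum_{k=0}^{N-1} \binom{P-1}{k},$$

which tends to 1 for $\alpha = P/N < 2$ and to 0 for $\alpha > 2$ as $N \to \infty$; this is the capacity $\alpha_c = 2$ of the spherical Perceptron.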

Calculating the critical capacity $\alpha_c$ for $\pm 1$ weight vectors, that is, the number of patterns (divided by the dimensionality of the system) at which the probability of a solution existing drops to zero. To do this he has to calculate the probability of a certain number of solutions existing, and assume that in the large-$N$ limit there is a single large peak taking almost all of the probability (Large deviation theory, see here). This peak depends on $\alpha$: when the peak sits at more than one solution, then with high probability there is at least one solution; when the peak reaches zero, there is no solution with high probability. The point where this happens defines $\alpha_c$. He starts by defining the number of solutions for $\pm 1$ weight vectors, then writes an expression for the probability of the number of solutions lying in a certain interval. We can find the peak of this quantity using the Replica method, assuming Replica symmetry. If the number of solutions is written as $\mathcal{N}_{sol}=2^{N\omega}$, then the peak is calculated to occur at a particular $\omega$ (written as a maximum of a complicated expression, depending on the typical overlap of solutions). This gives us $\alpha_c$. The calculation can be extended to the case of finite temperature, where approximate solutions are allowed, weighted in probability by how good they are.
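A brute-force numerical sketch of this picture (toy code of mine, not from the lecture; for the binary Perceptron the replica result is $\alpha_c \approx 0.83$): for small $N$ one can enumerate all $2^N$ weight vectors $w \in \{\pm 1\}^N$ and estimate directly how the probability that at least one of them stores $P = \alpha N$ random patterns drops with $\alpha$.

```python
import itertools
import numpy as np

def p_solvable(N, alpha, trials=200, seed=0):
    """Estimate the probability that at least one w in {-1,+1}^N correctly
    classifies P = alpha*N random +/-1 patterns with random +/-1 labels."""
    rng = np.random.default_rng(seed)
    P = max(1, int(round(alpha * N)))
    # Enumerate all 2^N candidate weight vectors (only feasible for small N).
    W = np.array(list(itertools.product([-1, 1], repeat=N)))  # shape (2^N, N)
    solvable = 0
    for _ in range(trials):
        X = rng.choice([-1, 1], size=(P, N))  # random patterns
        y = rng.choice([-1, 1], size=P)       # random labels
        # w stores all patterns iff every margin y_mu * (x_mu . w) is positive
        # (with N odd the fields are never zero, so the strict inequality is unambiguous).
        margins = (X @ W.T) * y[:, None]      # shape (P, 2^N)
        if np.any(np.all(margins > 0, axis=0)):
            solvable += 1
    return solvable / trials

# The drop should sharpen into a step near alpha_c ~ 0.83 as N grows.
for alpha in [0.4, 0.6, 0.8, 1.0, 1.2]:
    print(alpha, p_solvable(N=15, alpha=alpha))
```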

Adaptation of methods for the Tempotron


Renormalization group

See here for how one can use RG to show that fractal-like Committee machines have universal Learning curves.

Statistical mechanics of learning. Universality of the optimal learning curve using RG! (Generalization Error in a Self-Similar Committee Machine)

Statistical mechanics of neural networks

Nonequilibrium analysis of simple neural nets


The large deviations of the whitening process in random constraint satisfaction problems

See Non-convex optimization

The statistical mechanics of learning a rule

Unreasonable Effectiveness of Learning Neural Nets: Accessible States and Robust Ensembles

Here we discuss how this phenomenon emerges in learning in large-scale neural networks with low precision synaptic weights. We further show how it is connected to a novel out-of-equilibrium statistical physics measure that suppresses the confounding role of exponentially many deep and isolated configurations (local minima of the error function) and also amplifies the statistical weight of rare but extremely dense regions of minima. We call this measure the Robust Ensemble (RE). Moreover, we show that the RE allows us to derive novel and exceptionally effective algorithms. One of these algorithms is closely related to a recently proposed stochastic learning protocol used in complex deep artificial neural networks [8], implying that the underlying geometrical structure of the RE may provide an explanation for its effectiveness.
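Roughly, as I understand the construction (my notation, not necessarily the paper's): instead of the Gibbs weight $e^{-\beta E(\tilde{W})}$, the Robust Ensemble weights a reference configuration $\tilde{W}$ by a local free entropy,

$$\Phi(\tilde{W}; \beta, \gamma) = \log \sum_{W} e^{-\beta E(W) - \gamma\, d(W, \tilde{W})}, \qquad P(\tilde{W}) \propto e^{\, y\, \Phi(\tilde{W}; \beta, \gamma)},$$

so $\tilde{W}$ is rewarded for having many low-error configurations within a distance controlled by $\gamma$. Large $y$ (implemented in practice as $y$ interacting replicas) suppresses the exponentially many isolated minima and amplifies the rare but dense regions of minima mentioned above.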