Statistical mechanics of neural networks

cosmos 30th November 2018 at 2:18am
Artificial neural network Statistical physics and inference

See also Neural network theory (mostly about Artificial neural network Learning theory), Deep learning theory, Hopfield network, Spin glass

Thouless-Anderson-Palmer equations for neural networks

Nonequilibrium analysis of simple neural nets

Sompolinsky II Beg Rohu 2018

High-dimensional dynamics of generalization error in neural networks: the generalization error diverges when the number of training examples equals the number of input dimensions, for a target function with noise.

To understand the phenomenon described by Saxe and in the video at 43:00, we can think of it this way: low eigenvalues of $XX^T$ correspond to directions with little variation in the input. However, because of the random fluctuation $\eta$, the output can have an $O(1)$ variation even for arbitrarily small input variation, which requires a large weight to fit, and this produces a large generalization error.

For $\alpha<1$ the probability of such directions in input space with low variation decreases, since we get fewer sampled directions overall, I think (directions with no points/variation are ignored by the algorithm, which projects the weight onto the input subspace; these correspond to the zero-eigenvalue part of the Marchenko-Pastur distribution).
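A minimal numerical sketch of this (my own illustration, assuming a noisy linear teacher and a minimum-norm least-squares student, not code from the paper): the measured generalization error peaks sharply around $\alpha = P/N = 1$.

```python
# Sketch: generalization error of minimum-norm least squares on a noisy linear
# teacher, as a function of alpha = P/N. Error blows up near alpha = 1.
import numpy as np

rng = np.random.default_rng(0)
N = 200                                          # input dimension
sigma = 0.5                                      # teacher output noise std
w_star = rng.standard_normal(N) / np.sqrt(N)     # teacher weights

for alpha in [0.25, 0.5, 0.9, 1.0, 1.1, 2.0, 4.0]:
    P = int(alpha * N)                           # number of training examples
    errs = []
    for _ in range(20):                          # average over datasets
        X = rng.standard_normal((P, N))
        y = X @ w_star + sigma * rng.standard_normal(P)
        w = np.linalg.pinv(X) @ y                # minimum-norm least-squares fit
        # For isotropic Gaussian inputs: E_x[(x.(w - w*))^2] + sigma^2
        errs.append(np.sum((w - w_star) ** 2) + sigma ** 2)
    print(f"alpha = {alpha:4.2f}   gen. error ~ {np.mean(errs):8.3f}")
```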

Neural networks and Spin glasses

See Deep learning theory and Hopfield network

See more at Neural Networks: An Introduction [2 ed.] by Muller et al.

The ferromagnetic case corresponds to a neural network that has stored a single pattern and has no Frustration. A network that has been loaded with a large number of randomly composed patterns resembles a spin glass.
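A minimal sketch of this contrast (my own illustration, using standard Hebbian couplings): with a single stored pattern the couplings are gauge-equivalent to a ferromagnet and retrieval from a corrupted state is essentially perfect, while above the known Hopfield capacity $\alpha_c \approx 0.14$ retrieval breaks down and the network behaves like a spin glass.

```python
# Sketch: Hopfield network with Hebbian couplings; retrieval overlap vs. number
# of stored random patterns p. Retrieval degrades once p/N exceeds ~0.14.
import numpy as np

rng = np.random.default_rng(1)
N = 500  # number of spins / neurons

def retrieval_overlap(p, n_steps=10):
    patterns = rng.choice([-1, 1], size=(p, N))
    J = patterns.T @ patterns / N                # Hebbian couplings
    np.fill_diagonal(J, 0.0)
    # start from a corrupted copy of pattern 0 (10% of spins flipped)
    s = patterns[0] * np.where(rng.random(N) < 0.1, -1, 1)
    for _ in range(n_steps):                     # zero-temperature dynamics
        s = np.sign(J @ s)
        s[s == 0] = 1
    return np.abs(patterns[0] @ s) / N           # overlap with stored pattern

for p in [1, 10, 50, 100, 200]:
    print(f"p = {p:4d} (alpha = {p/N:.2f})   overlap = {retrieval_overlap(p):.3f}")
```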

The Loss Surfaces of Multilayer Networks

How glassy are neural networks?

Spin-glass models of neural networks

Temperature-based RBMs

Statistical Mechanics of Neural Networks near Saturation

Statistical mechanics of learning

Learning to generalize

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior

Mean field theory of neural networks

A Correspondence Between Random Neural Networks and Statistical Field Theory

Deep Information Propagation

Exponential expressivity in deep neural networks through transient chaos

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice


In a Spin glass, there is a hierarchical fragmentation of the state space: as one decreases the temperature, more local minima appear in the free energy.

This seems qualitatively related to how the sublevel sets of neural network loss landscapes become more irregular as one decreases the energy level; see Topology and Geometry of Deep Rectified Network Optimization Landscapes.


Video

https://www.youtube.com/watch?v=fZ8WCic5u2I&list=PLYq7WW565SZgC_0OqyRjTcWKH3odod3j5

Asynchronous irregular firing

See also Neuronal network