See also Neural network theory (mostly about Artificial neural network Learning theory), Deep learning theory, Hopfield network, Spin glass
Thouless-Anderson-Palmer equations for neural networks
Nonequilibrium analysis of simple neural nets
High-dimensional dynamics of generalization error in neural networks. Note: the generalization error diverges when the number of training examples equals the number of input dimensions, for a target function with noise.
To understand the phenomenon described by Saxe and in the video at 43:00, we can think of it this way: low eigenvalues of $X X^T$ correspond to directions with little variation in the input. However, because of the random fluctuation $\eta$, the output can still have an $O(1)$ variation even for arbitrarily small input variation; fitting this requires a large weight and produces a large generalization error.
For $\alpha<1$ the probability of such low-variation directions in input space decreases, since we get fewer sampled directions overall, I think (directions with no points/variation are ignored by the algorithm, which projects the weights into the input subspace; these correspond to the zero-eigenvalue part of the Marchenko-Pastur distribution).
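A minimal numerical sketch of this picture (my own code, not from the paper or video; it assumes a standard noisy linear teacher with i.i.d. Gaussian inputs and a minimum-norm least-squares student, which is what the pseudoinverse computes):

```python
# Sketch: minimum-norm least squares on a noisy linear teacher, sweeping
# alpha = n_examples / n_dims.  The generalization error should blow up
# near alpha = 1, where near-zero eigenvalues of X X^T (the lower edge of
# the Marchenko-Pastur distribution) get fitted against O(1) noise.
import numpy as np

rng = np.random.default_rng(0)
d = 200                       # input dimension
noise_std = 0.5               # std of the label noise eta
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)   # unit-norm teacher weights

def generalization_error(alpha, n_test=2000):
    n = max(1, int(alpha * d))
    X = rng.standard_normal((n, d))
    y = X @ w_star + noise_std * rng.standard_normal(n)
    # Minimum-norm least-squares solution: the pseudoinverse projects the
    # weight estimate into the input subspace spanned by the examples.
    w_hat = np.linalg.pinv(X) @ y
    X_test = rng.standard_normal((n_test, d))
    y_test = X_test @ w_star           # noiseless test labels
    return np.mean((X_test @ w_hat - y_test) ** 2)

for alpha in [0.25, 0.5, 0.8, 0.95, 1.0, 1.05, 1.2, 2.0, 4.0]:
    print(f"alpha = {alpha:4.2f}   gen. error = {generalization_error(alpha):8.2f}")
```

With nonzero noise, the printed error should peak sharply near $\alpha = 1$ and decay on both sides, which is the divergence mentioned above.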
See Deep learning theory and Hopfield network
See more in Neural Networks: An Introduction [2nd ed.] by Müller et al.
The ferromagnetic case corresponds to a neural network that has stored a single pattern and has no Frustration. A network that has been loaded with a large number of randomly composed patterns resembles a spin glass.
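A small sketch of this contrast (my own code, using standard Hebbian storage and asynchronous zero-temperature updates; the capacity $\alpha_c \approx 0.138$ is the classical Amit-Gutfreund-Sompolinsky value):

```python
# Sketch: retrieval overlap in a Hopfield network.  A single stored pattern
# behaves like a ferromagnet (overlap ~ 1); loading many random patterns
# (alpha = P/N above ~0.138) destroys retrieval, as in a spin glass.
import numpy as np

rng = np.random.default_rng(1)
N = 500  # number of neurons

def retrieval_overlap(P, n_sweeps=10, flip_frac=0.1):
    # Store P random +/-1 patterns with the Hebb rule (zero diagonal).
    xi = rng.choice([-1, 1], size=(P, N))
    W = (xi.T @ xi) / N
    np.fill_diagonal(W, 0.0)
    # Start from a corrupted version of pattern 0 and run asynchronous updates.
    s = xi[0].copy()
    flip = rng.random(N) < flip_frac
    s[flip] *= -1
    for _ in range(n_sweeps):
        for i in rng.permutation(N):        # asynchronous sweep
            s[i] = 1 if W[i] @ s >= 0 else -1
    return (s @ xi[0]) / N                  # overlap with the stored pattern

for P in [1, 10, 50, 70, 100, 200]:
    print(f"P = {P:4d}  (alpha = {P/N:.3f})   overlap = {retrieval_overlap(P):+.3f}")
```

With one pattern the overlap stays essentially at 1 (ferromagnetic retrieval); well above the capacity it collapses, and the network behaves like a spin glass rather than a memory.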
The Loss Surfaces of Multilayer Networks
How glassy are neural networks?
Spin-glass models of neural networks
Statistical Mechanics of Neural Networks near Saturation
Statistical mechanics of learning
A Correspondence Between Random Neural Networks and Statistical Field Theory
Exponential expressivity in deep neural networks through transient chaos
Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
In a Spin glass, the state space fragments hierarchically as the temperature is lowered and more local minima appear in the free energy.
This seems qualitatively related to how the sublevel sets of neural network loss landscapes become more irregular as the energy level is lowered; see Topology and Geometry of Deep Rectified Network Optimization Landscapes.
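A toy illustration of this fragmentation (my own code, not from the paper; a random superposition of plane waves stands in for a rough loss/free-energy surface, and the connected components of the sublevel set $\{f \le c\}$ are counted on a grid):

```python
# Toy illustration: count connected components of the sublevel set {f <= c}
# of a random smooth 2D "landscape".  As the level c is lowered toward the
# minima, the sublevel set typically breaks into more disconnected pieces.
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(2)
n = 400
x = np.linspace(0, 1, n)
X, Y = np.meshgrid(x, x)

# Random superposition of plane waves as a stand-in for a rough loss surface.
f = np.zeros((n, n))
for _ in range(30):
    kx, ky = rng.normal(0, 20, size=2)
    phase = rng.uniform(0, 2 * np.pi)
    f += rng.normal() * np.cos(kx * X + ky * Y + phase)

for q in [0.9, 0.7, 0.5, 0.3, 0.1, 0.05]:
    c = np.quantile(f, q)
    _, n_components = ndimage.label(f <= c)
    print(f"level at {q:4.2f} quantile: {n_components:4d} connected components")
```

Lowering the level $c$ typically makes the number of components grow, i.e. the sublevel sets break up into more and more disconnected pieces.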
https://www.youtube.com/watch?v=fZ8WCic5u2I&list=PLYq7WW565SZgC_0OqyRjTcWKH3odod3j5
Asynchronous irregular firing
See also Neuronal network