Singular learning theory

Information geometry, Learning theory

A statistical model or learning machine is called regular if the map from parameters to probability distributions is one-to-one and its Fisher information matrix is everywhere positive definite; otherwise it is called singular. In regular statistical models, the Bayes free energy, defined as the negative logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayesian information criterion (BIC), whereas in singular models this approximation does not hold. Singular learning theory studies such models using tools from algebraic geometry.
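
As a sketch of the central asymptotics (following Watanabe; here F_n is the Bayes free energy for n samples, nS_n the empirical entropy of the true distribution, lambda the real log canonical threshold (RLCT) of the model, and m its multiplicity):

F_n = -\log \int \prod_{i=1}^{n} p(X_i \mid w)\, \varphi(w)\, dw = n S_n + \lambda \log n - (m-1) \log\log n + O_p(1)

In a regular model lambda = d/2 and m = 1 (with d the parameter dimension), recovering the BIC penalty (d/2) log n; in singular models lambda can be strictly smaller than d/2, so BIC overestimates the penalty.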

A Widely Applicable Bayesian Information Criterion
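
For reference, a sketch of the criterion defined in that paper (E_w^beta denotes the posterior average at inverse temperature beta):

\mathrm{WBIC} = \mathbb{E}_w^{\beta}\left[\, n L_n(w) \,\right], \qquad \beta = \frac{1}{\log n}, \qquad L_n(w) = -\frac{1}{n} \sum_{i=1}^{n} \log p(X_i \mid w)

Watanabe shows that WBIC agrees with the Bayes free energy up to O_p(\sqrt{\log n}) even in singular models, without requiring knowledge of the RLCT.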

Algebraic Analysis for Nonidentifiable Learning Machines

Algebraic Analysis for Singular Statistical Estimation

Asymptotic Equivalence of Bayes Cross Validation and Widely Applicable Information Criterion in Singular Learning Theory

Singularities in mixture models and upper bounds of stochastic complexity

Algorithm for singular models: http://sci-hub.cc/10.1007/s11063-013-9283-z

Singularities Affect Dynamics of Learning in Neuromanifolds

Information-based inference in sloppy and singular models

Likelihood ratio of unidentifiable models and multilayer neural networks

Dynamics of Learning Near Singularities in Layered Networks

A Regularity Condition of the Information Matrix of a Multilayer Perceptron Network. The Fisher information matrix of a multilayer perceptron can be singular at certain parameters, and in such cases many statistical techniques based on asymptotic theory cannot be applied properly. This paper proves that the Fisher information matrix of a three-layer perceptron is positive definite if and only if the network is irreducible, that is, if there is no hidden unit that makes no contribution to the output and no pair of hidden units that could be collapsed into a single unit without altering the input-output map. Consequently, a network with a singular Fisher information matrix can be reduced to one with a positive definite Fisher information matrix by eliminating redundant hidden units.
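
A quick numerical illustration of that condition (a minimal NumPy sketch, not from the paper: a hypothetical one-hidden-layer tanh network; for Gaussian regression noise the Fisher information is proportional to E_x[grad f grad f^T]):

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x, W, v):
    """Gradient of f(x) = sum_j v[j] * tanh(W[j] @ x) w.r.t. (W, v), flattened."""
    h = np.tanh(W @ x)                       # hidden activations, shape (k,)
    dW = (v * (1.0 - h**2))[:, None] * x     # df/dW[j,i] = v_j (1 - h_j^2) x_i
    dv = h                                   # df/dv[j]   = h_j
    return np.concatenate([dW.ravel(), dv])

def empirical_fisher(W, v, n_samples=5000):
    """Monte Carlo estimate of E_x[grad f grad f^T] over x ~ N(0, I)."""
    d = W.shape[1]
    p = W.size + v.size
    G = np.zeros((p, p))
    for _ in range(n_samples):
        g = grad_f(rng.standard_normal(d), W, v)
        G += np.outer(g, g)
    return G / n_samples

d, k = 2, 3
W = rng.standard_normal((k, d))

# Irreducible network: distinct hidden units, all output weights nonzero.
v = np.array([1.0, -0.7, 0.5])
print(np.linalg.eigvalsh(empirical_fisher(W, v)).min())       # positive

# Reducible network: unit 2 makes no contribution (v[2] = 0), so the
# gradient block for W[2] vanishes identically -> singular Fisher matrix.
v_red = np.array([1.0, -0.7, 0.0])
print(np.linalg.eigvalsh(empirical_fisher(W, v_red)).min())   # ~ 0
```

Zeroing an output weight makes the gradient block for that unit's input weights vanish for every input, so the estimated matrix has an exact zero eigenvalue; duplicating the input weight vectors of two hidden units produces the same rank deficiency through the collapsed-pair condition.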

Application of the error function in analyzing the learning dynamics near singularities of the multilayer perceptrons

Resolution of Singularities Introduced by Hierarchical Structure in Deep Neural Networks

On the Singularity in Deep Neural Networks

On the Geometry of Feedforward Neural Network Error Surfaces