See Neural network theory. For choices of nonlinear operators see Layers for deep learning
Notes from Rodrigo Mendoza's talk
Deep learning is an area of machine learning that studies learning algorithms with multiple levels of abstraction.
Why do deep learning models perform so well?
Seems to be a result of:
Mathematical difficulty because: nonlinearity, non-convexity (convex optimization or complex analysis techniques not available), many d.o.f.
Results:
Neural network composed of neurons.
Data comes in through the dendrites and is scaled; the axon computes (applies a nonlinear function) and propagates the output through the synapse.
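In symbols (the standard artificial-neuron model; my notation, not from the talk), the dendrites scale the inputs by weights and the axon applies the nonlinearity:

```latex
y = \sigma\Bigl(\textstyle\sum_{i} w_i x_i + b\Bigr)
```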
A multilayer feedforward neural network.
L+2 layers: L hidden, plus the input and output layers.
The neural network is just a function from ℝ^n to ℝ^m, wlog ..?..
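A minimal sketch of such a network as a composition of affine maps and nonlinearities (my own illustration, assuming NumPy and ReLU activations; the layer sizes are hypothetical):

```python
import numpy as np

def relu(z):
    # Elementwise nonlinearity applied at each hidden layer
    return np.maximum(z, 0.0)

def feedforward(x, weights, biases):
    """Evaluate a multilayer feedforward network at input x.

    weights, biases: lists of length L+1 (L hidden layers plus a linear output layer),
    so the network is the composition x -> W_{L+1} s(... s(W_1 x + b_1) ...) + b_{L+1}.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)                   # hidden layer: affine map, then nonlinearity
    return weights[-1] @ a + biases[-1]       # output layer kept linear

# Hypothetical example: input in R^3, two hidden layers of width 5, scalar output
rng = np.random.default_rng(0)
sizes = [3, 5, 5, 1]
weights = [rng.standard_normal((m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(m) for m in sizes[1:]]
print(feedforward(rng.standard_normal(3), weights, biases))
```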
Training: given a dataset of inputs and outputs, we want a function that maps these as well as possible.
Use a loss function and a regulariser (a penalization on the size of the parameters. Could also try to maximize sparsity: Occam's razor, a bias towards a simpler model. This also makes the loss surface more convex).
Then minimize the empirical risk. To minimize it we use stochastic gradient descent (written out below).
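For reference (notation my own, not from the talk), the regularized empirical risk and one minibatch SGD step can be written as:

```latex
\hat{R}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \ell\bigl(f_\theta(x_i),\, y_i\bigr) + \lambda\,\Omega(\theta),
\qquad
\theta_{t+1} = \theta_t - \eta\left(\frac{1}{|B_t|}\sum_{i\in B_t} \nabla_\theta\, \ell\bigl(f_{\theta_t}(x_i),\, y_i\bigr) + \lambda\,\nabla_\theta\, \Omega(\theta_t)\right)
```

Here {(x_i, y_i)} is the dataset, ℓ the loss, Ω the regulariser (e.g. ||θ||_2^2, or ||θ||_1 to encourage sparsity), η the step size, and B_t a random minibatch.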
Assuming the function is continuous, differentiable, convex.
Can a multilayer feedforward network f approximate g arbitrarily well, for a very general g?
Universality
We can't expect f for the model considered (one layer) to approximate any g whatsoever; there are some very pathological functions. We can assume g is continuous, or just Lebesgue measurable (use the corresponding metric for defining closeness in this case).
We can then show that f can approximate g arbitrarily well (a rough statement below).
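A rough statement of the kind of result meant here (my paraphrase of the classical universal approximation theorems of Cybenko and Hornik, not verbatim from the talk): for any continuous g on a compact set C ⊂ ℝ^n, any ε > 0, and a suitable non-polynomial activation σ, there exist K, c_k, w_k, b_k such that the one-hidden-layer network

```latex
f(x) = \sum_{k=1}^{K} c_k\, \sigma\!\bigl(w_k^{\top} x + b_k\bigr)
\quad\text{satisfies}\quad
\sup_{x \in C}\, \bigl| f(x) - g(x) \bigr| < \varepsilon .
```

For g that is merely Lebesgue measurable, closeness is measured in an in-measure / L^p sense rather than uniformly.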
Many other models are also known to be universal.
Other minima.
The loss surface is the surface defined by the empirical risk.
The epigraph is non-convex (numerical illustration below).
Local minima of the empirical risk are known to abound.
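A small numerical illustration of the non-convexity (my own example, not from the talk): permuting the hidden units of a network leaves the empirical risk unchanged, so any minimum comes with symmetric copies, and the risk at the midpoint of two such copies is strictly larger than at either, which a convex function would not allow. The network, data and parameter values below are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def risk(theta, xs, ys):
    # Empirical risk (mean squared error) of the 2-hidden-unit network
    # f(x) = c1*relu(w1*x) + c2*relu(w2*x).
    w1, c1, w2, c2 = theta
    preds = c1 * relu(w1 * xs) + c2 * relu(w2 * xs)
    return np.mean((preds - ys) ** 2)

xs = np.linspace(-1.0, 1.0, 101)
ys = np.abs(xs)                                # target g(x) = |x|

theta    = np.array([ 1.0, 1.0, -1.0, 1.0])    # exact fit: relu(x) + relu(-x) = |x|
permuted = np.array([-1.0, 1.0,  1.0, 1.0])    # hidden units swapped: same function, same risk
midpoint = (theta + permuted) / 2              # convex combination of the two global minima

print(risk(theta, xs, ys))     # 0.0
print(risk(permuted, xs, ys))  # 0.0
print(risk(midpoint, xs, ys))  # > 0, so the empirical risk is not convex in the parameters
```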
Results:
Other results: only a few parameters matter.
The manifold hypothesis: meaningful data often concentrates on a low-dimensional manifold, so large numbers of parameters don't matter.
→ See dissertation topic proposed by Ard Louis.
Energy propagating from node i through path j.
Analogy between the loss function of a neural network and the Hamiltonian of a spin glass (sketch below).
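Roughly the shape of that analogy (my paraphrase of the Choromanska et al. spin-glass picture, not verbatim from the talk): write the network output as a sum over paths from input nodes to the output, where X_{i,j} is the input feeding path j from node i, A_{i,j} ∈ {0,1} records whether the path is active under ReLU, and w^{(k)}_{i,j} is the weight the path uses at layer k; under randomness and independence assumptions this has the same polynomial form (up to normalization) as a spin-glass Hamiltonian:

```latex
\hat{Y} = \sum_{i,j} X_{i,j}\, A_{i,j} \prod_{k=1}^{H} w^{(k)}_{i,j},
\qquad
\mathcal{H}(\sigma) = \sum_{i_1,\dots,i_H} J_{i_1 \dots i_H}\, \sigma_{i_1} \cdots \sigma_{i_H}
```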
(Multilayer: composition of functions.)