Algorithms
Neural networks [2.9] : Training neural networks - parameter initialization
Batch learning: the most common procedure, described above and in Machine learning, where the algorithm is run on the whole data set at once.
video. Mini-batch learning: at each iteration, the optimization algo is run on a sample of a given size drawn from the data set.
To get through plateaus, for instance.
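The batch vs. mini-batch distinction above can be sketched as follows; the quadratic loss, the synthetic data, and the hyperparameter values are illustrative assumptions, not from the notes:

```python
import numpy as np

# Mini-batch gradient descent on a least-squares loss (assumed toy problem):
# at each iteration the gradient is computed on a random sample of the data,
# not on the whole data set as in batch learning.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.05, 32
for step in range(500):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # draw a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size  # gradient of the batch MSE
    w -= lr * grad
```

Setting batch_size = len(X) recovers batch learning; batch_size = 1 gives pure stochastic gradient descent, whose noisier steps are one way the plateaus mentioned above get crossed.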
Momentum. You add inertia to the particle, so that gradient descent is no longer velocity = −gradient (as it would be for a particle in a viscous fluid), but acceleration = −gradient − friction × velocity.
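The viscous vs. inertial analogy corresponds to the following update rule; the toy quadratic loss and the hyperparameter values are assumptions for illustration:

```python
import numpy as np

# Plain gradient descent:  w <- w - lr * grad(w)          (velocity = -gradient)
# Momentum:                v <- beta * v - lr * grad(w)   (acceleration = -friction*v - gradient)
#                          w <- w + v
def grad(w):
    return w  # gradient of the assumed toy loss f(w) = 0.5 * ||w||^2

w, v = np.array([5.0, -3.0]), np.zeros(2)
beta, lr = 0.9, 0.1  # beta close to 1 means little friction, i.e. lots of inertia
for _ in range(200):
    v = beta * v - lr * grad(w)  # past velocity carries over: this is the inertia
    w = w + v
```

With beta = 0, the velocity is just −lr × gradient and plain gradient descent is recovered.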
for Artificial neural networks
as opposed to batch learning
video. You have to make predictions even in the process of learning.
(Online algorithm: you process the data sequentially, in chunks. You need this if you do not have access to all of the data at the same time, or if there is so much of it that it does not all fit in RAM.)
What we care about is the online error
Can apply batch learning algos for online learning
Several theoretical results exist
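A minimal sketch of the predict-then-update loop and the online error it accumulates, using a perceptron as the (assumed) learner on synthetic data:

```python
import numpy as np

# Online learning: each example is seen once, and we must predict on it
# *before* updating, so the error we track is the online error.
rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 0.5])  # hidden target direction (an assumption)
w = np.zeros(3)
mistakes, n = 0, 2000
for t in range(n):
    x = rng.normal(size=3)
    label = 1 if x @ target > 0 else -1
    pred = 1 if x @ w > 0 else -1  # predict while still learning
    if pred != label:
        mistakes += 1
        w += label * x             # perceptron update, only on mistakes
online_error = mistakes / n
```

This is also an example of applying a batch-style learner online: the perceptron update is the same rule one would sweep repeatedly over a fixed data set.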
The simplicity and structure of real-world signals is often exploited to make the learning problem easier to solve.
Diagnostic:
Ways to fix high variance or bias
Is the algo converging? – Are you optimizing the right function? → Diagnostic
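One concrete form of this diagnostic is to compare training and validation error; the polynomial models and synthetic data below are assumptions for illustration:

```python
import numpy as np

# Bias/variance diagnostic: both errors high -> high bias (underfitting);
# low training error but a large gap to validation error -> high variance.
rng = np.random.default_rng(2)
X = rng.normal(size=200)
y = np.sin(2 * X) + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def errors(degree):
    coef = np.polyfit(X_tr, y_tr, degree)
    tr = np.mean((np.polyval(coef, X_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coef, X_va) - y_va) ** 2)
    return tr, va

tr_lo, va_lo = errors(1)  # degree 1 underfits sin(2x): high bias
tr_hi, va_hi = errors(9)  # degree 9: training error drops; watch the gap to va_hi
```

Which of the two regimes you are in determines the fix: more data or regularization for high variance, a richer model for high bias.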
– Premature (statistical) optimization (video)
The dangers of over-theorizing