Algorithms
Neural networks [2.9] : Training neural networks - parameter initialization
Batch learning: the most common procedure, described above and in Machine learning, where the algorithm is run on the whole data set at once.
video. Mini-batch learning: at each iteration, the optimization algo is run on a sample of a given size drawn from the data set.
To get through plateaus, for instance.
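The batch vs. mini-batch distinction above can be sketched as follows; the quadratic loss, the synthetic data, and the hyperparameter values are illustrative assumptions, not from the notes:

```python
import numpy as np

# Mini-batch gradient descent on a least-squares loss (assumed toy problem):
# at each iteration the gradient is computed on a random sample of the data,
# not on the whole data set as in batch learning.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = np.arange(1.0, 6.0)
y = X @ true_w + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
lr, batch_size = 0.05, 32
for step in range(500):
    idx = rng.choice(len(X), size=batch_size, replace=False)  # draw a mini-batch
    Xb, yb = X[idx], y[idx]
    grad = 2 * Xb.T @ (Xb @ w - yb) / batch_size  # gradient of the batch MSE
    w -= lr * grad
```

Setting batch_size = len(X) recovers batch learning; batch_size = 1 gives pure stochastic gradient descent, whose noisier steps are one way the plateaus mentioned above get crossed.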
Momentum. You add inertia to the particle, so that gradient descent is no longer velocity = −gradient (as it would be for a particle in a viscous fluid), but acceleration = −gradient − friction × velocity.
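The viscous vs. inertial analogy corresponds to the following update rule; the toy quadratic loss and the hyperparameter values are assumptions for illustration:

```python
import numpy as np

# Plain gradient descent:  w <- w - lr * grad(w)          (velocity = -gradient)
# Momentum:                v <- beta * v - lr * grad(w)   (acceleration = -friction*v - gradient)
#                          w <- w + v
def grad(w):
    return w  # gradient of the assumed toy loss f(w) = 0.5 * ||w||^2

w, v = np.array([5.0, -3.0]), np.zeros(2)
beta, lr = 0.9, 0.1  # beta close to 1 means little friction, i.e. lots of inertia
for _ in range(200):
    v = beta * v - lr * grad(w)  # past velocity carries over: this is the inertia
    w = w + v
```

With beta = 0, the velocity is just −lr × gradient and plain gradient descent is recovered.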
for Artificial neural networks
as opposed to batch learning
video. You have to make predictions even in the process of learning.
(Online algorithm: you process the data sequentially, in chunks. You need this if you do not have access to all of the data at the same time, or if there is so much of it that it does not all fit in RAM.)
What we care about is the online error
Can apply batch learning algos for online learning
Several theoretical results exist
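A minimal sketch of the predict-then-update loop and the online error it accumulates, using a perceptron as the (assumed) learner on synthetic data:

```python
import numpy as np

# Online learning: each example is seen once, and we must predict on it
# *before* updating, so the error we track is the online error.
rng = np.random.default_rng(1)
target = np.array([1.0, -2.0, 0.5])  # hidden target direction (an assumption)
w = np.zeros(3)
mistakes, n = 0, 2000
for t in range(n):
    x = rng.normal(size=3)
    label = 1 if x @ target > 0 else -1
    pred = 1 if x @ w > 0 else -1  # predict while still learning
    if pred != label:
        mistakes += 1
        w += label * x             # perceptron update, only on mistakes
online_error = mistakes / n
```

This is also an example of applying a batch-style learner online: the perceptron update is the same rule one would sweep repeatedly over a fixed data set.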
The simplicity and structure of real-world signals is often exploited to make the learning problem easier to solve.
Diagnostic:
Ways to fix high variance or bias
Is the algo converging? – Are you optimizing the right function? → Diagnostic
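One concrete form of this diagnostic is to compare training and validation error; the polynomial models and synthetic data below are assumptions for illustration:

```python
import numpy as np

# Bias/variance diagnostic: both errors high -> high bias (underfitting);
# low training error but a large gap to validation error -> high variance.
rng = np.random.default_rng(2)
X = rng.normal(size=200)
y = np.sin(2 * X) + 0.1 * rng.normal(size=200)
X_tr, y_tr, X_va, y_va = X[:150], y[:150], X[150:], y[150:]

def errors(degree):
    coef = np.polyfit(X_tr, y_tr, degree)
    tr = np.mean((np.polyval(coef, X_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coef, X_va) - y_va) ** 2)
    return tr, va

tr_lo, va_lo = errors(1)  # degree 1 underfits sin(2x): high bias
tr_hi, va_hi = errors(9)  # degree 9: training error drops; watch the gap to va_hi
```

Which of the two regimes you are in determines the fix: more data or regularization for high variance, a richer model for high bias.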
– Premature (statistical) optimization (video)
The dangers of over-theorizing