Regularization

cosmos 18th December 2017 at 3:19pm
Machine learning

Often we want the model to generalize better, and this is done by reducing model complexity: regularization encourages fitting the signal rather than the noise. Many of these methods are closely related to Model selection methods, as both try to improve the model, often by choosing "simpler" models.

9.520 - 09/14/2015 - Class 02 - Prof. Tomaso Poggio: The Learning Problem and Regularization

We can penalize functions with high complexity, or limit/penalize other properties of the function. A common example is penalizing the size of the weights, i.e. the norm of the parameter vector. We can do this by modifying certain parts of our learning algorithm for this purpose:
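A minimal sketch of the norm-penalty idea, assuming a least-squares model and synthetic data (all names and values here are illustrative): the L2 penalty lam * ||w||^2 is added to the loss, which shows up as a "weight decay" term in the gradient update.

```python
import numpy as np

# Illustrative synthetic data (shapes and values are assumptions).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                             # inputs
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)  # noisy targets

lam, lr = 0.1, 0.01   # regularization strength and learning rate (assumed)
w = np.zeros(5)
for _ in range(1000):
    grad_loss = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared loss
    grad_penalty = 2 * lam * w                   # gradient of lam * ||w||^2
    w -= lr * (grad_loss + grad_penalty)         # penalty shrinks w toward 0
```

Larger lam shrinks the weights harder, trading fit on the training data for a smoother, simpler function.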

-> See comment here: whenever we project our data, or we optimize, we may be able to think of these operations as forms of regularization!

Iterative regularization via early stopping: derivation for squared loss; result for finite time

Cross-validation can be used for regularization via Early stopping: stop training once the error on the held-out validation set stops improving.
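A minimal early-stopping sketch, assuming hypothetical helpers train_step (one optimizer step) and val_loss (evaluation on the validation set); patience controls how many non-improving steps we tolerate before stopping.

```python
import numpy as np

def early_stop_train(train_step, val_loss, w, patience=10, max_iters=10000):
    """Run train_step until val_loss stops improving; return the best weights."""
    best_w, best_loss, bad = w.copy(), np.inf, 0
    for _ in range(max_iters):
        w = train_step(w)
        loss = val_loss(w)
        if loss < best_loss:
            best_w, best_loss, bad = w.copy(), loss, 0   # new best checkpoint
        else:
            bad += 1
            if bad >= patience:   # validation error has stopped improving
                break
    return best_w
```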

We can use a Prior distribution from Bayesian statistics to make simple hypotheses more likely. See Simplicity and learning. Intuition
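One standard way to make this concrete (assuming a Gaussian likelihood with noise variance σ² and a zero-mean Gaussian prior with variance τ² on the weights): the MAP estimate is exactly a norm-penalized fit,

```latex
\hat{w}_{\mathrm{MAP}}
  = \arg\max_w \; p(y \mid X, w)\, p(w)
  = \arg\min_w \; \frac{1}{2\sigma^2}\|Xw - y\|^2 + \frac{1}{2\tau^2}\|w\|^2 ,
```

so a tighter prior (smaller τ²) plays the role of a larger regularization weight λ = σ²/τ².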

Neural networks [2.8] : Training neural networks - regularization

Dropout

Batch normalization

Tikhonov regularization
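For the least-squares case, the Tikhonov functional and its closed-form minimizer can be written as (notation assumed: X the n×d data matrix, y the targets, λ the regularization parameter):

```latex
\min_w \; \frac{1}{n}\|Xw - y\|^2 + \lambda \|w\|^2
\qquad\Longrightarrow\qquad
w_\lambda = (X^\top X + \lambda n I)^{-1} X^\top y .
```

The λnI term keeps the matrix well conditioned, which is the linear-algebra face of the noise-robustness discussed below.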


In iterative regularization, the amount of distance travelled from the initialization is what regularizes: running fewer iterations keeps the solution closer to the starting point, which plays the role of a norm constraint.

Spectral filtering perspective: many regularization algorithms can be seen as filtering out the components of the solution along eigendirections of the covariance matrix of the input data with small eigenvalues (i.e. making the prediction not depend on small variations in the data, as these can be easily affected by noise!).
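A sketch of why this holds for linear least squares, assuming the SVD X = UΣVᵀ and constants absorbed into the step size η: each algorithm replaces the 1/σᵢ of the pseudo-inverse with a filtered version fλ(σᵢ²)/σᵢ, where the filter is ≈ 1 on large eigenvalues and ≈ 0 on small ones.

```latex
w_\lambda = V \,\operatorname{diag}\!\left(\frac{f_\lambda(\sigma_i^2)}{\sigma_i}\right) U^\top y,
\qquad
f_\lambda(\sigma^2) =
\begin{cases}
\dfrac{\sigma^2}{\sigma^2 + \lambda n} & \text{Tikhonov} \\[1ex]
1 - (1 - \eta\,\sigma^2)^{t} & \text{gradient descent after } t \text{ steps}
\end{cases}
```

In the gradient-descent filter, the iteration count t plays the role of 1/λ, which is the spectral view of early stopping above.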