Often we want the model to generalize better, and one way to do this is by reducing model complexity. This encourages fitting the signal rather than just the noise. Many of these methods are closely related to Model selection methods, as both try to improve the model, often by choosing "simpler" models.
9.520 - 09/14/2015 - Class 02 - Prof. Tomaso Poggio: The Learning Problem and Regularization
We can penalize functions with high complexity, or limit/penalize other properties of the function. An example of this is penalizing the size of the weights, i.e. the norm of the parameter vector (see the ridge sketch below). We can do this by modifying certain parts of our learning algorithm:
–> See comment here: whenever we project our data or we optimize, we may be able to think of these as forms of regularization!
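A minimal sketch of the norm-penalty idea above, assuming a linear least-squares model; the synthetic data, variable names, and the `ridge` helper are my own illustration, not something from the lecture:

```python
import numpy as np

# Ridge regression penalizes the squared L2 norm of the weights:
#   (1/n) * ||X w - y||^2 + lam * ||w||^2
# Closed-form minimizer: w = (X^T X / n + lam * I)^{-1} (X^T y / n)

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge(X, y, lam):
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

for lam in [0.0, 0.1, 1.0, 10.0]:
    w = ridge(X, y, lam)
    print(f"lam={lam:5.1f}  ||w|| = {np.linalg.norm(w):.3f}")
# Larger lam shrinks the weights: the norm of w decreases as lam grows,
# i.e. we trade some fit on the training data for a "simpler" function.
```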
Iterative regularization via early stopping – derivation for squared loss – result for finite time
Cross-validation can be used for regularization via Early stopping: stop training when the error on a held-out validation set starts to increase.
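A rough sketch of early stopping with a validation set, assuming gradient descent on the squared loss for a linear model; the patience-based stopping rule and all names here are illustrative choices on my part, not the class's specific recipe:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 30
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(size=n)

# Split into train / validation sets.
X_tr, X_val = X[:150], X[150:]
y_tr, y_val = y[:150], y[150:]

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# Gradient descent on the squared loss; stop when the validation error
# has not improved for `patience` iterations (early stopping).
w = np.zeros(d)
lr, patience = 0.01, 20
best_val, best_w, since_best = np.inf, w.copy(), 0
for t in range(5000):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val = mse(X_val, y_val, w)
    if val < best_val:
        best_val, best_w, since_best = val, w.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:
            break

print(f"stopped at iteration {t}, best validation MSE {best_val:.3f}")
```

Here the validation set plays the role of the cross-validation estimate of generalization error; the number of iterations acts as the regularization parameter.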
We can use a Prior distribution from Bayesian statistics to make simpler hypotheses more likely. See Simplicity and learning for the intuition.
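As a standard illustration of the prior-as-regularizer idea (a well-known derivation, not taken from the linked note): for a linear model with Gaussian noise, a Gaussian prior on the weights turns the MAP estimate into L2-regularized least squares.

```latex
% MAP with a Gaussian prior recovers an L2 penalty.
% Model: y_i = w^\top x_i + \varepsilon_i, \varepsilon_i \sim \mathcal{N}(0,\sigma^2);
% prior: w \sim \mathcal{N}(0, \tau^2 I).
\hat{w}_{\text{MAP}}
  = \arg\max_{w}\; \log p(y \mid X, w) + \log p(w)
  = \arg\min_{w}\; \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - w^\top x_i\bigr)^{2}
      + \frac{1}{2\tau^{2}} \lVert w \rVert^{2}
```

So the MAP estimate is ridge regression with effective regularization parameter λ = σ²/τ²: a broader prior (larger τ) means weaker regularization.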
Neural networks [2.8] : Training neural networks - regularization
In iterative regularization, the amount of distance travelled from the initialization is what regularizes: stopping earlier keeps the solution closer to the starting point, which acts like an implicit constraint on the norm of the weights.
From the spectral filtering perspective, many regularization algorithms can be seen as filtering out the components associated with the small eigenvalues (the "high frequencies") of the covariance matrix of the input data, i.e. making the prediction not depend on small variations, as these can easily be dominated by noise!
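A small sketch of the spectral filtering view (my own illustration, assuming a linear least-squares setting): Tikhonov regularization and finite-time gradient descent (early stopping) both act as filters on the eigenvalues of the covariance matrix, keeping the large-eigenvalue directions and damping the small ones.

```python
import numpy as np

# Filter functions applied to each eigenvalue s of the covariance matrix
# (illustrative; lam is the Tikhonov parameter, t the number of gradient
#  steps, eta the step size).
def tikhonov_filter(s, lam):
    # Tikhonov replaces 1/s by 1/(s + lam); relative to ordinary least
    # squares this is the factor s / (s + lam): small eigenvalues are damped.
    return s / (s + lam)

def early_stopping_filter(s, t, eta):
    # Gradient descent on the squared loss after t steps acts like
    # 1 - (1 - eta*s)^t on each eigen-direction: large eigenvalues are fit
    # quickly, small ones only after many iterations.
    return 1.0 - (1.0 - eta * s) ** t

eigvals = np.array([10.0, 1.0, 0.1, 0.01])
print("eigenvalue   Tikhonov(lam=0.1)   early stop (t=50, eta=0.05)")
for s in eigvals:
    print(f"{s:10.2f}   {tikhonov_filter(s, 0.1):8.3f}           "
          f"{early_stopping_filter(s, 50, 0.05):8.3f}")
# Both filters are close to 1 on directions with large variance and close
# to 0 on directions with small variance, so the prediction ignores the
# small variations that are most easily corrupted by noise.
```

This also connects back to the earlier points: the number of iterations t in early stopping plays the same role as the Tikhonov parameter lam, which is why the distance travelled by the iterates acts as the regularizer.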