Often we want the model to generalize better, and one way to do this is by reducing model complexity. This encourages fitting the signal rather than just the noise. Many of these methods are closely related to Model selection methods, as both try to improve the model, often by choosing "simpler" models.
9.520 - 09/14/2015 - Class 02 - Prof. Tomaso Poggio: The Learning Problem and Regularization
We can penalize functions with high complexity, or limit/penalize other properties of the function. An example of this is penalizing the size of the weights, i.e. the norm of the parameter vector (see the ridge sketch below). We can do this by modifying certain parts of our learning algorithm:
–> See comment here: whenever we project our data or we optimize, we may be able to think of these as forms of regularization!
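A minimal sketch of the norm-penalty idea above, assuming a linear least-squares model; the synthetic data, variable names, and the `ridge` helper are my own illustration, not something from the lecture:

```python
import numpy as np

# Ridge regression penalizes the squared L2 norm of the weights:
#   (1/n) * ||X w - y||^2 + lam * ||w||^2
# Closed-form minimizer: w = (X^T X / n + lam * I)^{-1} (X^T y / n)

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)

def ridge(X, y, lam):
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

for lam in [0.0, 0.1, 1.0, 10.0]:
    w = ridge(X, y, lam)
    print(f"lam={lam:5.1f}  ||w|| = {np.linalg.norm(w):.3f}")
# Larger lam shrinks the weights: the norm of w decreases as lam grows,
# i.e. we trade some fit on the training data for a "simpler" function.
```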
Iterative regularization via early stopping – derivation for squared loss – result for finite time
Cross-validation can be used for regularization via Early stopping: stop training when the error on a held-out validation set starts to increase.
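A rough sketch of early stopping with a validation set, assuming gradient descent on the squared loss for a linear model; the patience-based stopping rule and all names here are illustrative choices on my part, not the class's specific recipe:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 30
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(size=n)

# Split into train / validation sets.
X_tr, X_val = X[:150], X[150:]
y_tr, y_val = y[:150], y[150:]

def mse(X, y, w):
    return np.mean((X @ w - y) ** 2)

# Gradient descent on the squared loss; stop when the validation error
# has not improved for `patience` iterations (early stopping).
w = np.zeros(d)
lr, patience = 0.01, 20
best_val, best_w, since_best = np.inf, w.copy(), 0
for t in range(5000):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= lr * grad
    val = mse(X_val, y_val, w)
    if val < best_val:
        best_val, best_w, since_best = val, w.copy(), 0
    else:
        since_best += 1
        if since_best >= patience:
            break

print(f"stopped at iteration {t}, best validation MSE {best_val:.3f}")
```

Here the validation set plays the role of the cross-validation estimate of generalization error; the number of iterations acts as the regularization parameter.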
We can use a Prior distribution from Bayesian statistics to make simpler hypotheses more likely. See Simplicity and learning for the intuition.
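As a standard illustration of the prior-as-regularizer idea (a well-known derivation, not taken from the linked note): for a linear model with Gaussian noise, a Gaussian prior on the weights turns the MAP estimate into L2-regularized least squares.

```latex
% MAP with a Gaussian prior recovers an L2 penalty.
% Model: y_i = w^\top x_i + \varepsilon_i, \varepsilon_i \sim \mathcal{N}(0,\sigma^2);
% prior: w \sim \mathcal{N}(0, \tau^2 I).
\hat{w}_{\text{MAP}}
  = \arg\max_{w}\; \log p(y \mid X, w) + \log p(w)
  = \arg\min_{w}\; \frac{1}{2\sigma^{2}} \sum_{i=1}^{n} \bigl(y_i - w^\top x_i\bigr)^{2}
      + \frac{1}{2\tau^{2}} \lVert w \rVert^{2}
```

So the MAP estimate is ridge regression with effective regularization parameter λ = σ²/τ²: a broader prior (larger τ) means weaker regularization.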
Neural networks [2.8] : Training neural networks - regularization
In iterative regularization, the amount of distance travelled from the initialization is what regularizes: stopping earlier keeps the solution closer to the starting point, which acts like an implicit constraint on the norm of the weights.
From the spectral filtering perspective, many regularization algorithms can be seen as filtering out the components associated with the small eigenvalues (the "high frequencies") of the covariance matrix of the input data, i.e. making the prediction not depend on small variations, as these can easily be dominated by noise!
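A small sketch of the spectral filtering view (my own illustration, assuming a linear least-squares setting): Tikhonov regularization and finite-time gradient descent (early stopping) both act as filters on the eigenvalues of the covariance matrix, keeping the large-eigenvalue directions and damping the small ones.

```python
import numpy as np

# Filter functions applied to each eigenvalue s of the covariance matrix
# (illustrative; lam is the Tikhonov parameter, t the number of gradient
#  steps, eta the step size).
def tikhonov_filter(s, lam):
    # Tikhonov replaces 1/s by 1/(s + lam); relative to ordinary least
    # squares this is the factor s / (s + lam): small eigenvalues are damped.
    return s / (s + lam)

def early_stopping_filter(s, t, eta):
    # Gradient descent on the squared loss after t steps acts like
    # 1 - (1 - eta*s)^t on each eigen-direction: large eigenvalues are fit
    # quickly, small ones only after many iterations.
    return 1.0 - (1.0 - eta * s) ** t

eigvals = np.array([10.0, 1.0, 0.1, 0.01])
print("eigenvalue   Tikhonov(lam=0.1)   early stop (t=50, eta=0.05)")
for s in eigvals:
    print(f"{s:10.2f}   {tikhonov_filter(s, 0.1):8.3f}           "
          f"{early_stopping_filter(s, 50, 0.05):8.3f}")
# Both filters are close to 1 on directions with large variance and close
# to 0 on directions with small variance, so the prediction ignores the
# small variations that are most easily corrupted by noise.
```

This also connects back to the earlier points: the number of iterations t in early stopping plays the same role as the Tikhonov parameter lam, which is why the distance travelled by the iterates acts as the regularizer.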