Overfitting and underfitting

cosmos 6th March 2017 at 7:42pm
Learning theory

Overfitting and underfitting are two ways of mis-training a model, i.e., ways in which it ends up with poor generalization error compared to the optimal model.

Bias-variance tradeoff, see also Relation to bias/variance tradeoff
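
A quick sketch of the decomposition behind the tradeoff (the standard squared-error version, assuming data generated as $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon]=0$, $\operatorname{Var}(\varepsilon)=\sigma^2$, and $\hat f$ the fitted function; expectations are over draws of the training set and the test noise):

```latex
\mathbb{E}\!\left[(y - \hat f(x))^2\right]
  = \underbrace{\bigl(f(x) - \mathbb{E}[\hat f(x)]\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\!\left[\bigl(\hat f(x) - \mathbb{E}[\hat f(x)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Underfitting corresponds to the bias² term dominating, overfitting to the variance term; the σ² term is the noise floor mentioned at the end of this note.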

Training error vs. generalization error: a large gap between the two is the usual symptom of overfitting, while underfitting shows up as high error on both.

Underfitting. A learning algorithm with a lot of bias, meaning that it imposes a lot of a priori structure/assumptions on the fitted functions. Such algorithms, however, tend to have low variance: the fitted function doesn't change much when different training sets sampled from the same process are used, i.e., they are stable.

Overfitting. A learning algorithm with low bias (it is more flexible), which however has a lot of variance: it fits the idiosyncrasies of the particular training sample, i.e., it fits the noise.
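
A minimal runnable sketch of both regimes (my own toy example with NumPy, not taken from the sources below): fit polynomials of increasing degree to noisy samples of a sine curve. The low-degree fit underfits (high error on both sets), the high-degree fit overfits (tiny training error, much larger test error); typically an intermediate degree generalizes best.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # Noiseless underlying function we are trying to learn.
    return np.sin(np.pi * x)

n_train, n_test = 20, 200
x_train = rng.uniform(-1, 1, n_train)
y_train = target(x_train) + rng.normal(0, 0.2, n_train)
x_test = rng.uniform(-1, 1, n_test)
y_test = target(x_test) + rng.normal(0, 0.2, n_test)

def fit_and_score(degree):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

for degree in [1, 3, 15]:   # rigid, moderate, very flexible
    train_mse, test_mse = fit_and_score(degree)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```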

See explanation here and here

Deep Learning Lecture 5: Regularization, model complexity and data complexity (part 2)

So the simplest model that works seems to work best most of the time. This looks like an instance of Occam's razor, and is thus related to Solomonoff's ideas on inference (see Algorithmic information theory). Epicurus' principle is also related, via Bayesian inference: we place a distribution over models and keep all of them, weighted by their posterior probability, rather than committing to a single one.
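
A rough sketch of the "keep all models" idea (my own illustration, not from the lecture above): weight each polynomial degree by exp(-BIC/2), which approximates its posterior probability under a uniform prior, and average their predictions instead of picking one winner.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.uniform(-1, 1, n)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, n)

degrees = list(range(1, 8))
bics, preds = [], []
x_query = np.linspace(-1, 1, 5)   # a few points to predict at

for d in degrees:
    coeffs = np.polyfit(x, y, d)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    # BIC under a Gaussian noise model: n*log(mse) + (#params)*log(n)
    bics.append(n * np.log(mse) + (d + 1) * np.log(n))
    preds.append(np.polyval(coeffs, x_query))

bics = np.array(bics)
# exp(-BIC/2) roughly approximates the marginal likelihood, hence
# (up to a uniform prior) the posterior probability of each model.
weights = np.exp(-0.5 * (bics - bics.min()))
weights /= weights.sum()

print(dict(zip(degrees, np.round(weights, 3))))
print("model-averaged prediction:",
      np.round(np.average(preds, axis=0, weights=weights), 3))
```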

Hmm, also your expected generalization error can't be smaller than the fundamental noise in the data (the irreducible σ² term in the decomposition above). Well, the training error can get below it, but then the model is at best being wasteful, spending capacity on fitting noise.