Overfitting and underfitting refer to ways in which training a model can go wrong, i.e., ways in which it ends up with poor generalization error compared to the optimal model.
Bias-variance tradeoff, see also Relation to bias/variance tradeoff
training error/generalization error
Underfitting. A learning algorithm with a lot of bias, i.e., it imposes a lot of a priori structure/assumptions on the fitted functions. Such algorithms, however, tend to have low variance: the fitted function doesn't vary much when different training sets sampled from the same process are used; they are stable.
Overfitting. A learning algorithm with low bias (it is more flexible), but with a lot of variance: it fits the idiosyncrasies of the particular training set, i.e., it fits the noise.
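A minimal sketch of the above (my own hypothetical data and model choices, not from any source here): fit polynomials of increasing degree to the same small noisy sample. The low-degree fit underfits (high bias, similar error on train and test), the high-degree fit overfits (high variance, a big gap between training and held-out error).

```python
# Sketch (assumed setup): polynomials of increasing degree fit to noisy data.
# Low degree underfits (high bias), high degree overfits (high variance),
# visible as a growing gap between training and held-out error.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(-1, 1, n)
    y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)  # true signal + noise
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(10_000)

for degree in (1, 3, 9):  # underfit, reasonable, overfit
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((y_train - np.polyval(coeffs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coeffs, x_test)) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

With so few training points, the degree-9 fit should drive its training error toward zero while its held-out error blows up; the degree-1 fit should show similarly large error on both, which is the signature of underfitting.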
Deep Learning Lecture 5: Regularization, model complexity and data complexity (part 2)
So the simplest model that works seems to work best most of the time. This looks like an instance of Occam's razor, and thus related to Solomonoff's ideas on inference (see Algorithmic information theory). Epicurus' principle (keep every hypothesis consistent with the data) is also related, via Bayesian inference: we place a distribution over models, but we keep all of them rather than committing to one.
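A rough illustration of that Occam-style preference (my own sketch, using BIC as a crude stand-in for the Bayesian model evidence; none of this is from the lecture): training error alone always prefers the most complex polynomial, while an evidence-like score that penalizes parameters prefers the simpler model that actually generated the data.

```python
# Sketch (assumed data-generating process): score polynomial models on data
# that is truly linear. Training MSE keeps improving with degree, but BIC,
# a crude approximation to -2 * log(model evidence), penalizes the extra
# parameters and typically favours the simpler (true) model.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.5 * x + rng.normal(scale=0.2, size=x.size)  # linear signal + noise

n = x.size
for degree in (1, 4, 9):
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((y - np.polyval(coeffs, x)) ** 2)
    k = degree + 1                          # number of fitted parameters
    bic = n * np.log(mse) + k * np.log(n)   # lower is better
    print(f"degree {degree}: train MSE {mse:.4f}, BIC {bic:.1f}")
```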
Hmm, also your generalization error can't be smaller than the fundamental (irreducible) noise in the data. Your training error can go below it, but then the model is at best being wasteful, spending capacity on memorizing noise rather than signal.
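A quick sanity check on that noise floor (again a hypothetical setup of mine): even an oracle that knows the true function still incurs held-out error equal to the noise variance.

```python
# Sketch (assumed noise model): the Bayes-optimal predictor, here the true
# function itself, still gets test MSE equal to the noise variance. That
# variance is the irreducible error floor for generalization.
import numpy as np

rng = np.random.default_rng(1)
noise_std = 0.3
x = rng.uniform(-1, 1, 100_000)
y = np.sin(3 * x) + rng.normal(scale=noise_std, size=x.size)

oracle_mse = np.mean((y - np.sin(3 * x)) ** 2)  # predict with the true signal
print("noise variance (irreducible floor):", noise_std ** 2)
print("test MSE of the true function:     ", round(oracle_mse, 4))
```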