Test the model on data you haven't used for training.
min-max, average
https://www.cs.cmu.edu/~schneide/tut5/node42.html
Wikipedia has good explanations: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
To search for the best configuration of hyperparameters according to the validation error, there are several methods. Some popular ones are grid search, random search, and Bayesian optimization; a small grid-search sketch follows.
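A minimal sketch of the simplest of these, grid search scored on a hold-out validation set. The Ridge model, the make_regression toy data, and the alpha grid are my own illustrative choices, not anything prescribed by these notes:

```python
# Grid search sketch: try each candidate value, keep the one with the lowest
# validation error. Ridge regression and the alpha grid are just placeholders.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_alpha, best_err = None, np.inf
for alpha in [0.01, 0.1, 1.0, 10.0, 100.0]:               # the grid of candidates
    model = Ridge(alpha=alpha).fit(X_train, y_train)       # train with this configuration
    err = mean_squared_error(y_val, model.predict(X_val))  # validation error
    if err < best_err:
        best_alpha, best_err = alpha, err

print(f"best alpha = {best_alpha}, validation MSE = {best_err:.2f}")
```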
Sometimes, once the model complexity (and the other Hyperparameters) has been picked, the model is retrained on the whole data set.
Optimizing the number of epochs: see Overfitting and underfitting.
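A rough sketch of that idea (often called early stopping): track the validation error after each epoch and remember the epoch where it was lowest. Plain gradient descent on linear regression with synthetic data is only a stand-in here:

```python
# Pick the number of epochs via the validation error: train, record the
# validation MSE each epoch, keep the epoch where it was smallest.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))
w_true = rng.normal(size=50)
y = X @ w_true + rng.normal(scale=3.0, size=100)
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

w = np.zeros(50)
lr = 0.1
best_epoch, best_val = 0, np.inf
for epoch in range(1, 2001):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # gradient of the training MSE
    w -= lr * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)      # validation error after this epoch
    if val_err < best_val:
        best_epoch, best_val = epoch, val_err        # remember the best epoch so far

print(f"lowest validation MSE {best_val:.2f} at epoch {best_epoch}")
```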
To predict the generalization error (see Learning theory) of the hyperparameters we end up choosing, we can't just look at their error on S_CV, as that set has been used by the algorithm to choose the final answer. As a result, the error on S_CV is a biased estimator of the generalization error. To get an unbiased estimator, we need a test set that is only used once the learning algorithm has finished completely. See video. A small sketch of this protocol is below.
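A hedged sketch of the full train / S_CV / test protocol, in the same illustrative setup as the grid-search sketch above (Ridge, make_regression, an alpha grid): hyperparameters are chosen on S_CV, and the test set is touched exactly once, at the very end, for the generalization estimate.

```python
# Train / validation (S_CV) / test protocol: choose on S_CV, evaluate once on test.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=15, noise=5.0, random_state=1)
# 60% train, 20% validation (S_CV), 20% test
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
X_train, X_cv, y_train, y_cv = train_test_split(X_rest, y_rest, test_size=0.25, random_state=1)

cv_err = {a: mean_squared_error(y_cv, Ridge(alpha=a).fit(X_train, y_train).predict(X_cv))
          for a in [0.01, 0.1, 1.0, 10.0, 100.0]}
best_alpha = min(cv_err, key=cv_err.get)          # chosen using S_CV, so cv_err is biased

final = Ridge(alpha=best_alpha).fit(X_train, y_train)
test_err = mean_squared_error(y_test, final.predict(X_test))  # touched exactly once
print(f"alpha={best_alpha}: CV error {cv_err[best_alpha]:.1f}, test error {test_err:.1f}")
```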
k-fold CV: more computationally expensive (than a single hold-out split).
Leave-one-out CV: k-fold CV for when k = {number of training examples}, so in each iteration you leave a single example out.
Even more computationally expensive, but it gives a more accurate estimate of the generalization error. Usually only done when the data is very scarce. (See the sketch below.)
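A small sketch comparing the two, to make the cost difference concrete: 10-fold CV needs 10 fits, leave-one-out needs one fit per example. The k-NN regressor and the toy data are assumptions of mine, just placeholders.

```python
# Compare k-fold CV with leave-one-out (k = number of examples).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)
model = KNeighborsRegressor(n_neighbors=5)

# 10-fold CV: 10 fits, each trained on 90% of the data
kfold_scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error",
                               cv=KFold(n_splits=10, shuffle=True, random_state=0))

# Leave-one-out: 100 fits, each leaving a single example out
loo_scores = cross_val_score(model, X, y, scoring="neg_mean_squared_error",
                             cv=LeaveOneOut())

print(f"10-fold CV estimate : {-kfold_scores.mean():.1f} MSE over {len(kfold_scores)} fits")
print(f"LOO estimate        : {-loo_scores.mean():.1f} MSE over {len(loo_scores)} fits")
```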
I have the feeling that one (not necessarily me xD) could maybe theoretically prove that something like cross-validation is an optimal way to estimate the generalization error of your algorithm (under some definitions of optimality). In a way that would be kinda annoying because CV is used all the time, so we won't get anything better. But, on the other hand it'd be pretty interesting and kinda useful to know... [insert thonk emoji]
Some thoughts from when I misunderstood the train/validation/test setup.
What I describe here is some sort of hyperlearning with two steps, where we learn two sets of hyperparameters and use two different validation sets (which, below, I mistakenly call validation and test...).
Say we have trained the model using a method such as hold-out CV, with some fixed learning Hyperparameters. If we want to learn the hyperparameters as well, we can repeat this whole training procedure with several values of the hyperparameters.
However, to choose the best hyperparameters, we can't just look at their error on S_CV, as that set has been used by the algorithm to choose the final answer. As a result, the error on S_CV is a biased estimator of the generalization error (see Learning theory). To get an unbiased estimator, we need a test set that is only used once the learning algorithm has finished completely. We can compare the results on the test set to choose the best set of hyperparameters. Note that once we have done that, the test error is no longer an unbiased estimate of the generalization error, as we have used it to output our very final answer; i.e., our final answer depended on it!! We'd need a hypertest set. (A small simulation of this selection bias is sketched below.)
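A toy simulation of that selection bias, under assumptions I'm making up for illustration: 50 hyperparameter configurations that are all equally good (true error rate 0.30), each evaluated on a noisy validation set of 200 examples. Whichever error estimate is used to pick the winner ends up looking better than the truth; a fresh estimate on unused data does not.

```python
# Selecting the configuration with the lowest measured error makes that
# measurement optimistically biased; a fresh test estimate stays unbiased.
import numpy as np

rng = np.random.default_rng(0)
true_error = 0.30
n_configs, n_val, n_trials = 50, 200, 2000

selected_val, selected_test = [], []
for _ in range(n_trials):
    # validation error of each config = fraction of mistakes on n_val examples
    val_errors = rng.binomial(n_val, true_error, size=n_configs) / n_val
    best = np.argmin(val_errors)                       # pick the "best" config on S_CV
    selected_val.append(val_errors[best])              # biased: it was used for the choice
    selected_test.append(rng.binomial(n_val, true_error) / n_val)  # fresh, unused data

print(f"true error                : {true_error:.3f}")
print(f"mean selected validation  : {np.mean(selected_val):.3f}  (optimistic)")
print(f"mean test error of winner : {np.mean(selected_test):.3f}  (unbiased)")
```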