See post: Learn in theory
The Statistical learning theory of Supervised learning. It mostly studies ways of ensuring good Generalization for learning algorithms. See more at Computational learning theory and Probably approximately correct
See Introduction to supervised learning theory for a more complete formal introduction, and these notes
We have several approaches to Generalization error bounds (see more detail here); they are listed below.
General principle: the more information the bound depends on (e.g. the hypothesis class, the data distribution, the training sample, the learning algorithm), the tighter it can be
Note that bounds that depend on the training sample are basically bounds that depend on the Data distribution, but where we estimate the distribution-dependent quantity from the sample, using some Concentration inequality
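As a concrete instance of this (notation mine, not from the original note): for a single fixed hypothesis $h$ with loss in $[0,1]$, Hoeffding's inequality gives, with probability at least $1-\delta$ over an i.i.d. sample of size $m$,
$$ R(h) \;\le\; \hat{R}(h) + \sqrt{\frac{\ln(1/\delta)}{2m}}, $$
where $R$ is the true risk and $\hat{R}$ the empirical risk: the distribution-dependent quantity $R(h)$ is controlled by its sample estimate plus a concentration term.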
Realizable learning: where we assume the hypothesis class and data distribution (e.g. the target function) are such that we can reach zero Empirical risk. This leads to PAC learning.
Data-dependent realizable learning bounds
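A standard example of a realizable PAC bound, sketched here for a finite hypothesis class $H$ under 0-1 loss (the finiteness assumption and notation are mine): with probability at least $1-\delta$, every $h \in H$ with zero empirical risk on a sample of size $m$ satisfies
$$ R(h) \;\le\; \frac{\ln|H| + \ln(1/\delta)}{m}. $$
Note the $1/m$ rate here, versus the $1/\sqrt{m}$ rate in the agnostic case below.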
Agnostic learning: where we don't assume anything about the data distribution, and just ask how well we can do relative to the function that does best (has the lowest true risk) in a given hypothesis class. It has a related formulation, where we instead ask how much the true risk differs from the empirical risk. The error in the former can be shown to be at most twice the error in the latter (see the sketch below).
Data-dependent agnostic learning
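The factor-of-two relation above can be sketched as follows (again, notation mine): if $\hat{h}$ minimizes the empirical risk over $H$ and $h^*$ minimizes the true risk, then
$$ R(\hat{h}) - R(h^*) \;\le\; 2 \sup_{h \in H} \big| R(h) - \hat{R}(h) \big|, $$
since $R(\hat{h}) - R(h^*) = [R(\hat{h}) - \hat{R}(\hat{h})] + [\hat{R}(\hat{h}) - \hat{R}(h^*)] + [\hat{R}(h^*) - R(h^*)]$ and the middle term is $\le 0$. For a finite class with loss in $[0,1]$, a union bound plus Hoeffding then gives, with probability at least $1-\delta$,
$$ \sup_{h \in H} \big| R(h) - \hat{R}(h) \big| \;\le\; \sqrt{\frac{\ln|H| + \ln(2/\delta)}{2m}}. $$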
(well technically, what's called "realizability" tends to mean that the true risk is 0, while here I am using it to mean that the empirical risk is 0)
Structural risk minimization is interesting because it depends on the data distribution and the algorithm only implicitly (somewhat like how sample-dependent bounds depend on the data distribution implicitly).
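A minimal sketch of the SRM rule, assuming a nested sequence of finite classes $H_1 \subseteq H_2 \subseteq \cdots$ and a confidence split $\delta_k = \delta / 2^k$ (these choices are mine, just for illustration):
$$ \hat{h} \;=\; \arg\min_{k,\; h \in H_k} \; \left[ \hat{R}(h) + \sqrt{\frac{\ln|H_k| + k\ln 2 + \ln(2/\delta)}{2m}} \right]. $$
The data distribution enters only implicitly, through the empirical risk $\hat{R}(h)$ that drives the choice of level $k$; the penalty itself depends only on the hierarchy $\{H_k\}$, $m$, and $\delta$.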