See Learning theory, Learning curve
Generalization bound techniques
Basically, anything that restricts or controls the capacity of the hypothesis class. That is, anything that imposes an inductive bias!
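For orientation, the generic shape such bounds take (a standard VC-type statement, constants omitted, not tied to any one technique below): with probability at least $1-\delta$ over $m$ i.i.d. samples, simultaneously for all $f$ in a class $H$ of VC dimension $d$,
$$R(f) \;\le\; \hat{R}(f) + O\!\left(\sqrt{\frac{d\log(m/d) + \log(1/\delta)}{m}}\right),$$
so any technique that shrinks the capacity term (VC dimension, a norm ball, Rademacher complexity, ...) directly tightens the bound.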
Almost-everywhere algorithmic stability and generalization error
Generalization in deep learning
Generalization ability of Boolean functions implemented in feedforward neural networks
Introduction to supervised learning theory
One-shot learning, Zero-shot learning
Entropy dependence of generalization in the matrix-map task, learned via Logistic regression. Should I try the Perceptron algorithm as well?
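A minimal sketch of that swap (my own illustration on stand-in binary data, not the actual matrix-map experiment): sklearn's LogisticRegression and Perceptron share the same fit/score interface, so rerunning the comparison with a perceptron is essentially a one-line change.

import numpy as np
from sklearn.linear_model import LogisticRegression, Perceptron
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-in data (assumption): binary inputs, labels from a thresholded linear rule.
X = rng.integers(0, 2, size=(2000, 25))
w = rng.integers(0, 2, size=25)
y = (X @ w >= w.sum() / 2).astype(int)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)

# Same loop works for both models; only the estimator object changes.
for model in (LogisticRegression(max_iter=1000), Perceptron(max_iter=1000)):
    acc = model.fit(Xtr, ytr).score(Xte, yte)
    print(type(model).__name__, "test accuracy:", round(acc, 3))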
[Btw, side thought. For the 0/1 matrices, the only explanation I can think of for what we see is that the VC dimension of the set of epsilon-bad functions is higher for maps with medium entropies (intuitively: there are more functions which predict wrongly when the true function has medium entropy than when it has either high or low entropy).]
After looking back at some learning theory, I think this pattern can be understood if, for some reason, there are more functions (in the class of matrix-map functions) which disagree significantly with the functions of row entropy 0.9 than with other functions. In technical jargon: the VC dimension of the set of epsilon-bad functions relative to functions with entropy 0.9 is larger than for other functions.
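For reference, the standard realizable-case VC bound this intuition leans on, stated directly for the set $B_\epsilon$ of epsilon-bad functions (those whose true error against the target exceeds $\epsilon$), with $d$ the VC dimension of $B_\epsilon$, $\Pi_{B_\epsilon}$ its growth function, and $m$ the sample size:
$$\Pr\big[\exists f \in B_\epsilon:\ \hat{R}_S(f)=0\big] \;\le\; 2\,\Pi_{B_\epsilon}(2m)\,2^{-\epsilon m/2} \;\le\; 2\left(\frac{2em}{d}\right)^{d} 2^{-\epsilon m/2} \quad (2m \ge d).$$
If $d$ is larger for targets with row entropy around 0.9, the guarantee is weakest exactly there, which would be consistent with the dip in observed generalization at medium entropies.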
See WWIS, emails, etc.
NIPS 2017: Variance-based Regularization with Convex Objectives – optimal bias/variance tradeoff.