Learning curve

cosmos 10th December 2019 at 6:47pm
Generalization error

Learning curves for Gaussian processes

Gaussian process learning curve

Learning curves for deep learning seem to follow power laws: https://arxiv.org/abs/1712.00409
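A quick sketch of what the power-law claim means in practice: if the test error scales as a*m^(-b) in the training set size m, then log(error) is linear in log(m), so the exponent can be read off with a straight-line fit. The numbers below are made up for illustration, not taken from the paper.

```python
import numpy as np

# Hypothetical learning-curve data (training set size, test error);
# made-up numbers for illustration only.
m = np.array([1e3, 1e4, 1e5, 1e6])
err = np.array([0.30, 0.19, 0.12, 0.075])

# If err ~ a * m^(-b), then log(err) = log(a) - b*log(m) is linear in log(m).
slope, log_a = np.polyfit(np.log(m), np.log(err), 1)
print(f"power-law exponent b = {-slope:.3f}, prefactor a = {np.exp(log_a):.3f}")
```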


The theory of learning curves is not as well known as it deserves to be, but it's very useful. For example, it provides a rigorous derivation of the optimal regularization coefficient for L2-regularized least squares. One can show the following. Assume the data is generated as y = w_0*x + eta, where x is drawn from a Gaussian with identity correlation matrix, and eta is Gaussian noise with variance sigma^2. Your task is: given training data {(x_1,y_1),...,(x_m,y_m)} generated this way, find the w that minimizes the regularized squared-error objective

sum_{i=1}^m (w*x_i - y_i)^2 + lambda*|w|^2

and you are interested in the generalization error

Expectation[(w*x - y)^2], where the new pair (x,y) is sampled from the data-generating process. More precisely, we consider the average of Expectation[(w*x - y)^2] over draws of the training set (which determine w).
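For reference (a standard fact, not spelled out in the original note): writing X for the m-by-d matrix of inputs and y for the vector of targets, the minimizer of the objective above is the usual ridge-regression closed form

$$\hat{w} = (X^\top X + \lambda I)^{-1} X^\top y.$$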

Then,

For lambda between 0 and sigma^2/|w_0|^2, the generalization error decreases as lambda increases.

For lambda greater than sigma^2/|w_0|^2, the error increases as lambda increases. However, it stays below its lambda = 0 value until a certain point lambda_c. This point is determined by how aligned, in expectation, w_0 is with the principal components of the correlation matrix of the training data: if w_0 is aligned with the principal components, then lambda_c is close to sigma^2/|w_0|^2, and vice versa. So overshooting the optimum (over-regularization) is safer when the true weight vector points along the non-principal components, where the data has little variance. The intuition is that in that case the signal-to-noise ratio is smaller, so regularization is more beneficial (it makes the algorithm less sensitive to noise, damping down the potentially large variance in w).

I find it pretty neat that one can give a well-principled, quantitative justification of regularization! And now you know how big to make the regularization coefficient (under some assumptions) :)
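As a sanity check, here is a minimal Monte Carlo sketch of the scalar (1-D) case, where the claim is cleanest; the values of m, sigma, w_0, the lambda grid, and the trial count are arbitrary illustrative choices. The predicted optimum is sigma^2/w_0^2:

```python
import numpy as np

rng = np.random.default_rng(0)
m, sigma, w0, n_trials = 20, 1.0, 0.8, 5000   # arbitrary illustrative choices
lambdas = np.linspace(0.05, 5.0, 100)
gen_err = np.zeros_like(lambdas)

for _ in range(n_trials):
    x = rng.standard_normal(m)                   # unit-variance Gaussian inputs
    y = w0 * x + sigma * rng.standard_normal(m)  # y = w0*x + eta
    S, b = np.sum(x * x), np.sum(x * y)
    # Closed-form minimizer of sum_i (w*x_i - y_i)^2 + lambda*w^2 (scalar case),
    # evaluated for the whole lambda grid at once.
    w = b / (S + lambdas)
    # Generalization error E[(w*x - y)^2] = (w - w0)^2 + sigma^2 for unit-variance x.
    gen_err += (w - w0) ** 2 + sigma**2

gen_err /= n_trials
print("predicted optimal lambda sigma^2/w0^2 =", sigma**2 / w0**2)  # 1.5625
print("empirical minimizer of avg error      =", lambdas[np.argmin(gen_err)])
```

The empirical minimizer should land near the predicted 1.5625, up to the grid spacing and Monte Carlo noise.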

See the nice discussion with Cristof in the comments here: https://www.facebook.com/guillermovalleperez/posts/10156593654876223


Learning to generalize

Rigorous Learning Curve Bounds from Statistical Mechanics

entropy bias learning curve https://www.desmos.com/calculator/mb7g8wpxou


TAB DUUUMPPP

URL list from Wednesday, May 1 2019, 23:08

Exact learning curves for Gaussian process regression on large random graphs http://papers.nips.cc/paper/3981-exact-learning-curves-for-gaussian-process-regression-on-large-random-graphs

Kernels and learning curves for Gaussian process regression on random graphs http://papers.nips.cc/paper/3840-kernels-and-learning-curves-for-gaussian-process-regression-on-random-graphs

Learning Curves for Gaussian Process Regression: Approximations and Bounds | Neural Computation | MIT Press Journals https://www.mitpressjournals.org/doi/abs/10.1162/089976602753712990

Learning Curves for Gaussian Processes https://papers.nips.cc/paper/1501-learning-curves-for-gaussian-processes.pdf

Learning Curves for Gaussian Processes Models: Fluctuations and Universality | SpringerLink https://link.springer.com/chapter/10.1007/3-540-44668-0_39

Learning Curves for Gaussian Processes via Numerical Cubature Integration | SpringerLink https://link.springer.com/chapter/10.1007/978-3-642-21735-7_25

Learning-curves-for-Gaussian-process-regression-on-random-graphs-effects-of-graph-structure.pdf https://www.researchgate.net/profile/Peter_Sollich/publication/255645199_Learning_curves_for_Gaussian_process_regression_on_random_graphs_effects_of_graph_structure/links/00b7d536d37136de6f000000/Learning-curves-for-Gaussian-process-regression-on-random-graphs-effects-of-graph-structure.pdf

Urry: Random walk kernels and learning curves for... - Google Scholar https://scholar.google.co.uk/scholar?cites=12048397498288311790&as_sdt=2005&sciodt=0,5&hl=en

Upper and Lower Bounds on the Learning Curve for Gaussian Processes | SpringerLink https://link.springer.com/article/10.1023/A:1007601601278

Replica theory for learning curves for Gaussian processes on random graphs - IOPscience https://iopscience.iop.org/article/10.1088/1751-8113/45/42/425005/meta

Feature selection and learning curves of a multilayer perceptron chromosome classifier - IEEE Conference Publication https://ieeexplore.ieee.org/document/576994

[1412.4869] Expectation propagation as a way of life: A framework for Bayesian inference on partitioned data https://arxiv.org/abs/1412.4869

[1812.11118] Reconciling modern machine learning and the bias-variance trade-off https://arxiv.org/abs/1812.11118

1409.6179.pdf https://arxiv.org/pdf/1409.6179.pdf

[1805.08522] Deep learning generalizes because the parameter-function map is biased towards simple functions https://arxiv.org/pdf/1805.08522.pdf

Continuous-Space Gaussian Process Regression and Generalized Wiener Filtering with Application to Learning Curves | SpringerLink https://link.springer.com/chapter/10.1007/978-3-642-38886-6_17

DLMF: 6.12 Asymptotic Expansions https://dlmf.nist.gov/6.12

Haussler: Rigorous learning curve bounds from statistical... - Google Scholar https://scholar.google.com/scholar?safe=off&biw=1745&bih=956&um=1&ie=UTF-8&lr&cites=2913399941842643655&authuser=0

PAC-Bayes tutorial slides by John Shawe-Taylor (MLSS 2009) http://web.cse.ohio-state.edu/mlss09/mlss09_talks/1.june-MON/JST_pacbayes-JohnShawe.pdf

[1206.1901] MCMC using Hamiltonian dynamics (Neal) https://arxiv.org/pdf/1206.1901.pdf

[1904.11955] On Exact Computation with an Infinitely Wide Neural Net https://arxiv.org/abs/1904.11955

1904.11694.pdf https://arxiv.org/pdf/1904.11694.pdf

PhD thesis of Matthias Seeger https://www.bcs.org/upload/pdf/mseeger.pdf

shell script - How would I use GNU Parallel for this while loop? - Unix & Linux Stack Exchange https://unix.stackexchange.com/questions/229034/how-would-i-use-gnu-parallel-for-this-while-loop

PAC-Bayesian Generalisation Error Bounds for Gaussian Process Classification (Seeger, JMLR) http://www.jmlr.org/papers/volume3/seeger02a/seeger02a.pdf

Malzahn: Learning curves for Gaussian processes models:... - Google Scholar https://scholar.google.co.uk/scholar?cites=152224715068156717&as_sdt=2005&sciodt=0,5&hl=en