Agnostic learning

So far in all the learning frameworks we’ve studied, we’ve assumed that there is some “ground truth” target function that we attempt to learn. Our goal has been to identify a hypothesis that is close to this target, with respect to the target distribution. Learning algorithms are given access to the target function in the form of labelled observations, which in some cases may be noisy. In this lecture, we’ll drop the assumption of a ground-truth target completely; it is for this reason that the framework is called agnostic learning. As there is no longer a well-defined notion of target, our goal will be to identify a hypothesis that is competitive with respect to the best concept from a particular class.
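For reference, a sketch of the formal definition in standard notation (taking the risk to be $L_D(h) = \Pr_{(x,y)\sim D}[h(x)\neq y]$ is my assumption; any bounded loss works): a hypothesis class $\mathcal{H}$ is agnostically PAC learnable if there exist a function $m_{\mathcal{H}}(\epsilon,\delta)$ and an algorithm $A$ such that for every $\epsilon,\delta\in(0,1)$ and every distribution $D$ over $X\times Y$, given $m\ge m_{\mathcal{H}}(\epsilon,\delta)$ i.i.d. samples $S\sim D^m$, with probability at least $1-\delta$,

$$ L_D(A(S)) \le \min_{h\in\mathcal{H}} L_D(h) + \epsilon. $$

Note there is no realizability assumption: the benchmark is the best hypothesis in the class, not a ground-truth target.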

See Wolpert's "The relationship between PAC, the Statistical Physics Framework, and Bayesian Framework, and the VC Framework" (by VC framework, they essentially refer to what is now called agnostic PAC learning). He points out something, I also realized after thinking about it: "In general, VC results say it is unlikely for empirical error and true error to differ greatly. By themsleves, without information about P(true error), they do not say that it is unlikely that true error is large in those cases where empirical error is small".

→ notes

video: weakness of the PAC definition, which motivates agnostic learning. To approach this, we redefine successful learning to have only a relative error guarantee → Definition of Agnostic PAC learnability

explaining agnostic learning; repeating the definition

Proving agnostic learnability of finite classes with ERM

video

Epsilon-representative samples
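In standard notation (my paraphrase of the definition the notes point to): a sample $S$ is $\epsilon$-representative (w.r.t. the class $\mathcal{H}$, the loss, and the distribution $D$) if

$$ \forall h\in\mathcal{H}: \quad |L_S(h) - L_D(h)| \le \epsilon, $$

where $L_S$ denotes the empirical risk on $S$.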

Defining sample complexity of uniform convergence
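That is (again a standard paraphrase): $\mathcal{H}$ has the uniform convergence property if there is a function $m^{UC}_{\mathcal{H}}(\epsilon,\delta)$ such that for every $\epsilon,\delta\in(0,1)$ and every distribution $D$, an i.i.d. sample of size $m\ge m^{UC}_{\mathcal{H}}(\epsilon,\delta)$ is $\epsilon$-representative with probability at least $1-\delta$; $m^{UC}_{\mathcal{H}}$ is called the sample complexity of uniform convergence.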

If we have uniform convergence, we can prove agnostic learnability.
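The argument is one line: if $S$ is $\epsilon/2$-representative and $h_S$ is any ERM hypothesis, then for every $h'\in\mathcal{H}$

$$ L_D(h_S) \le L_S(h_S) + \tfrac{\epsilon}{2} \le L_S(h') + \tfrac{\epsilon}{2} \le L_D(h') + \epsilon, $$

so ERM is $\epsilon$-competitive with the best hypothesis in the class, and $m_{\mathcal{H}}(\epsilon,\delta) \le m^{UC}_{\mathcal{H}}(\epsilon/2,\delta)$.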

Step 1: bounding the non-representativity of an individual hypothesis. Doing it, using Hoeffding's inequality (a Concentration inequality) – result
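For a single fixed $h$, $L_S(h)$ is an average of $m$ i.i.d. random variables in $[0,1]$ with mean $L_D(h)$, so Hoeffding's inequality gives

$$ \Pr\big[\,|L_S(h) - L_D(h)| > \epsilon\,\big] \le 2\exp(-2m\epsilon^2). $$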

Step 2: union bound. Doing it – result
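Taking the union bound over all $|\mathcal{H}|$ hypotheses:

$$ \Pr\big[\,\exists h\in\mathcal{H}: |L_S(h) - L_D(h)| > \epsilon\,\big] \le 2|\mathcal{H}|\exp(-2m\epsilon^2). $$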

Result: a bound on the sample complexity of uniform convergence (which is what we wanted), and hence on the agnostic sample complexity.
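Setting the right-hand side above to $\delta$ and solving for $m$:

$$ m^{UC}_{\mathcal{H}}(\epsilon,\delta) \le \left\lceil \frac{\log(2|\mathcal{H}|/\delta)}{2\epsilon^2} \right\rceil, \qquad m_{\mathcal{H}}(\epsilon,\delta) \le m^{UC}_{\mathcal{H}}(\epsilon/2,\delta) \le \left\lceil \frac{2\log(2|\mathcal{H}|/\delta)}{\epsilon^2} \right\rceil. $$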

See more at Computational learning theory

Tighter agnostic bounds

One can obtain bounds which are tight both when the training error is high and when it is low, by using tighter versions of the Chernoff bound on the Bernoulli probability. In some cases one can numerically compute optimal bounds, as sketched below.
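One concrete way to "numerically compute optimal bounds" is binomial tail inversion: for a single fixed hypothesis with $k$ mistakes out of $m$ samples, find the largest true error still consistent with that observation at confidence $\delta$. A minimal sketch (the function name and interface are mine, for illustration; for a finite class one would replace delta by delta/|H| via the union bound):

    # Invert the exact binomial tail: find
    #   pbar = max { p : P[Bin(m, p) <= k] >= delta },
    # the tightest (1 - delta)-confidence upper bound on the true error of
    # a single fixed hypothesis that made k mistakes on m i.i.d. samples.
    from scipy.stats import binom

    def binomial_tail_inversion(k, m, delta, tol=1e-10):
        lo, hi = k / m, 1.0  # the tail cdf is decreasing in p, so bisect on p
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if binom.cdf(k, m, mid) >= delta:
                lo = mid  # seeing <= k mistakes is still plausible at p = mid
            else:
                hi = mid  # p = mid is ruled out at confidence 1 - delta
        return hi

    # Example: 5 mistakes on 1000 samples at 95% confidence gives about 0.0105,
    # much tighter than Hoeffding's 5/1000 + sqrt(log(2/0.05)/2000) ~ 0.048.
    print(binomial_tail_inversion(5, 1000, 0.05))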

See here.

If one wants more analytic expressions, the Relative entropy Chernoff bound is fairly tight over the whole range of training errors.
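Concretely (a sketch, writing $\mathrm{kl}(q\|p) = q\log\frac{q}{p} + (1-q)\log\frac{1-q}{1-p}$ for the binary relative entropy): combining the Chernoff bound $\Pr[L_S(h)\le q] \le \exp(-m\,\mathrm{kl}(q\|L_D(h)))$ for $q\le L_D(h)$ (and its mirror image) with the union bound gives that, with probability at least $1-\delta$, simultaneously for all $h\in\mathcal{H}$,

$$ \mathrm{kl}\big(L_S(h)\,\big\|\,L_D(h)\big) \le \frac{\log(2|\mathcal{H}|/\delta)}{m}. $$

At $L_S(h)=0$ this recovers the realizable $O(\log(|\mathcal{H}|/\delta)/m)$ rate, while via Pinsker's inequality ($\mathrm{kl}(q\|p) \ge 2(q-p)^2$) it is never worse than the Hoeffding-based agnostic rate.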

More general training-error-conditional bounds

Using the Weighted union bound, as in Structural risk minimization, one can give a non-uniform bound over training errors, which allows one to get better bounds if one has a prior belief over which training errors are more likely; a sketch follows below. Hmm, interesting. So does this mean that the Relative entropy Chernoff bound encodes a prior that favors smaller training errors?
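One way to make this concrete (a sketch of the standard weighted union bound trick, not a statement from the linked proof): choose weights $w_0, w_1, \dots$ over the possible numbers of training mistakes $k \in \{0,\dots,m\}$ with $\sum_k w_k \le 1$, and apply the per-$k$ bound at confidence $\delta w_k$. Then with probability at least $1-\delta$, every $h$ making $k$ mistakes satisfies its bound with $\log\frac{1}{\delta w_k}$ in place of $\log\frac{1}{\delta}$. Choosing $w_k$ to decay in $k$ tightens the bound at small training errors, which is one sense in which such a bound encodes a prior favoring them.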

See here for a proof. See Growth function generalization error bound for the result needed to prove this.


Relevance of purely agnostic learning?

In practice we mostly care about the regime where the training error is small, that is, where we are close to PAC learning. So how relevant is the purely agnostic analysis?

It can be useful as a way to prove general bounds (that are still applicable in PAC learning), and also to tell us that if we cannot achieve small training error, our hypothesis class is probably not appropriate. One can, however, obtain better bounds that interpolate between the standard Chernoff agnostic bounds and the realizable ones. See the section above.