What happens when $C$ or $H$ is infinite?
Roughly, the VC-dimension plays the role for infinite classes that $\log|C|$ plays for finite classes. This will enter an extended version of the Occam's razor theorem.
Consistent learning algorithm for half-spaces
We will use linear programming (for finding feasible solutions) for our consistent learner, as sketched below.
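A minimal sketch of such a consistent learner, assuming labels $y_i \in \{-1,+1\}$ and strictly separable training data; the function name and the use of scipy.optimize.linprog are my own choices, not from the lecture:

```python
# Consistent learner for half-spaces via an LP feasibility problem (sketch).
import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(X, y):
    """Find (w, b) with y_i (w . x_i + b) >= 1 for all i, if feasible."""
    n, d = X.shape
    # Variables: [w_1, ..., w_d, b]; we only need feasibility, so the objective is zero.
    c = np.zeros(d + 1)
    # Constraint y_i (w . x_i + b) >= 1  <=>  -y_i (w . x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    if not res.success:
        return None  # no consistent half-space found
    return res.x[:d], res.x[d]
```

Predictions are then $\mathrm{sign}(w \cdot x + b)$; by construction the learner has zero error on the training sample whenever the LP is feasible.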
Is there a minimum number of examples needed to be seen for PAC learning?
YES
If $C$ has VC dimension $d$, then any PAC-learning algo for $C$ that outputs $h \in C$ requires at least $\Omega(d/\epsilon)$ examples.
The more accuracy we want, or the more complex the class, the more data we need.
One can't learn a class of infinite VC dimension.
Chernoff bound: let $X_1, \dots, X_m$ be independent random variables with $X_i = 1$ w.p. $p$, and $X_i = 0$ w.p. $1-p$. Let $\hat{p} = \frac{1}{m}\sum_{i=1}^m X_i$. Then for any $\gamma > 0$, $\Pr[\,|\hat{p} - p| > \gamma\,] \le 2e^{-2\gamma^2 m}$.
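A quick numerical check of this bound (the values of $p$, $\gamma$, $m$ below are arbitrary choices of mine, just for illustration):

```python
# Empirical tail probability vs. the Chernoff/Hoeffding bound (sketch).
import numpy as np

rng = np.random.default_rng(0)
p, gamma, m, trials = 0.5, 0.1, 200, 100_000

samples = rng.random((trials, m)) < p             # X_i = 1 w.p. p
deviations = np.abs(samples.mean(axis=1) - p) > gamma
print("empirical tail prob:", deviations.mean())  # ~0.005
print("Chernoff bound     :", 2 * np.exp(-2 * gamma**2 * m))  # ~0.037
```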
Let $S$ be the set which gives the VC-dimension $d$ (i.e. $S$ has maximal cardinality while being shattered by $C$).
Let the distribution $D$ be uniform over $S$.
Suppose your algo sees $d/10$ examples and outputs some $h$.
On every example in $S \setminus \{\text{seen examples}\}$, $h$ makes an error w.p. at least $1/2$, since $S$ is shattered and the target could label the unseen points either way.
If the number of examples is less than the VC dimension, there's no hope of doing well in general, because the labels of the unobserved points could be anything.
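A toy simulation of this argument (my own illustration; the choice of $d$, the number of trials, and the "guess 0 on unseen points" rule are arbitrary):

```python
# Lower-bound intuition: with d/10 uniform samples from a shattered set,
# any hypothesis errs on roughly half of the (mostly unseen) points.
import numpy as np

rng = np.random.default_rng(0)
d = 1000                 # |S| = VC dimension, S shattered by C
m = d // 10              # number of training examples the algo sees

errors = []
for _ in range(200):
    target = rng.integers(0, 2, size=d)   # any labeling of S is a concept
    seen = rng.integers(0, d, size=m)     # i.i.d. uniform draws from S
    h = np.zeros(d, dtype=int)            # memorize seen labels,
    h[seen] = target[seen]                # guess 0 everywhere else
    errors.append(np.mean(h != target))   # error under uniform D on S

print("average error:", np.mean(errors))  # ~ 0.5 * Pr[point unseen] ~ 0.45
```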
See more in the book.. (what's the meaning of the symbol here?)
See here. These are upper bounds on the minimum sample complexity needed to generalize: if our number of examples is above these bounds, we are certainly above the minimum, and we will generalize.
Theorem (Occam's razor, VC dim version): Let $C$ be a concept class of VC-dimension $d$, and $H$ a hypothesis class. Let $L$ be a consistent learner for $C$ using $H$. Then, for every target $c \in C$ and every distribution $D$ over the inputs, if $L$ is given $m$ examples drawn i.i.d. from $D$ (labeled by $c$) s.t. $m = O\!\left(\frac{1}{\epsilon}\left(d \log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right)$, then $L$ outputs $h \in H$ that, w.p. at least $1-\delta$, satisfies $\mathrm{err}_D(h) \le \epsilon$.
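A worked instance of the bound (my own, constants suppressed): half-spaces in $\mathbb{R}^n$ have VC dimension $n+1$, so the LP-based consistent learner above PAC-learns half-spaces once
\[
  m \;=\; O\!\left(\frac{1}{\epsilon}\left((n+1)\log\frac{1}{\epsilon} \;+\; \log\frac{1}{\delta}\right)\right)
\]
examples are available.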
Proof: see pic. Uses an epsilon-net argument (see more explanation there). Uses the trick of doubling your training set, to get finite objects which allow bounding the probabilities.. but I need to understand it better. See here, and here
See here
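My sketch of the doubling ("ghost sample") step, as I currently understand it; the exact constants may differ from the book/pic. For $m$ large enough (roughly $m \ge 8/\epsilon$),
\[
\Pr_{S \sim D^m}\!\big[\exists h \in H:\ \widehat{\mathrm{err}}_S(h) = 0 \ \wedge\ \mathrm{err}_D(h) > \epsilon\big]
\;\le\;
2\,\Pr_{S, S' \sim D^m}\!\big[\exists h \in H:\ \widehat{\mathrm{err}}_S(h) = 0 \ \wedge\ \widehat{\mathrm{err}}_{S'}(h) \ge \epsilon/2\big],
\]
and on the finite double sample $S \cup S'$ only $\Pi_H(2m) \le (2em/d)^d$ labelings are possible (Sauer's lemma), so a union bound over these finitely many behaviors, plus a Chernoff-type bound, finishes the argument.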
PAC learning power changes when you relax the requirement that the algo should work for any distribution on the input data.
Boosting: relaxing the accuracy requirement (weak learning, error just below 1/2) doesn't increase power, since weak learners can be boosted into strong ones; a sketch is below.
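A compact AdaBoost sketch with decision stumps as the weak learner, just to illustrate the boosting remark; the stump weak learner, the number of rounds, and all names here are my own illustrative choices, not from the lecture:

```python
# AdaBoost with decision stumps (sketch): combines weak learners into a
# strong one by reweighting misclassified examples each round.
import numpy as np

def stump_train(X, y, w):
    """Pick the (feature, threshold, sign) stump with lowest weighted error."""
    n, d = X.shape
    best = None
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(X[:, j] <= thr, s, -s)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best  # (weighted error, feature, threshold, sign)

def adaboost(X, y, T=20):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)            # example weights
    ensemble = []                      # list of (alpha, feature, threshold, sign)
    for _ in range(T):
        err, j, thr, s = stump_train(X, y, w)
        err = max(err, 1e-12)          # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] <= thr, s, -s)
        w *= np.exp(-alpha * y * pred) # upweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, s))
    return ensemble

def adaboost_predict(ensemble, X):
    score = np.zeros(X.shape[0])
    for alpha, j, thr, s in ensemble:
        score += alpha * np.where(X[:, j] <= thr, s, -s)
    return np.sign(score)
```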
Can also give better bounds for finite concept classes, I think..