What happens when $C$ or $H$ is infinite?
Roughly, the VC-dimension plays the role for infinite classes that $\log|C|$ plays for finite classes. This will enter an extended version of the Occam's razor theorem.
Consistent learning algorithm for half-spaces
We will use linear programming (for finding feasible solutions) for our consistent learner, as sketched below.
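A minimal sketch of such a consistent learner, assuming labels $y_i \in \{-1,+1\}$ and strictly separable training data; the function name and the use of scipy.optimize.linprog are my own choices, not from the lecture:

```python
# Consistent learner for half-spaces via an LP feasibility problem (sketch).
import numpy as np
from scipy.optimize import linprog

def consistent_halfspace(X, y):
    """Find (w, b) with y_i (w . x_i + b) >= 1 for all i, if feasible."""
    n, d = X.shape
    # Variables: [w_1, ..., w_d, b]; we only need feasibility, so the objective is zero.
    c = np.zeros(d + 1)
    # Constraint y_i (w . x_i + b) >= 1  <=>  -y_i (w . x_i + b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1))
    if not res.success:
        return None  # no consistent half-space found
    return res.x[:d], res.x[d]
```

Predictions are then $\mathrm{sign}(w \cdot x + b)$; by construction the learner has zero error on the training sample whenever the LP is feasible.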
Is there a minimum number of examples needed to be seen for PAC learning?
YES
If $C$ has VC dimension $d$, then any PAC-learning algo for $C$ that outputs $h \in C$ requires at least $\Omega(d/\epsilon)$ examples.
The more accuracy we want, or the more complex the class, the more data we need.
One can't learn a class of infinite VC dimension.
Chernoff bound: let $X_1, \dots, X_m$ be independent random variables with $X_i = 1$ w.p. $p$, and $X_i = 0$ w.p. $1-p$. Let $\hat{p} = \frac{1}{m}\sum_{i=1}^m X_i$. Then for any $\gamma > 0$, $\Pr[\,|\hat{p} - p| > \gamma\,] \le 2e^{-2\gamma^2 m}$.
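A quick numerical check of this bound (the values of $p$, $\gamma$, $m$ below are arbitrary choices of mine, just for illustration):

```python
# Empirical tail probability vs. the Chernoff/Hoeffding bound (sketch).
import numpy as np

rng = np.random.default_rng(0)
p, gamma, m, trials = 0.5, 0.1, 200, 100_000

samples = rng.random((trials, m)) < p             # X_i = 1 w.p. p
deviations = np.abs(samples.mean(axis=1) - p) > gamma
print("empirical tail prob:", deviations.mean())  # ~0.005
print("Chernoff bound     :", 2 * np.exp(-2 * gamma**2 * m))  # ~0.037
```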
Let $S$ be the set which gives the VC-dimension $d$ (i.e. $S$ has maximal cardinality while being shattered by $C$).
Let the distribution $D$ be uniform over $S$.
Suppose your algo sees $d/10$ examples and outputs some $h$.
On every example in $S \setminus \{\text{seen examples}\}$, $h$ makes an error w.p. at least $1/2$, since $S$ is shattered and the target could label the unseen points either way.
If the number of examples is less than the VC dimension, there's no hope of doing well in general, because the labels of the unobserved points could be anything.
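A toy simulation of this argument (my own illustration; the choice of $d$, the number of trials, and the "guess 0 on unseen points" rule are arbitrary):

```python
# Lower-bound intuition: with d/10 uniform samples from a shattered set,
# any hypothesis errs on roughly half of the (mostly unseen) points.
import numpy as np

rng = np.random.default_rng(0)
d = 1000                 # |S| = VC dimension, S shattered by C
m = d // 10              # number of training examples the algo sees

errors = []
for _ in range(200):
    target = rng.integers(0, 2, size=d)   # any labeling of S is a concept
    seen = rng.integers(0, d, size=m)     # i.i.d. uniform draws from S
    h = np.zeros(d, dtype=int)            # memorize seen labels,
    h[seen] = target[seen]                # guess 0 everywhere else
    errors.append(np.mean(h != target))   # error under uniform D on S

print("average error:", np.mean(errors))  # ~ 0.5 * Pr[point unseen] ~ 0.45
```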
See more in the book.. (what's the meaning of the symbol here?)
See here. These are upper bounds on the minimum sample complexity needed to generalize: if our number of examples is above these bounds, we are certainly above the minimum, and we will generalize.
Theorem (Occam's razor, VC dim version): Let $C$ be a concept class of VC-dimension $d$, and $H$ a hypothesis class. Let $L$ be a consistent learner for $C$ using $H$. Then, for every target $c \in C$ and every distribution $D$ over the inputs, if $L$ is given $m$ examples drawn i.i.d. from $D$ (labeled by $c$) s.t. $m = O\!\left(\frac{1}{\epsilon}\left(d \log\frac{1}{\epsilon} + \log\frac{1}{\delta}\right)\right)$, then $L$ outputs $h \in H$ that, w.p. at least $1-\delta$, satisfies $\mathrm{err}_D(h) \le \epsilon$.
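A worked instance of the bound (my own, constants suppressed): half-spaces in $\mathbb{R}^n$ have VC dimension $n+1$, so the LP-based consistent learner above PAC-learns half-spaces once
\[
  m \;=\; O\!\left(\frac{1}{\epsilon}\left((n+1)\log\frac{1}{\epsilon} \;+\; \log\frac{1}{\delta}\right)\right)
\]
examples are available.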
Proof: see pic. Uses an epsilon-net argument (see more explanation there). Uses the trick of doubling your training set, to get finite objects which allow bounding the probabilities.. but I need to understand it better. See here, and here
See here
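My sketch of the doubling ("ghost sample") step, as I currently understand it; the exact constants may differ from the book/pic. For $m$ large enough (roughly $m \ge 8/\epsilon$),
\[
\Pr_{S \sim D^m}\!\big[\exists h \in H:\ \widehat{\mathrm{err}}_S(h) = 0 \ \wedge\ \mathrm{err}_D(h) > \epsilon\big]
\;\le\;
2\,\Pr_{S, S' \sim D^m}\!\big[\exists h \in H:\ \widehat{\mathrm{err}}_S(h) = 0 \ \wedge\ \widehat{\mathrm{err}}_{S'}(h) \ge \epsilon/2\big],
\]
and on the finite double sample $S \cup S'$ only $\Pi_H(2m) \le (2em/d)^d$ labelings are possible (Sauer's lemma), so a union bound over these finitely many behaviors, plus a Chernoff-type bound, finishes the argument.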
PAC learning power changes when you relax the requirement that the algo should work for any distribution on the input data.
Boosting: relaxing the accuracy requirement (weak learning, error just below 1/2) doesn't increase power, since weak learners can be boosted into strong ones; a sketch is below.
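A compact AdaBoost sketch with decision stumps as the weak learner, just to illustrate the boosting remark; the stump weak learner, the number of rounds, and all names here are my own illustrative choices, not from the lecture:

```python
# AdaBoost with decision stumps (sketch): combines weak learners into a
# strong one by reweighting misclassified examples each round.
import numpy as np

def stump_train(X, y, w):
    """Pick the (feature, threshold, sign) stump with lowest weighted error."""
    n, d = X.shape
    best = None
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = np.where(X[:, j] <= thr, s, -s)
                err = np.sum(w[pred != y])
                if best is None or err < best[0]:
                    best = (err, j, thr, s)
    return best  # (weighted error, feature, threshold, sign)

def adaboost(X, y, T=20):
    n = X.shape[0]
    w = np.full(n, 1.0 / n)            # example weights
    ensemble = []                      # list of (alpha, feature, threshold, sign)
    for _ in range(T):
        err, j, thr, s = stump_train(X, y, w)
        err = max(err, 1e-12)          # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(X[:, j] <= thr, s, -s)
        w *= np.exp(-alpha * y * pred) # upweight misclassified points
        w /= w.sum()
        ensemble.append((alpha, j, thr, s))
    return ensemble

def adaboost_predict(ensemble, X):
    score = np.zeros(X.shape[0])
    for alpha, j, thr, s in ensemble:
        score += alpha * np.where(X[:, j] <= thr, s, -s)
    return np.sign(score)
```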
Can also give better bounds for finite concept classes, I think..