aka boosting
Get an ensemble of hypotheses by changing the input data to the algorithm in different and clever ways. We then have some clever decision rule that combines the classifiers.
Weak learning: Like PAC learning, but we now just require error at most $\frac{1}{2} - \gamma(n, \mathrm{size}(c))$, for some function $\gamma > 0$. It is efficient if it runs in poly time, the hypotheses in $H$ are poly-time evaluatable, and $1/\gamma$ is poly bounded (in its arguments).
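Spelled out (standard weak-PAC phrasing; the exact quantifiers are filled in here, not from the notes): for every target concept $c$, every distribution $D$, and every $\delta > 0$, given polynomially many examples, the weak learner outputs, with probability at least $1-\delta$, a hypothesis $h \in H$ with
$$\Pr_{x \sim D}[h(x) \neq c(x)] \;\le\; \frac{1}{2} - \gamma(n, \mathrm{size}(c)).$$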
AdaBoost
(adaptive boosting)
At every iteration $t$, we have a probability distribution $D_t$ over the training data that we use to sample (or weight) points and train the weak learner. After each iteration, we update the distribution: points which the current hypothesis got wrong become more likely, while points it got right become less likely.
For $t = 1, \dots, T$:
At every iteration, we get a hypothesis $h_t$ from the weak learner.
Our final decision rule is a linear combination that takes into account how accurate each hypothesis was: $H(x) = \mathrm{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$, with $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$. Let $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$ be the weighted error of $h_t$.
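A minimal Python sketch of the loop just described, assuming $\pm 1$ labels and decision stumps as the weak learner (both choices are mine, not from the notes):

```python
import numpy as np

def train_stump(X, y, D):
    """Weak learner (illustrative choice): single-feature threshold stump
    with lowest D-weighted error."""
    n, d = X.shape
    best, best_err = None, np.inf
    for j in range(d):
        for thresh in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] <= thresh, 1, -1)
                err = D[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thresh, sign)
    j, thresh, sign = best
    return lambda Z: sign * np.where(Z[:, j] <= thresh, 1, -1), best_err

def adaboost(X, y, T):
    n = len(y)
    D = np.ones(n) / n                        # D_1: uniform over training points
    hyps, alphas = [], []
    for t in range(T):
        h, eps = train_stump(X, y, D)         # weak hypothesis h_t, weighted error eps_t
        eps = max(eps, 1e-12)                 # avoid division by zero
        alpha = 0.5 * np.log((1 - eps) / eps) # alpha_t = 1/2 ln((1 - eps_t)/eps_t)
        pred = h(X)
        # Multiplicative-weights update: misclassified points get heavier, correct ones lighter.
        D = D * np.exp(-alpha * y * pred)
        D = D / D.sum()                       # normalize by Z_t
        hyps.append(h); alphas.append(alpha)
    # Final rule: sign of the alpha-weighted vote.
    def H(Z):
        return np.sign(sum(a * h(Z) for a, h in zip(alphas, hyps)))
    return H
```

Decision stumps are just one convenient weak learner; any learner whose error beats $1/2$ by a margin $\gamma$ on the weighted data works in its place.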
Training error: the training error of $H$ is at most $\prod_{t=1}^{T} Z_t$, where $Z_t$ is the normalization factor of the distribution update (see notes and photo). If the $Z_t$ are bounded below $1$, it doesn't take many iterations for the training error to be small.
$Z_t = 2\sqrt{\epsilon_t(1-\epsilon_t)}$ (if we minimize $Z_t$ over $\alpha_t$, which gives the $\alpha_t$ above).
Using $\epsilon_t \le \frac{1}{2} - \gamma$ (definition of weak learning), $Z_t \le \sqrt{1 - 4\gamma^2} \le e^{-2\gamma^2}$ (note $1 - x \le e^{-x}$).
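Chaining these bounds over $T$ rounds (a standard conclusion the notes leave implicit):
$$\widehat{\mathrm{err}}(H) \;\le\; \prod_{t=1}^{T} Z_t \;\le\; e^{-2\gamma^2 T},$$
so $T = O\!\left(\tfrac{1}{\gamma^2}\log\tfrac{1}{\epsilon}\right)$ rounds suffice to drive the training error below $\epsilon$.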
It keeps increasing the weight of examples which are classified incorrectly many times, and so has problems with noise in the labels.
See the paper on the website. The idea of multiplicative weights is quite widely applicable.
Generalization... VC dimension?