See Probably approximately correct
Variables x1, …, xn.
The target concept is a conjunction of literals, e.g. c = x1 ∧ x2 ∧ x̄3 ∧ … ∧ xn.
Algorithm
- Start with h = x1 ∧ x̄1 ∧ … ∧ xn ∧ x̄n (the conjunction of all 2n literals).
- For each observed input/output pair (a⃗, 1), i.e. each positive example: for all i, if ai=1 drop x̄i from h; if ai=0 drop xi.
- Return h.
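The elimination algorithm above can be sketched in Python (a minimal sketch; the set-of-literals representation and the names `learn_conjunction`/`predict` are my own):

```python
def learn_conjunction(n, examples):
    """Learn a conjunction over n Boolean variables from labeled examples.

    A literal is a pair (i, neg): variable index i, with neg=True meaning
    the negated literal x̄i. Start from all 2n literals and drop every
    literal falsified by a positive example.
    """
    h = {(i, neg) for i in range(n) for neg in (False, True)}
    for a, label in examples:
        if label == 1:  # only positive examples cause eliminations
            for i in range(n):
                if a[i] == 1:
                    h.discard((i, True))   # x̄i is 0 in a, drop it
                else:
                    h.discard((i, False))  # xi is 0 in a, drop it
    return h

def predict(h, a):
    """h(a) = 1 iff every remaining literal is satisfied by a."""
    return int(all((a[i] == 0) if neg else (a[i] == 1) for i, neg in h))
```

For example, two positive examples of c = x1 ∧ x̄3 over three variables already shrink h down to exactly {x1, x̄3}.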
The hypothesis can only make false-negative errors: a literal of c is never 0 on a positive example, so literals of c are never dropped, and h always contains every literal of c. Relative to the real function c:
c(a)=1 ⇒ h(a) may be 0 or 1.
c(a)=0 ⇒ h(a)=0.
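A brute-force check for n=3 illustrates the one-sided error: when h carries a superset of c's literals, h(a)=1 forces c(a)=1, so the only possible errors are false negatives (the concrete c and h below are my own illustration):

```python
from itertools import product

# Target c = x1 ∧ x̄3; hypothesis h = x1 ∧ x2 ∧ x̄3 carries one extra literal.
c = lambda a: int(a[0] == 1 and a[2] == 0)
h = lambda a: int(a[0] == 1 and a[1] == 1 and a[2] == 0)

# Enumerate all 8 inputs and classify the errors.
false_positives = [a for a in product((0, 1), repeat=3) if h(a) == 1 and c(a) == 0]
false_negatives = [a for a in product((0, 1), repeat=3) if h(a) == 0 and c(a) == 1]
```

Here `false_positives` is empty, while the extra literal x2 makes h wrongly reject the positive input (1,0,0).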
For a literal l (something like x1, or x̄1), define p(l) = P_{a∼D}[c(a)=1 ∧ l=0 in a], the probability that a random example would eliminate l.
A literal l ∉ c survives in h, and can cause misclassification, exactly as long as the event c(a)=1 ∧ l=0 in a has not occurred among the training examples.
We worry about non-rare events: p(l) ≥ ϵ/(2n). We call these bad literals.
Let A_l be the event that, after m points drawn independently from D, the bad literal l has still not been eliminated from h.
Probability: P(A_l) = (1−p(l))^m ≤ (1−ϵ/(2n))^m, since each of the m independent samples fails to eliminate l with probability 1−p(l).
E = ∪_{l bad} A_l is the event that some bad literal survives.
Probably: the probability that we fail to eliminate every bad literal is small:
P(E) ≤ ∑_{l bad} P(A_l) ≤ (#bad literals)·(1−ϵ/(2n))^m ≤ 2n·(1−ϵ/(2n))^m ≤ 2n·exp(−mϵ/(2n)) ≤ δ,
where the last inequality holds once m ≥ (2n/ϵ)·ln(2n/δ).
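Solving 2n·exp(−mϵ/(2n)) ≤ δ for m gives m ≥ (2n/ϵ)·ln(2n/δ). A small helper (`sample_bound` is my own name) computes the resulting sample size:

```python
import math

def sample_bound(n, eps, delta):
    """Smallest integer m with 2n * exp(-m*eps/(2n)) <= delta,
    i.e. m >= (2n/eps) * ln(2n/delta)."""
    return math.ceil((2 * n / eps) * math.log(2 * n / delta))
```

For n=10, ϵ=0.1, δ=0.05 this gives a bound of roughly 1200 samples; the bound is linear in n/ϵ and only logarithmic in n and 1/δ.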
Approximately: given that all bad literals were eliminated, the error of h is small. If c(a)=1 but h(a)=0, then some literal l ∈ h with l ∉ c has l=0 in a (every literal of c is 1 in a, since c(a)=1). By a union bound over the at most 2n surviving literals, all of which are good:
err(h) = P_{a∼D}[c(a)=1 ∧ h(a)=0] ≤ ∑_{l good} p(l) ≤ 2n·ϵ/(2n) = ϵ.
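An end-to-end simulation, under the assumption of a uniform distribution D on {0,1}^n and a concrete target c = x1 ∧ x̄2 (both my choices, as is the fixed seed), draws the bound's m samples, runs the elimination algorithm, and estimates err(h) on fresh data:

```python
import math
import random

random.seed(0)
n, eps, delta = 8, 0.2, 0.1
m = math.ceil((2 * n / eps) * math.log(2 * n / delta))  # sample bound from above

# Target c = x1 ∧ x̄2 over n variables; D is uniform on {0,1}^n.
c = lambda a: int(a[0] == 1 and a[1] == 0)

# Elimination: keep every literal never falsified by a positive example.
# A literal is (i, neg); (i, a[i] == 1) is exactly the literal a falsifies.
h = {(i, neg) for i in range(n) for neg in (False, True)}
for _ in range(m):
    a = tuple(random.randint(0, 1) for _ in range(n))
    if c(a) == 1:
        h -= {(i, a[i] == 1) for i in range(n)}

def predict(a):
    return int(all((a[i] == 0) if neg else (a[i] == 1) for i, neg in h))

# Estimate err(h) = P[c(a)=1 ∧ h(a)=0] on fresh samples;
# the PAC guarantee says err(h) ≤ ϵ with probability ≥ 1−δ.
test = [tuple(random.randint(0, 1) for _ in range(n)) for _ in range(20000)]
err = sum(c(a) == 1 and predict(a) == 0 for a in test) / len(test)
```

With these parameters the empirical error comes out well below ϵ, matching the "probably approximately correct" guarantee.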