Actually, a lot of this is statistical learning theory.
Computational learning theory is Learning theory, but taking into account the Computational complexity of the Learning algorithms. There is a big overlap with topics studied in Statistical learning theory!
See Machine learning for a more general picture.
Mostly Supervised learning (see Learning theory).
Basic setup for supervised learning. Assumptions:
Empirical risk minimization is a good approach. A learner that outputs a hypothesis with zero empirical risk is called a consistent learner.
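A minimal sketch of ERM over a finite class of threshold classifiers (the setup and all names are illustrative, not from these notes):

```python
# ERM sketch: pick, from a finite hypothesis class of thresholds
# h_t(x) = 1 iff x >= t, the hypothesis with minimal empirical risk.
import random

def empirical_risk(t, sample):
    """Fraction of labelled examples (x, y) that h_t misclassifies."""
    return sum(((x >= t) != y) for x, y in sample) / len(sample)

def erm(sample, thresholds):
    """Return the threshold with minimal empirical risk on the sample."""
    return min(thresholds, key=lambda t: empirical_risk(t, sample))

# Noiseless data from a true threshold at 0.5, which is in the class,
# so a consistent learner (zero empirical risk) exists.
random.seed(0)
sample = [(x, x >= 0.5) for x in (random.random() for _ in range(100))]
t_hat = erm(sample, [i / 20 for i in range(21)])
```

Since the target threshold 0.5 is in the grid, the ERM output is consistent here.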
Overfitting. To guard against overfitting, limit the size of the hypothesis class (Occam's razor in learning theory). It basically looks like an extension of Hypothesis testing to a class of hypotheses. Proof of Occam PAC learning theorem
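A sketch of the finite-class (Occam) bound in the realizable case, in the standard formulation:

```latex
% If |H| is finite and some h* in H has zero true risk, then
m \ge \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right)
\;\Longrightarrow\;
\Pr\bigl[\exists\, h \in H:\ L_S(h) = 0 \ \wedge\ L_D(h) > \epsilon\bigr] \le \delta
```

So with probability at least 1 − δ, every consistent hypothesis has true risk at most ε; the sample size needed grows only logarithmically in |H|.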
Two types of error:
Uniform convergence, the main tool for proving results about PAC learnability. The basic result is the Occam theorem (see Probably approximately correct)
VC dimension characterizes PAC-learnability (see "Understanding machine learning" book – Fundamental theorem of learning theory)
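A brute-force shattering check makes the definition concrete (illustrative sketch; thresholds on the line have VC dimension 1):

```python
# A class H shatters a point set X if every labelling of X is realized
# by some h in H. VC dimension = size of the largest shattered set.
def shatters(hypotheses, points):
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Threshold classifiers h_t(x) = (x >= t) on a small grid of t values.
thresholds = [lambda x, t=t: x >= t for t in [0.0, 0.25, 0.5, 0.75, 1.0, 1.5]]

shatters(thresholds, [0.3])        # True: a single point is shattered
shatters(thresholds, [0.3, 0.7])   # False: the labelling (1, 0) is unrealizable
```

Monotonicity is what blocks the (1, 0) labelling: no threshold accepts 0.3 while rejecting 0.7.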
Covering-number generalization error bounds
We say a concept class is efficiently learnable with membership and equivalence queries if there exists a polynomial p(·) and an algorithm L that, with access to the membership and equivalence query oracles for the target concept c, outputs in time polynomial in the relevant parameters a concept that is equivalent to c.
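To make the query model concrete, a toy sketch (not from these notes) of Angluin-style exact learning of a monotone conjunction; for this class equivalence queries alone suffice, and the oracle is simulated by brute force against a hypothetical hidden target:

```python
# Exact learning of a monotone conjunction with equivalence queries.
from itertools import product

def target(x, relevant={0, 2}):        # hypothetical hidden concept: x0 AND x2
    return all(x[i] for i in relevant)

def hyp(x, vars_):
    return all(x[i] for i in vars_)

def equivalence_query(vars_, n=4):
    """Simulated EQ oracle: search all inputs for a counterexample."""
    for x in product([0, 1], repeat=n):
        if hyp(x, vars_) != target(x):
            return x
    return None                        # hypothesis is exactly correct

def learn_monotone_conjunction(n=4):
    vars_ = set(range(n))              # start with the most specific conjunction
    while (x := equivalence_query(vars_, n)) is not None:
        # The hypothesis is always at least as specific as the target, so
        # counterexamples are positive: drop the variables set to 0 in x.
        vars_ -= {i for i in vars_ if not x[i]}
    return vars_
```

Each counterexample removes at least one variable, so the algorithm makes at most n equivalence queries.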
Also, learning in the presence of noise
Membership query, and Equivalence query
Example oracle + membership query is sufficient for PAC learning
Rademacher complexity: analogous to VC dimension, but for real-valued concepts.
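A hedged sketch of estimating empirical Rademacher complexity by Monte Carlo, for a finite class represented by its output vectors on the sample (all names are illustrative):

```python
# Empirical Rademacher complexity of a finite class H on a sample S:
#   R_hat(H; S) = E_sigma [ max_{h in H} (1/m) * sum_i sigma_i * h(x_i) ]
import random
from itertools import product

def empirical_rademacher(hyp_outputs, trials=1000, rng=random.Random(0)):
    """hyp_outputs: list of tuples, each the outputs of one h on the sample."""
    m = len(hyp_outputs[0])
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += max(sum(s * o for s, o in zip(sigma, outs)) / m
                     for outs in hyp_outputs)
    return total / trials

# A single constant hypothesis has complexity near 0; a class realizing
# every sign pattern on the sample has complexity exactly 1 (the sup
# always picks the pattern matching sigma).
m = 8
constant = [tuple([1] * m)]
all_patterns = [tuple(p) for p in product((-1, 1), repeat=m)]
```

Richer classes correlate better with random signs, which is exactly why high Rademacher complexity signals a risk of overfitting.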
So far in all the learning frameworks we’ve studied, we’ve made an assumption that there is some “ground truth” target function that we attempt to learn. Our goal has been to identify a hypothesis that is close to this target, with respect to the target distribution. Learning algorithms are given access to the target function in the form of labelled observations, which in some cases may be noisy. In this lecture, we’ll drop the assumption of a ground-truth target completely; it is for this reason that the framework is called agnostic learning. As there is no longer a well-defined notion of target, our goal will be to identify a hypothesis that is competitive with respect to the best concept from a particular class.
video – weakness of the PAC definition which motivates agnostic learning – to approach this we redefine successful learning to have only a relative error guarantee –> Definition of Agnostic PAC learnability
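The standard definition (as in "Understanding machine learning"), sketched:

```latex
% H is agnostic PAC learnable if there exist m_H(\epsilon,\delta) and a
% learner A such that for every distribution D over X x Y and every
% \epsilon, \delta \in (0,1), an i.i.d. sample S of size
% m \ge m_H(\epsilon,\delta) satisfies, with probability at least 1-\delta:
L_D\bigl(A(S)\bigr) \;\le\; \min_{h \in H} L_D(h) \;+\; \epsilon
```

The guarantee is purely relative: no assumption that any h ∈ H (or any function at all) labels the data correctly.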
proof that finite classes are agnostically learnable
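The usual route, sketched: Hoeffding plus a union bound gives uniform convergence, and uniform convergence makes ERM ε-optimal:

```latex
% Uniform convergence for finite H (Hoeffding + union bound):
m \;\ge\; \left\lceil \frac{2\ln(2|H|/\delta)}{\epsilon^2} \right\rceil
\;\Longrightarrow\;
\Pr\Bigl[\forall h \in H:\ |L_S(h) - L_D(h)| \le \tfrac{\epsilon}{2}\Bigr] \ge 1 - \delta,
% and on that event
L_D\bigl(\mathrm{ERM}(S)\bigr) \;\le\; \min_{h \in H} L_D(h) + \epsilon.
```

Note the 1/ε² dependence, versus 1/ε in the realizable Occam bound.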
https://www.cs.bgu.ac.il/~asml162/Class_Material
See "Understanding machine learning" by Shai and Shai for more
examples of other learning tasks
A kind of clustering for representing data with code words.
Probabilistic prediction rules can't achieve lower error than the best deterministic rule. But they can give us our uncertainty in the prediction, for each particular case we encounter. That extra information can be useful! (this is what modern Bayesian inference recognizes)
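A toy sketch of the point (the conditional probabilities are assumed, not from the notes): the hard rule and the probabilistic rule make the same predictions, but only the latter separates confident cases from near-coin-flips:

```python
# Hard vs probabilistic prediction under a known conditional P(y=1 | x).
import random

def p_of_y1(x):                 # hypothetical conditional distribution
    return 0.9 if x == "a" else 0.55

def hard_predict(x):            # thresholded (Bayes-optimal) hard rule
    return int(p_of_y1(x) >= 0.5)

rng = random.Random(0)
xs = [rng.choice("ab") for _ in range(10000)]
ys = [int(rng.random() < p_of_y1(x)) for x in xs]

error = sum(hard_predict(x) != y for x, y in zip(xs, ys)) / len(xs)
# Both inputs get the hard prediction 1, but the probabilistic rule
# reports 0.9 for "a" and only 0.55 for "b": same error, more information.
```

The irreducible error here is about 0.275 (a mix of 0.1 and 0.45), and no rule, randomized or not, can beat it; the probabilities tell you which cases contribute it.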
How to learn in such a general situation?
There are relations to Game theory
Complexity Theoretic Limitations on Learning Halfspaces
Nice classical experiment about supervision (1st lecture of a lecture series on ML)