Computational learning theory

cosmos 19th December 2018 at 8:36pm
Learning theory

Actually, a lot of this is Statistical learning theory.

Computational learning theory is Learning theory, but taking into account the Computational complexity of the Learning algorithms. There is a big overlap with topics studied in Statistical learning theory!

See Machine learning for a more general picture.

Mostly Supervised learning (see Learning theory).

Probably approximately correct

video

Basic setup for supervised learning. Assumptions:

  • Invariance assumption. The data is randomly generated, from the same distribution as the data the hypothesis will be tested on.
  • Realizability assumption. We assume the target function lies within some class of hypotheses. Relaxed in Agnostic learning.

Empirical risk minimization (ERM) is a good approach. If the empirical risk is 0, we call the learner a consistent learner.
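
A minimal sketch of ERM over a finite hypothesis class (the function names and the toy threshold class are made up for illustration, not from any particular library):

    def empirical_risk(h, samples):
        """Fraction of labelled samples (x, y) that h misclassifies."""
        return sum(h(x) != y for x, y in samples) / len(samples)

    def erm(hypothesis_class, samples):
        """Return a hypothesis with minimal empirical risk."""
        return min(hypothesis_class, key=lambda h: empirical_risk(h, samples))

    # Toy hypothesis class: threshold functions on [0, 1], h_t(x) = 1 iff x >= t.
    thresholds = [i / 10 for i in range(11)]
    hypothesis_class = [lambda x, t=t: int(x >= t) for t in thresholds]
    samples = [(0.1, 0), (0.2, 0), (0.7, 1), (0.9, 1)]

    h_hat = erm(hypothesis_class, samples)
    print(empirical_risk(h_hat, samples))  # 0.0 on this sample, so this is a consistent learner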

Overfitting. To guard against overfitting, limit the size of the hypothesis class (Occam's razor in learning theory). It basically looks like an extension of Hypothesis testing to a whole class of hypotheses. Proof of Occam PAC learning theorem

Two types of error:

  • Misfortune error: bad (unrepresentative) samples. vid (basic def of PAC learning here too — a sketch of the formal definition follows this list)
  • Rarity error: error on rare circumstances which weren't seen during learning.
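
A sketch of the standard formal definition: a hypothesis class H is PAC learnable if there exist a function m_H(\epsilon, \delta) and an algorithm A such that, for every \epsilon, \delta \in (0,1) and every distribution D satisfying realizability, running A on m \ge m_H(\epsilon, \delta) i.i.d. samples S gives

    \Pr_{S \sim D^m}\left[ L_D(A(S)) \le \epsilon \right] \ge 1 - \delta

where L_D(h) is the true error of h under D. The 1 - \delta part accounts for misfortune (bad samples), the \epsilon part for rarity (error on rare, unseen circumstances).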

Uniform convergence is the main tool for proving results about PAC learnability. The basic result is the Occam theorem (see Probably approximately correct).
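
Concretely, for a finite hypothesis class H under the realizability assumption, the Occam-style bound says any consistent learner (ERM with zero empirical risk) is PAC with sample complexity

    m_H(\epsilon, \delta) \le \left\lceil \frac{\ln|H| + \ln(1/\delta)}{\epsilon} \right\rceil

(the agnostic version of the finite-class bound instead scales with 1/\epsilon^2).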

VC dimension characterizes PAC learnability (see the "Understanding machine learning" book – Fundamental theorem of learning theory).
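
Qualitatively (constants omitted), the fundamental theorem says a class H with finite VC dimension d is learnable with sample complexity

    m_H(\epsilon, \delta) = O\!\left( \frac{d \ln(1/\epsilon) + \ln(1/\delta)}{\epsilon} \right) \text{ (realizable)}, \qquad m_H(\epsilon, \delta) = \Theta\!\left( \frac{d + \ln(1/\delta)}{\epsilon^2} \right) \text{ (agnostic)}

and a class with infinite VC dimension is not PAC learnable.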

Covering-number generalization error bounds

Exact learning

We say a concept class C is efficiently learnable with membership and equivalence queries if there exists a polynomial p(\cdot, \cdot) and an algorithm L with access to membership and equivalence query oracles such that, for all c \in C_n, L outputs in time p(n, size(c)) a concept h \in C that is equivalent to c.
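
To make the query model concrete, here is a minimal sketch (illustrative names, with the target simulated by brute force) of the classic algorithm that exactly learns monotone conjunctions from equivalence queries alone; membership queries are not needed for this particular class:

    from itertools import product

    def make_equivalence_oracle(target_vars, n):
        """Simulated equivalence oracle: given hypothesis vars, return None if
        equivalent to the hidden target conjunction, else a counterexample."""
        def conj(vars_, x):
            return all(x[i] for i in vars_)
        def oracle(hyp_vars):
            for x in product([0, 1], repeat=n):   # brute force, for illustration only
                if conj(hyp_vars, x) != conj(target_vars, x):
                    return x
            return None
        return oracle

    def learn_monotone_conjunction(n, equivalence_query):
        hyp = set(range(n))                       # start with the conjunction of all variables
        while True:
            x = equivalence_query(hyp)
            if x is None:                         # hypothesis is exactly equivalent to the target
                return hyp
            # The counterexample must be positive for the target; drop variables set to 0.
            hyp -= {i for i in hyp if x[i] == 0}

    n = 5
    oracle = make_equivalence_oracle({0, 3}, n)   # hidden target: x0 AND x3
    print(learn_monotone_conjunction(n, oracle))  # {0, 3}

Each counterexample removes at least one variable, so at most n + 1 equivalence queries are needed.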

SQ learning

Also, learning in the presence of noise

Membership query, and Equivalence query

Example oracle + membership query is sufficient for PAC learning

Learning real-valued functions

Convex optimization

notes

Rademacher Complexity is analogous to VC dimension, but for real-valued concepts.
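
The empirical Rademacher complexity of a function class F on a sample S = (z_1, \dots, z_m), with \sigma_i i.i.d. uniform on \{-1, +1\}, is

    \hat{R}_S(F) = \mathbb{E}_{\sigma}\left[ \sup_{f \in F} \frac{1}{m} \sum_{i=1}^{m} \sigma_i f(z_i) \right]

Intuitively, it measures how well the class can correlate with random noise on the sample.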

Agnostic learning

So far in all the learning frameworks we’ve studied, we’ve made an assumption that there is some “ground truth” target function that we attempt to learn. Our goal has been to identify a hypothesis that is close to this target, with respect to the target distribution. Learning algorithms are given access to the target function in the form of labelled observations, which in some cases may be noisy. In this lecture, we’ll drop the assumption of a ground-truth target completely; it is for this reason that the framework is called agnostic learning. As there is no longer a well-defined notion of target, our goal will be to identify a hypothesis that is competitive with respect to the best concept from a particular class.

video. Weakness of the PAC definition which motivates agnostic learning: to address this we redefine successful learning to have only a relative error guarantee –> Definition of Agnostic PAC learnability
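
Sketch of the agnostic PAC guarantee: for every \epsilon, \delta and every distribution D over X \times Y, with probability at least 1 - \delta over an i.i.d. sample S of size m \ge m_H(\epsilon, \delta), the learner's output satisfies

    L_D(A(S)) \le \min_{h \in H} L_D(h) + \epsilon

i.e. the error is only compared to the best hypothesis in the class, not to zero.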

notes

proof that finite classes are agnostic learnable

https://www.cs.bgu.ac.il/~asml162/Class_Material


See "Understanding machine learning" by Shai and Shai for more

General losses and more general definition of the learning problem

examples of other learning tasks

kind of clustering \sim representing data with k code words.
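
For example, this task can be written as risk minimization with the vector-quantization (k-means style) loss over code words \mu_1, \dots, \mu_k:

    L_D(\mu_1, \dots, \mu_k) = \mathbb{E}_{x \sim D}\left[ \min_{j \in \{1, \dots, k\}} \| x - \mu_j \|^2 \right]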

Probabilistic prediction rules can't predict any better, but they can give us our uncertainty in the prediction for each particular case we encounter. That extra information can be useful! (This is what modern Bayesian inference recognizes.)

How to learn in such a general situation?

There are relations to Game theory

Online learning


Complexity Theoretic Limitations on Learning Halfspaces

Grammar learning


Nice classical experiment about supervision (1st lecture of lecture series on ML)

https://simons.berkeley.edu/workshops/schedule/9161