aka predictive learning
The goal is to learn a mapping from inputs $x$ to outputs $y$, given a labeled set of input-output pairs $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N$. Here $\mathcal{D}$ is called the training set, and $N$ is the number of training examples.
Training data consists of inputs and outputs. Other names for inputs: predictors, independent variables, features, attributes, covariates. Other names for outputs: responses, response variables, dependent variables.
In supervised learning, we want to find a function relating inputs to outputs, so that we can then predict new outputs from new inputs. Often we need a way to represent the function approximation, with some parameters (the model; with some subtleties for non-parametric ones), and a learning algorithm to find the best parameters for the data, so that the model can predict well.
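For concreteness, a minimal sketch of this workflow (my own toy example, not from the linked notes): the model is $f(x) = wx + b$, the training set is a handful of $(x_i, y_i)$ pairs, and the learning algorithm is ordinary least squares.

```python
import numpy as np

# Training set D = {(x_i, y_i)}_{i=1}^N with N = 5 examples
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])  # roughly y = 2x + 1 plus noise

# Model: f(x; w, b) = w*x + b.
# Learning algorithm: least squares on the design matrix [x, 1].
A = np.stack([X, np.ones_like(X)], axis=1)
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

# Predict a new output from a new input
x_new = 5.0
print(f"f({x_new}) = {w * x_new + b:.2f}")  # about 11
```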
See Introduction to supervised learning theory for a formal and precise introduction to supervised learning
– See Learning theory, to learn about the way learning algorithms work: overfitting, underfitting, generalization, model selection, etc.
New paradigm: Deep learning
See below
Categorical, or nominal. The output $y$ belongs to some finite Set, e.g. $y \in \{1, \dots, C\}$. The learning problem is called Classification.
Continuous. $y$ is a Real number, or belongs to $\mathbb{R}^n$ more generally. Learning is then known as Regression.
Ordinal. The output belongs to some set with a natural Ordering. Learning is then known as ordinal regression.
Output value is continuous and quantitative (i.e. it has an ordering, and a notion of closeness (metric)).
Output value is discrete, or categorical, or qualitative. No implicit ordering or closeness between the values. Simple approach: Logistic regression (sketched below).
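A hedged sketch of that simple approach, assuming scikit-learn is available (the data is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Inputs (one feature) and categorical outputs in {0, 1}:
# no ordering, no notion of closeness between the two labels.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.0]]))        # predicted class label
print(clf.predict_proba([[2.0]]))  # estimated p(y|x) for each class
```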
Learning the function $p(y \mid x)$ directly. See notes, wiki.
Often this actually deals with problems where the input space has a different structure than a vector space (video).
Artificial neural network (see Deep learning)
Learning the function $p(x \mid y)$, together with $p(y)$, which can be used to find $p(y \mid x)$ using Bayes' theorem. See notes. See lecture video (def).
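To make the Bayes step concrete, a small numerical illustration (my own example, not from the linked notes or lecture): model $p(x \mid y)$ and $p(y)$, then obtain $p(y \mid x) = p(x \mid y)\, p(y) / p(x)$. The Gaussian class-conditionals and the prior values here are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

# Class prior p(y) and class-conditional densities p(x|y),
# here taken to be Gaussians with different means.
priors = {0: 0.7, 1: 0.3}
cond = {0: norm(loc=0.0, scale=1.0),
        1: norm(loc=2.0, scale=1.0)}

x = 1.2
joint = {c: cond[c].pdf(x) * priors[c] for c in priors}  # p(x|y) * p(y)
evidence = sum(joint.values())                           # p(x)
posterior = {c: joint[c] / evidence for c in priors}     # p(y|x)
print(posterior)
```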