aka predictive learning
The goal is to learn a mapping from inputs $x$ to outputs $y$, given a labeled set of input-output pairs $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N$. Here $\mathcal{D}$ is called the training set, and $N$ is the number of training examples.
Training data consists of inputs and outputs. Other names for inputs: predictors, independent variables, features, attributes, covariates. Other names for outputs: responses, response variables, dependent variables.
In supervised learning, we want to find a function relating inputs to outputs, so that we can then predict new outputs from new inputs. Often we need a way to represent the function approximation, with some parameters (the model; with some subtleties for non-parametric ones), and a learning algorithm to find the best parameters for the data, so that the model can predict well.
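For concreteness, a minimal sketch of this workflow (my own toy example, not from the linked notes): the model is $f(x) = wx + b$, the training set is a handful of $(x_i, y_i)$ pairs, and the learning algorithm is ordinary least squares.

```python
import numpy as np

# Training set D = {(x_i, y_i)}_{i=1}^N with N = 5 examples
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])  # roughly y = 2x + 1 plus noise

# Model: f(x; w, b) = w*x + b.
# Learning algorithm: least squares on the design matrix [x, 1].
A = np.stack([X, np.ones_like(X)], axis=1)
w, b = np.linalg.lstsq(A, y, rcond=None)[0]

# Predict a new output from a new input
x_new = 5.0
print(f"f({x_new}) = {w * x_new + b:.2f}")  # about 11
```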
See Introduction to supervised learning theory for a formal and precise introduction to supervised learning
– See Learning theory, to learn about the way learning algorithms work: overfitting, underfitting, generalization, model selection, etc.
New paradigm: Deep learning
See below
Categorical, or nominal. The output $y$ belongs to some finite Set, e.g. $y \in \{1, \dots, C\}$. The learning problem is called Classification.
Continuous. $y$ is a Real number, or belongs to $\mathbb{R}^n$ more generally. Learning is then known as Regression.
Ordinal. The output belongs to some set with a natural Ordering. Learning is then known as ordinal regression.
Output value is continuous and quantitative (i.e. it has an ordering, and a notion of closeness (metric)).
Output value is discrete, or categorical, or qualitative. No implicit ordering or closeness between the values. Simple approach: Logistic regression (sketched below).
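A hedged sketch of that simple approach, assuming scikit-learn is available (the data is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Inputs (one feature) and categorical outputs in {0, 1}:
# no ordering, no notion of closeness between the two labels.
X = np.array([[0.5], [1.0], [1.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.0]]))        # predicted class label
print(clf.predict_proba([[2.0]]))  # estimated p(y|x) for each class
```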
Learning the function $p(y \mid x)$ directly. See notes, wiki.
Often this actually deals with problems where the input space has a different structure than a vector space (video).
Artificial neural network (see Deep learning)
Learning the function $p(x \mid y)$, together with $p(y)$, which can be used to find $p(y \mid x)$ using Bayes' theorem. See notes. See lecture video (def).
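To make the Bayes step concrete, a small numerical illustration (my own example, not from the linked notes or lecture): model $p(x \mid y)$ and $p(y)$, then obtain $p(y \mid x) = p(x \mid y)\, p(y) / p(x)$. The Gaussian class-conditionals and the prior values here are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

# Class prior p(y) and class-conditional densities p(x|y),
# here taken to be Gaussians with different means.
priors = {0: 0.7, 1: 0.3}
cond = {0: norm(loc=0.0, scale=1.0),
        1: norm(loc=2.0, scale=1.0)}

x = 1.2
joint = {c: cond[c].pdf(x) * priors[c] for c in priors}  # p(x|y) * p(y)
evidence = sum(joint.values())                           # p(x)
posterior = {c: joint[c] / evidence for c in priors}     # p(y|x)
print(posterior)
```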