A type of Supervised learning where one learns the function $y = f(x)$. See notes
The learning itself is done by Maximum likelihood. From Bayes' theorem, the posterior over the parameters is:

$$p(\theta \mid X, Y) = \frac{p(X, Y \mid \theta)\, p(\theta)}{p(X, Y)}$$

where $\theta$ are the parameters of the theory, $Y$ are the outputs, and $X$ are the input variables. Our aim is to maximize this w.r.t. $\theta$, and as the denominator doesn't depend on $\theta$, we can ignore it. We can also assume that $p(\theta)$, our prior, is uniform. Then, we want to maximize the likelihood:

$$p(X, Y \mid \theta) = \prod_i p(y_i \mid x_i, \theta)\, p(x_i)$$
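To make the reduction explicit, the full chain is:

$$\hat{\theta} = \arg\max_\theta\, p(\theta \mid X, Y) = \arg\max_\theta\, p(X, Y \mid \theta)\, p(\theta) = \arg\max_\theta\, p(X, Y \mid \theta),$$

where the second equality drops the $\theta$-independent denominator and the third uses the uniform prior.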
In the product above, we assumed that all the data points are independent. We have also assumed that our model only models $p(y \mid x, \theta)$, so that $\theta$ doesn't appear in $p(x)$. This is the main difference with Generative supervised learning. When maximizing the log likelihood

$$\log p(X, Y \mid \theta) = \sum_i \log p(y_i \mid x_i, \theta) + \sum_i \log p(x_i),$$

only the first term depends on $\theta$; the second is fixed (and thus ignored in the optimization procedure).
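As a minimal sketch of what this looks like in practice (my own illustration; the note doesn't fix a model, so logistic regression $p(y = 1 \mid x, \theta) = \sigma(\theta^\top x)$ is an assumption here): gradient ascent on $\sum_i \log p(y_i \mid x_i, \theta)$, in which $p(x)$ never enters the objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # Discriminative objective: sum_i log p(y_i | x_i, theta).
    # The sum_i log p(x_i) term is constant in theta, so it is omitted.
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_mle(X, y, lr=0.1, steps=2000):
    # Gradient ascent on the log likelihood (logistic regression sketch).
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta += lr * X.T @ (y - p) / len(y)  # gradient of the log likelihood
    return theta

# Hypothetical toy data: labels depend on x through a linear boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)
theta_hat = fit_mle(X, y)
print(theta_hat)  # should roughly align with the direction [2, -1]
```

Note that the inputs $X$ are used but their distribution is never modelled; a generative approach would additionally have to fit $p(x)$ or $p(x \mid y)$.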
See also Generative vs discriminative models
https://www.wikiwand.com/en/Discriminative_model
These have all the probability concentrated around a single output, so they are better described as modelling $y = f(x)$ directly, the output as a function of the input
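A tiny sketch of such a model (my example, not from the note): ordinary least-squares regression returns a point prediction $\hat{y} = f(x)$ with no distribution over outputs.

```python
import numpy as np

# A deterministic discriminative model: least-squares linear regression.
# It outputs a single point prediction f(x) rather than a distribution
# over y; all the probability mass effectively sits at y = f(x).
def fit_least_squares(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, X):
    return X @ w  # a point estimate, no uncertainty attached

rng = np.random.default_rng(1)  # hypothetical toy data
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=100)
w = fit_least_squares(X, y)
print(predict(w, X[:5]))
```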
Examples