A type of Supervised learning where one learns the function $y = f(x)$. See notes
The learning itself is done by Maximum likelihood. From Bayes' theorem, the posterior over the parameters is:

$$p(\theta \mid X, Y) = \frac{p(X, Y \mid \theta)\, p(\theta)}{p(X, Y)}$$

where $\theta$ are the parameters of the theory, $Y$ are the outputs, and $X$ are the input variables. Our aim is to maximize this w.r.t. $\theta$, and as the denominator doesn't depend on $\theta$, we can ignore it. We can also assume that $p(\theta)$, our prior, is uniform. Then, we want to maximize the likelihood:

$$p(X, Y \mid \theta) = \prod_i p(y_i \mid x_i, \theta)\, p(x_i)$$
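To make the reduction explicit, the full chain is:

$$\hat{\theta} = \arg\max_\theta\, p(\theta \mid X, Y) = \arg\max_\theta\, p(X, Y \mid \theta)\, p(\theta) = \arg\max_\theta\, p(X, Y \mid \theta),$$

where the second equality drops the $\theta$-independent denominator and the third uses the uniform prior.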
In the product above, we assumed that all the data points are independent. We have also assumed that our model only models $p(y \mid x, \theta)$, so that $\theta$ doesn't appear in $p(x)$. This is the main difference with Generative supervised learning. When maximizing the log likelihood

$$\log p(X, Y \mid \theta) = \sum_i \log p(y_i \mid x_i, \theta) + \sum_i \log p(x_i),$$

only the first term depends on $\theta$; the second is fixed (and thus ignored in the optimization procedure).
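As a minimal sketch of what this looks like in practice (my own illustration; the note doesn't fix a model, so logistic regression $p(y = 1 \mid x, \theta) = \sigma(\theta^\top x)$ is an assumption here): gradient ascent on $\sum_i \log p(y_i \mid x_i, \theta)$, in which $p(x)$ never enters the objective.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    # Discriminative objective: sum_i log p(y_i | x_i, theta).
    # The sum_i log p(x_i) term is constant in theta, so it is omitted.
    p = sigmoid(X @ theta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def fit_mle(X, y, lr=0.1, steps=2000):
    # Gradient ascent on the log likelihood (logistic regression sketch).
    theta = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ theta)
        theta += lr * X.T @ (y - p) / len(y)  # gradient of the log likelihood
    return theta

# Hypothetical toy data: labels depend on x through a linear boundary.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)
theta_hat = fit_mle(X, y)
print(theta_hat)  # should roughly align with the direction [2, -1]
```

Note that the inputs $X$ are used but their distribution is never modelled; a generative approach would additionally have to fit $p(x)$ or $p(x \mid y)$.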
See also Generative vs discriminative models
https://www.wikiwand.com/en/Discriminative_model
These have all the probability concentrated around a single output, so they are better described as modelling $y = f(x)$ directly, the output as a function of the input
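A tiny sketch of such a model (my example, not from the note): ordinary least-squares regression returns a point prediction $\hat{y} = f(x)$ with no distribution over outputs.

```python
import numpy as np

# A deterministic discriminative model: least-squares linear regression.
# It outputs a single point prediction f(x) rather than a distribution
# over y; all the probability mass effectively sits at y = f(x).
def fit_least_squares(X, y):
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def predict(w, X):
    return X @ w  # a point estimate, no uncertainty attached

rng = np.random.default_rng(1)  # hypothetical toy data
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.0, -2.0]) + rng.normal(scale=0.1, size=100)
w = fit_least_squares(X, y)
print(predict(w, X[:5]))
```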
Examples