Maximum likelihood


aka maximum likelihood estimation, MLE

Minimize a cost function, which is often the negative log likelihood (related to entropy; more precisely, to cross-entropy or relative entropy), which corresponds to maximizing the likelihood. The likelihood is the probability of getting the right $y$ given $x$ and $\theta$, i.e. the probability that a given model predicts the right outputs. This is equivalent to finding the most likely $\theta$ in the Bayesian posterior under a flat prior (adding a regularizer tweaks the prior, since it just adds a term to the log likelihood). If our model uses a Gaussian distribution to predict the data (where the $\theta$s parametrize the means), maximizing the likelihood is equivalent to minimizing the energy of springs placed vertically between the fitted curve and the data points.
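
A minimal sketch of that last equivalence, assuming a toy linear model with fixed unit noise variance (the data and parametrization here are made up for illustration): the Gaussian negative log likelihood reduces, up to an additive constant, to the sum of squared residuals, i.e. the spring energy.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)  # toy data

def neg_log_likelihood(theta, x, y, sigma=1.0):
    """Negative log likelihood of y under N(theta[0]*x + theta[1], sigma^2)."""
    residuals = y - (theta[0] * x + theta[1])
    n = len(y)
    return (n / 2) * np.log(2 * np.pi * sigma**2) + np.sum(residuals**2) / (2 * sigma**2)

def spring_energy(theta, x, y):
    """Sum of squared residuals: energy of vertical springs between fit and data."""
    residuals = y - (theta[0] * x + theta[1])
    return 0.5 * np.sum(residuals**2)

# With sigma = 1, neg_log_likelihood = spring_energy + a constant,
# so both objectives have the same minimizer: the MLE is the least-squares fit.
```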

The maximum likelihood estimate is found by Optimization, often Stochastic gradient descent.
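
A sketch of this in pure NumPy, reusing the toy linear-Gaussian model above (not any particular library's optimizer): stochastic gradient descent on the negative log likelihood, using one randomly sampled data point per step.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)  # toy data

theta = np.zeros(2)  # [slope, intercept]
lr = 0.1             # learning rate

for step in range(5000):
    i = rng.integers(len(x))                  # sample one data point
    residual = y[i] - (theta[0] * x[i] + theta[1])
    # Gradient of the per-point negative log likelihood (unit noise variance):
    grad = np.array([-residual * x[i], -residual])
    theta -= lr * grad                        # SGD update

print(theta)  # approaches the MLE, here close to [2.0, 1.0]
```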

If we want the whole posterior distribution over $\theta$s rather than a point estimate, we need to use Bayesian statistics, which involves computing complicated integrals, often done numerically using Monte Carlo methods.
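
A sketch of one such Monte Carlo method: a random-walk Metropolis sampler for the posterior over a single parameter (here a hypothetical Gaussian-mean model with unit variance and a flat prior, so the log posterior is just the log likelihood up to a constant).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=1.0, size=100)  # toy data, unknown mean

def log_posterior(theta):
    """Log posterior for the mean: flat prior + Gaussian likelihood, unit variance."""
    return -0.5 * np.sum((data - theta) ** 2)

theta = 0.0
samples = []
for _ in range(10000):
    proposal = theta + 0.2 * rng.standard_normal()  # symmetric random-walk proposal
    # Accept with probability min(1, posterior ratio):
    if np.log(rng.random()) < log_posterior(proposal) - log_posterior(theta):
        theta = proposal
    samples.append(theta)

# After discarding burn-in, the samples approximate the posterior over theta;
# their histogram concentrates near the sample mean of the data.
```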

See video

To see the application of this method in Supervised learning, see Discriminative learning and Generative learning.

https://www.wikiwand.com/en/Maximum_likelihood_estimation

MLE via Variational inference

See Variational inference