Marginal likelihood

cosmos 25th June 2019 at 10:46pm
Bayesian inference

aka Bayesian evidence

https://www.wikiwand.com/en/Marginal_likelihood

It's the denominator in Bayes' theorem:

$P(b) = \sum_{a\in A} P(b|a) P(a)$, where $P(a)$ is the Prior and $P(b|a)$ is the likelihood.

When $a$ belongs to a finite set $A$, the sum is in general straightforward to compute. If $a$ belongs to a continuous space (like some manifold, or a subset of $\mathbb{R}$), then the sum becomes an integral, and it may not have a simple analytical form.
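As a minimal sketch of the finite case (all numbers made up for illustration):

```python
# Marginal likelihood by direct summation over a finite A.
# The numbers are made up for illustration.
prior = {"a1": 0.3, "a2": 0.7}       # P(a)
likelihood = {"a1": 0.9, "a2": 0.2}  # P(b|a) for one fixed observation b

p_b = sum(likelihood[a] * prior[a] for a in prior)  # P(b) = sum_a P(b|a) P(a)
print(p_b)  # 0.9*0.3 + 0.2*0.7 = 0.41
```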

A simple Monte Carlo approximation of the integral may work, but it is only feasible if $A$ is low dimensional: in high dimensions there are many distinct parts of the space $A$ where the integrand differs, and it can't be reconstructed accurately from its values at neighbouring sample points (Curse of dimensionality).

So suppose $P(b) = \int_{A} P(b|a) P(a)\, da$, where $da$ is the volume element of $A$.

Then if we sampled $N$ points $a$ uniformly from $A$ and computed $\frac{1}{N} \sum_a P(b|a) P(a)$, this would approximate $\frac{1}{V(A)} \int_{A} P(b|a) P(a)\, da$, so multiplying by the volume $V(A)$ gives an estimate of $P(b)$. But doing this suffers from the curse of dimensionality we just described.
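A minimal sketch of this uniform-sampling estimate, under a toy model made up for illustration: $a \in A = [0, 1]$ is a coin bias, the observation $b$ is "heads", so $P(b|a) = a$ with a uniform prior, and the true answer is $P(b) = \int_0^1 a\, da = 0.5$:

```python
import numpy as np

# Uniform Monte Carlo estimate of P(b) = integral of P(b|a) P(a) over A.
# Toy model: a in A = [0, 1] is a coin bias, b = "heads", so P(b|a) = a,
# the prior P(a) is uniform on [0, 1], and the true P(b) is 0.5.
rng = np.random.default_rng(0)
N = 100_000
a = rng.uniform(0.0, 1.0, size=N)   # N points sampled uniformly from A
vol_A = 1.0                         # V(A), the volume of A

likelihood = a                      # P(b|a) at each sampled point
prior = np.ones_like(a)             # P(a) at each sampled point

# (1/N) sum of P(b|a) P(a) approximates (1/V(A)) * integral,
# so multiply by V(A) to recover P(b).
p_b_hat = vol_A * np.mean(likelihood * prior)
print(p_b_hat)                      # ~0.5
```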

But we are wasting a lot of samples in regions where either $P(a)$ or $P(b|a)$ is small. A cleverer way is to sample $a \sim P(a)$, and then compute the empirical average

$\frac{1}{N} \sum_a P(b|a)$,

which will approximate the expected value of $P(b|a)$ under $P(a)$, which is exactly $P(b)$.
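A sketch of this prior-sampling estimate. To make it visibly different from uniform sampling, assume a variant toy model with prior density $2a$ on $[0,1]$ (a Beta(2, 1) distribution) and the same likelihood $P(b|a) = a$; then the true value is $P(b) = \int_0^1 a \cdot 2a\, da = 2/3$:

```python
import numpy as np

# Prior-sampling estimate of P(b): draw a ~ P(a), average the likelihood.
# Variant toy model: prior Beta(2, 1) (density 2a on [0, 1]), likelihood
# P(b|a) = a, so the true marginal likelihood is 2/3.
rng = np.random.default_rng(1)
N = 100_000
a = rng.beta(2.0, 1.0, size=N)  # samples from the prior P(a)

p_b_hat = np.mean(a)            # (1/N) sum_a P(b|a); no prior weight needed
print(p_b_hat)                  # ~0.667
```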

We are now not sampling from regions with low $P(a)$, but we are still sampling from regions where the likelihood $P(b|a)$ is low. What if we instead sampled from the posterior $P(a|b) = P(b|a)P(a)/P(b)$ (Bayes' theorem), so that we don't sample from regions with low $P(b|a)$ either?

Well, we can't compute the posterior directly, because we don't have $P(b)$; that's the very thing we wanted to compute in the first place. Chicken and egg, right?

But here's the thing: you don't need to know $P(b)$ to sample from the posterior; you can use Markov chain Monte Carlo (MCMC), because the acceptance ratio in Metropolis-type algorithms only involves the unnormalized posterior $P(b|a)P(a)$.
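A minimal random-walk Metropolis sketch for the uniform-prior toy model above (the proposal scale 0.2 and burn-in length are arbitrary choices); the posterior density here is $2a$ on $[0,1]$, and $P(b)$ never appears:

```python
import numpy as np

# Minimal random-walk Metropolis sketch for the uniform-prior toy model.
# The acceptance ratio uses only P(b|a) P(a); the normalizer P(b) cancels.
rng = np.random.default_rng(2)

def unnorm_posterior(a):
    # P(b|a) * P(a): likelihood a, uniform prior; zero outside [0, 1]
    return a if 0.0 <= a <= 1.0 else 0.0

samples = []
a = 0.5                                  # arbitrary initial state
for _ in range(50_000):
    a_new = a + rng.normal(0.0, 0.2)     # symmetric random-walk proposal
    accept_prob = min(1.0, unnorm_posterior(a_new) / unnorm_posterior(a))
    if rng.uniform() < accept_prob:
        a = a_new
    samples.append(a)

samples = np.array(samples[10_000:])     # drop burn-in
print(samples.mean())                    # posterior mean; ~2/3 here
```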

How can we express the marginal likelihood $P(b)$ using samples $a$ from the posterior? Here is one way:

$\sum_{a\in A} \frac{1}{P(b|a)} \frac{P(b|a)P(a)}{P(b)} = \frac{1}{P(b)} \sum_a P(a) = \frac{1}{P(b)}$

In other words, the expectation of $1/P(b|a)$ under the posterior is $1/P(b)$, so replacing it with an empirical average over posterior samples gives what is known as the harmonic mean estimate:

$\hat{P(b)} = \frac{1}{\frac{1}{N}\sum_a \frac{1}{P(b|a)}}$
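A sketch of the harmonic mean estimate, reusing the `samples` array from the Metropolis sketch above. (Even in this toy model the estimator has infinite variance: under the posterior density $2a$, $E[1/P(b|a)^2] = \int_0^1 2/a\, da$ diverges, so expect it to be erratic.)

```python
import numpy as np

# Harmonic mean estimate from posterior samples, for the toy model where
# P(b|a) = a. True value: P(b) = 0.5. The estimate converges, but slowly
# and erratically, because 1/P(b|a) has infinite variance here.
likelihoods = samples                       # P(b|a) = a at each sample
p_b_hat = 1.0 / np.mean(1.0 / likelihoods)
print(p_b_hat)
```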

This is famously a terrible Monte Carlo estimator. See https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/. The intuition is as follows: the terms that contribute significantly to the sum come from regions in $A$ with large prior probability $P(a)$ but small likelihood (where $1/P(b|a)$ blows up), while the samples come from regions with large posterior probability $P(a|b)$. So we may need to wait an enormous time to get samples from places with high prior but very low posterior probability, and the estimator can have huge (even infinite) variance.

What alternatives are there? See https://stats.stackexchange.com/questions/209810/computation-of-the-marginal-likelihood-from-mcmc-samples

Instead, consider $\int_{A} \frac{1}{P(b|a)P(a)} \frac{P(b|a)P(a)}{P(b)}\, da = \frac{V(A)}{P(b)}$. But actually we don't need to integrate over the whole of $A$: the identity holds for any region $A' \subseteq A$, with $V(A)$ replaced by $V(A')$. Therefore we can choose the region cleverly. We want the parts of $A$ that contribute to the sum to be well covered by the posterior samples, so let's choose $A'$ to be a region which has high posterior probability throughout. This is called a high probability density (HPD) region of the posterior!
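A sketch of this truncated estimator, again reusing `samples` from the Metropolis sketch, and making the hypothetical choice $A' = [0.5, 1]$ (where the toy posterior density $2a$ is largest):

```python
import numpy as np

# Truncated estimator for the toy model: restrict the sum to a region A'
# of high posterior density. A' = [0.5, 1.0] is a hypothetical choice.
lo, hi = 0.5, 1.0
vol_A_prime = hi - lo                            # V(A')
in_A_prime = (samples >= lo) & (samples <= hi)   # indicator of A'

# (1/N) * sum over posterior samples of indicator(A') / (P(b|a) P(a))
# approximates V(A') / P(b); here P(b|a) = a and P(a) = 1.
terms = in_A_prime / samples
p_b_hat = vol_A_prime / np.mean(terms)
print(p_b_hat)  # ~0.5 for the toy model
```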