aka Bayesian evidence
https://www.wikiwand.com/en/Marginal_likelihood
It's the denominator $p(D)$ in Bayes' theorem
$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}, \qquad p(D) = \sum_{\theta} p(D \mid \theta)\, p(\theta),$$
where $p(\theta)$ is the prior and $p(D \mid \theta)$ is the likelihood.
When $\theta$ belongs to a finite set, the sum is in general straightforward to compute. If $\theta$ belongs to a continuous space (like some manifold, or a subset of $\mathbb{R}^n$), then the sum becomes an integral, and it may not have a simple analytical form.
A simple Monte Carlo approximation of the integral may work, but it is only feasible if $\theta$ is low-dimensional, because in high dimensions there are many distinct parts of the space where the integrand takes different values and can't be reconstructed accurately from its neighbouring values (Curse of dimensionality).
So if $p(D) = \int_A p(D \mid \theta)\, p(\theta)\, d\theta$, where $d\theta$ is the volume element of the parameter space $A$,
then if we sampled $\theta_1, \dots, \theta_N$ uniformly from $A$ and computed
$$\hat{p}(D) = \frac{V}{N} \sum_{i=1}^{N} p(D \mid \theta_i)\, p(\theta_i), \qquad V = \int_A d\theta,$$
then this would approximate $p(D)$. But doing this has the curse-of-dimensionality problem we just described.
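Here's a minimal sketch of the uniform-sampling estimate on a toy 1-D conjugate model (a $N(\theta, 1)$ likelihood with a $N(0, 1)$ prior, so the exact evidence is known in closed form; the model, the interval $A$, and all variable names are illustrative assumptions, not from any particular source):

```python
# Uniform-sampling Monte Carlo estimate of the evidence p(D).
# Toy model: p(D|theta) = N(D; theta, 1), p(theta) = N(0, 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D = 1.5                                  # a single observed data point

def prior_pdf(theta):                    # p(theta) = N(0, 1)
    return norm.pdf(theta, 0.0, 1.0)

def likelihood(theta):                   # p(D | theta) = N(D; theta, 1)
    return norm.pdf(D, theta, 1.0)

a, b = -10.0, 10.0                       # take A = [a, b] (wide enough)
V = b - a                                # the "volume" of A (its length in 1-D)
N = 100_000
thetas = rng.uniform(a, b, size=N)       # theta_i ~ Uniform(A)
Z_hat = V * np.mean(likelihood(thetas) * prior_pdf(thetas))

# exact evidence for this conjugate model: p(D) = N(D; 0, 1 + 1)
print(Z_hat, norm.pdf(D, 0.0, np.sqrt(2.0)))
```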
But we are wasting a lot of samples in regions where either $p(\theta)$ or $p(D \mid \theta)$ is small. A more clever way is to sample $\theta_1, \dots, \theta_N \sim p(\theta)$, and then compute the empirical average
$$\hat{p}(D) = \frac{1}{N} \sum_{i=1}^{N} p(D \mid \theta_i),$$
which will approximate the expected value of the likelihood under the prior, which is $\mathbb{E}_{p(\theta)}\big[p(D \mid \theta)\big] = \int_A p(D \mid \theta)\, p(\theta)\, d\theta = p(D)$.
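The same toy model with the prior-sampling estimator; again just a sketch under the same illustrative assumptions:

```python
# Prior-sampling estimate: theta_i ~ p(theta), average the likelihood.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D = 1.5
thetas = rng.standard_normal(100_000)          # theta_i ~ p(theta) = N(0, 1)
Z_hat = np.mean(norm.pdf(D, thetas, 1.0))      # mean of p(D | theta_i)
print(Z_hat, norm.pdf(D, 0.0, np.sqrt(2.0)))   # estimate vs exact p(D)
```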
We are now not sampling from regions with low prior $p(\theta)$, but we are still sampling from regions with low likelihood $p(D \mid \theta)$. But what if we instead sampled from the posterior $p(\theta \mid D)$ (Bayes' theorem), so that we don't sample from regions with low $p(D \mid \theta)$ either!?
Well duh, we can't compute the posterior, because we don't have $p(D)$; that's the very thing we wanted to compute in the first place, u dumbfucc. Chicken and egg much, m8?
Guess wot m8, u don't need to know $p(D)$ to sample from the posterior: u can use Markov chain Monte Carlo (MCMC), which only needs the posterior up to a constant, i.e. the unnormalized density $p(D \mid \theta)\, p(\theta)$.
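To make that concrete, here's a minimal random-walk Metropolis sketch (same toy model as above, all names illustrative). Note that `log_unnorm_post` only ever uses $p(D \mid \theta)\, p(\theta)$; the constant $p(D)$ cancels in the accept ratio and never appears:

```python
# Random-walk Metropolis: samples from p(theta|D) using only the
# unnormalized density p(D|theta) p(theta).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D = 1.5

def log_unnorm_post(theta):
    # log p(D|theta) + log p(theta); p(D) is never needed
    return norm.logpdf(D, theta, 1.0) + norm.logpdf(theta, 0.0, 1.0)

def metropolis(n_samples, step=1.0, theta0=0.0):
    samples = np.empty(n_samples)
    theta, logp = theta0, log_unnorm_post(theta0)
    for i in range(n_samples):
        prop = theta + step * rng.standard_normal()   # symmetric proposal
        logp_prop = log_unnorm_post(prop)
        if np.log(rng.uniform()) < logp_prop - logp:  # MH accept/reject
            theta, logp = prop, logp_prop
        samples[i] = theta
    return samples

post = metropolis(50_000)[5_000:]      # drop burn-in
print(post.mean(), post.std())         # ~ D/2 and ~ sqrt(1/2) for this model
```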
How can we express the marginal likelihood using samples from the posterior? Here is one way:
$$\mathbb{E}_{p(\theta \mid D)}\!\left[\frac{1}{p(D \mid \theta)}\right] = \int_A \frac{1}{p(D \mid \theta)}\, \frac{p(D \mid \theta)\, p(\theta)}{p(D)}\, d\theta = \frac{1}{p(D)} \int_A p(\theta)\, d\theta = \frac{1}{p(D)}.$$
The resulting estimate is known as the harmonic mean estimate:
$$\hat{p}(D) = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p(D \mid \theta_i)} \right)^{-1}, \qquad \theta_i \sim p(\theta \mid D).$$
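As a sketch (same toy conjugate model; there the posterior is exactly $N(D/2, 1/2)$, so we can draw posterior samples directly instead of running MCMC):

```python
# Harmonic mean estimate of p(D) from posterior samples.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D = 1.5
post = rng.normal(D / 2.0, np.sqrt(0.5), size=100_000)  # theta_i ~ p(theta|D)
lik = norm.pdf(D, post, 1.0)                            # p(D | theta_i)
Z_hm = 1.0 / np.mean(1.0 / lik)                         # harmonic mean
print(Z_hm, norm.pdf(D, 0.0, np.sqrt(2.0)))             # vs the exact p(D)
```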
This is literally THE WORST Monte Carlo estimate ever. See this: https://radfordneal.wordpress.com/2008/08/17/the-harmonic-mean-of-the-likelihood-worst-monte-carlo-method-ever/. The intuition can be seen as follows: the terms that contribute significantly to the sum come from regions in $A$ with large prior probability $p(\theta)$ but tiny likelihood, yet the samples come from regions with large posterior probability $p(\theta \mid D)$, and so we may need to wait a humongous time to get samples from places with high prior but very low posterior probability. The result is an estimator whose variance can easily be infinite.
What alternatives are there? See https://stats.stackexchange.com/questions/209810/computation-of-the-marginal-likelihood-from-mcmc-samples
Instead consider replacing the prior in the numerator of the identity above by an arbitrary normalized density $\varphi$:
$$\mathbb{E}_{p(\theta \mid D)}\!\left[\frac{\varphi(\theta)}{p(D \mid \theta)\, p(\theta)}\right] = \frac{1}{p(D)} \int_A \varphi(\theta)\, d\theta = \frac{1}{p(D)}.$$
But actually we don't need to integrate over the whole of $A$: the identity holds for any region $B \subseteq A$, e.g. by taking $\varphi$ uniform on $B$. Therefore we can choose the region cleverly. We want the parts of $A$ that contribute to the sum to also be sampled significantly, so let's choose $B$ to be a region which is all of high posterior probability. This is called a high probability density (HPD) region of the posterior!
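Here's a sketch of that truncated estimator on the same toy model. For simplicity $B$ is taken to be a central posterior interval estimated from the samples themselves (for a Gaussian posterior this coincides with the HPD region); everything here is an illustrative assumption:

```python
# Generalized harmonic mean with phi = Uniform(B), B an HPD-like region:
#   1/p(D) = E_{p(theta|D)}[ 1_B(theta) / (|B| p(D|theta) p(theta)) ].
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
D = 1.5
post = rng.normal(D / 2.0, np.sqrt(0.5), size=100_000)  # theta_i ~ p(theta|D)

lo, hi = np.quantile(post, [0.25, 0.75])   # B = central 50% interval
vol_B = hi - lo                            # |B| is its length in 1-D
inside = (post >= lo) & (post <= hi)       # indicator 1_B(theta_i)

lik = norm.pdf(D, post, 1.0)               # p(D | theta_i)
pri = norm.pdf(post, 0.0, 1.0)             # p(theta_i)
inv_Z = np.mean(inside / (vol_B * lik * pri))        # estimates 1 / p(D)
print(1.0 / inv_Z, norm.pdf(D, 0.0, np.sqrt(2.0)))   # vs the exact p(D)
```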