Bayes' theorem as a softmax

cosmos 30th April 2017 at 3:06pm

See Neural network theory

As in Generative learning, we can relate $p(x|y)$ to $p(y|x)$ using Bayes' theorem.

Bayes' theorem can be put into the form of a Boltzmann distribution by defining the "self-information" or "surprisal" of the likelihood as the Hamiltonian, in the context of Statistical physics: $H_x(y) \equiv -\ln p(y|x)$, together with $\mu_x \equiv -\ln p(x)$ for the prior.
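
Written out (a sketch of the standard rewriting, with $x$ the latent variable and $y$ the observed data, matching the conventions used below):

$p(x|y) = \dfrac{p(y|x)\,p(x)}{\sum_{x'} p(y|x')\,p(x')} = \dfrac{e^{-[H_x(y)+\mu_x]}}{\sum_{x'} e^{-[H_{x'}(y)+\mu_{x'}]}}$

which is a Boltzmann distribution over $x$ with energy $H_x(y)+\mu_x$.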

This, in turn, can be written succinctly using the Softmax nonlinear operator $\mathbf{\sigma}$ as

$\mathbf{p}(\mathbf{y}) = \mathbf{\sigma}[-\mathbf{H}(\mathbf{y}) - \mathbf{\mu}]$

That means that if we compute the Hamiltonian (which depends on the generating process encoded in the conditional probability distribution $p(y|x)$), we can add a softmax layer to compute the Bayesian a-posteriori probability distribution. The a-priori distribution $p(x)$ enters through $\mu$, which is the bias term of the final layer.
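
As a minimal numerical sketch of this correspondence (the toy distributions and variable names below are made up for illustration), the softmax of $-H(y) - \mu$ reproduces the posterior obtained from Bayes' theorem directly:

```python
import numpy as np

# Toy generative model: 3 latent classes x, 4 possible observations y.
# Each row of p_y_given_x is p(y|x) for one class x.
p_y_given_x = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.4],
])
p_x = np.array([0.5, 0.3, 0.2])  # prior p(x)

y = 2  # index of the observed data point

# Hamiltonian ("surprisal") and bias term, as defined above:
# H_x(y) = -ln p(y|x),  mu_x = -ln p(x)
H = -np.log(p_y_given_x[:, y])
mu = -np.log(p_x)

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Posterior via the softmax form: p(x|y) = sigma[-H(y) - mu]
posterior_softmax = softmax(-H - mu)

# Posterior via Bayes' theorem directly
joint = p_y_given_x[:, y] * p_x
posterior_bayes = joint / joint.sum()

print(posterior_softmax)
print(posterior_bayes)
assert np.allclose(posterior_softmax, posterior_bayes)
```

Here $-H(y) - \mu$ plays the role of the logits of the final layer, with $\mu$ acting as its bias, so the softmax layer performs exactly the normalisation in Bayes' theorem.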