As in Generative learning, we can relate the posterior $p(y\mid x)$ to the class-conditional distribution $p(x\mid y)$ using Bayes' theorem,
$$p(y\mid x)=\frac{p(x\mid y)\,p(y)}{\sum_{y'}p(x\mid y')\,p(y')}.$$
Bayes' theorem can be put into the form of a Boltzmann distribution by defining the "self-information" or "surprisal" (the Hamiltonian, in the language of statistical physics),
$$E_y(x)=-\ln\bigl[p(x\mid y)\,p(y)\bigr],$$
so that
$$p(y\mid x)=\frac{e^{-E_y(x)}}{\sum_{y'}e^{-E_{y'}(x)}}.$$
This, in turn, can be succinctly written using the softmax nonlinear operator as
$$p(y\mid x)=\operatorname{softmax}_y\bigl(-E_y(x)\bigr)=\operatorname{softmax}_y\bigl(\ln p(x\mid y)+\ln p(y)\bigr).$$
That means that if we compute the Hamiltonian (which depends on the generating process encoded in the conditional probability distribution $p(x\mid y)$), we can add a softmax layer to compute the Bayes a-posteriori probability distribution $p(y\mid x)$. The a-priori distribution enters through $\ln p(y)$, which plays the role of the bias term of the final layer.
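To make the correspondence concrete, the following is a minimal NumPy sketch (the two-class Gaussian class-conditionals, the chosen priors, and all variable names are illustrative assumptions, not part of the text above): the negative Hamiltonian $\ln p(x\mid y)+\ln p(y)$ is computed per class and passed through a softmax, with the log-prior acting as the bias of the final layer.

```python
# Sketch: Bayes posterior as a softmax over the negative Hamiltonian.
# The Gaussian class-conditionals and all names are illustrative assumptions.
import numpy as np

def softmax(z):
    z = z - np.max(z)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Assumed generative process: two classes with Gaussian likelihoods p(x|y).
means = np.array([-1.0, 2.0])
stds = np.array([1.0, 1.5])
prior = np.array([0.7, 0.3])   # a-priori distribution p(y)

x = 0.5                        # observed data point

# ln p(x|y) for each class (Gaussian log-density)
log_likelihood = -0.5 * ((x - means) / stds) ** 2 - np.log(stds * np.sqrt(2 * np.pi))
bias = np.log(prior)                          # ln p(y): the "bias" of the final layer
neg_hamiltonian = log_likelihood + bias       # -E_y(x) = ln[p(x|y) p(y)]

posterior = softmax(neg_hamiltonian)          # p(y|x) via the softmax form of Bayes' theorem

# Cross-check against the explicit Bayes formula p(y|x) = p(x|y) p(y) / sum_y' p(x|y') p(y')
explicit = np.exp(neg_hamiltonian) / np.exp(neg_hamiltonian).sum()
assert np.allclose(posterior, explicit)
print(posterior)
```

In this sketch only the log-likelihood term depends on the observation $x$; the log-prior is a constant offset per class, which is exactly why it can be absorbed into the bias of a final softmax layer.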