As used in protein structure analysis, statistical potentials score decoys by comparing their features to those of experimentally determined structures. The underlying assumption is that the observed distribution of a feature reflects its energetics, i.e. a feature that occurs commonly is assumed to be energetically favourable.
RAPDF (residue-specific all-atom probability discriminatory function)
Essentially a Naive Bayes classifier trained on a set of native structures (structures observed experimentally and assumed to be correct).
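We wish to evaluate $P(C \mid \{d_{ab}^{ij}\})$, the probability that the structure is a member of the "correct" set $C$, given that it contains the distances $\{d_{ab}^{ij}\}$. Using Bayes' theorem, we write this probability as

$$P(C \mid \{d_{ab}^{ij}\}) = \frac{P(\{d_{ab}^{ij}\} \mid C)\, P(C)}{P(\{d_{ab}^{ij}\})}$$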
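We then make the assumption that $P(\{d_{ab}^{ij}\} \mid C)$ factorizes over atom pairs (i.e. the Naive Bayes assumption!), and likewise for $P(\{d_{ab}^{ij}\})$:

$$P(\{d_{ab}^{ij}\} \mid C) = \prod_{ij} P(d_{ab}^{ij} \mid C)$$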
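The score of each decoy (with features $\{d_{ab}^{ij}\}$) is then just the negative log-likelihood ratio, dropping the prior $P(C)$, which is the same for every decoy:

$$S(\{d_{ab}^{ij}\}) = -\sum_{ij} \ln \frac{P(d_{ab}^{ij} \mid C)}{P(d_{ab}^{ij})}$$

where $S$ is the score for the decoy, and $d_{ab}^{ij}$ is the distance between atoms $i$ and $j$, of types $a$ and $b$.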
See more explanation here
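As a minimal sketch of the scoring step, the function below sums the negative log ratio of a native-set probability to a reference probability over all atom pairs of a decoy. The function name `rapdf_score`, the dictionary layouts, the 1.0 bin width, and the choice to skip bins with no estimate are all assumptions made for illustration, not details taken from the original method.

```python
import math


def rapdf_score(distances, p_native, p_reference, bin_width=1.0):
    """Score a decoy as the sum of -ln[P(d_ab | C) / P(d)] over atom pairs.

    distances   : iterable of (type_a, type_b, distance), one per atom pair
    p_native    : dict (type_a, type_b, bin) -> P(d_ab | C), types sorted
    p_reference : dict bin -> P(d)
    All of these layouts are hypothetical choices for this sketch.
    """
    score = 0.0
    for type_a, type_b, dist in distances:
        d_bin = int(dist // bin_width)  # discretize the distance
        # Treat (a, b) and (b, a) as the same pair of types.
        key = (min(type_a, type_b), max(type_a, type_b), d_bin)
        if key not in p_native or d_bin not in p_reference:
            continue  # assumption: pairs with no estimate contribute nothing
        score -= math.log(p_native[key] / p_reference[d_bin])
    return score


# Toy tables: one atom-type pair, one distance bin.
toy_native = {("C", "N", 3): 0.4}
toy_reference = {3: 0.2}
toy_score = rapdf_score([("N", "C", 3.5)], toy_native, toy_reference)
```

A lower (more negative) score means the decoy's distances are more typical of the native set than of the reference distribution. In the toy call the ratio is 0.4 / 0.2 = 2, so the score is -ln 2.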
The probabilities are estimated (as in Naive Bayes) as sample frequencies:

$$P(d_{ab} \mid C) = \frac{N(d_{ab})}{\sum_{d} N(d_{ab})}$$

where the distances have been discretized, and $N(d_{ab})$ means the number of occurrences of distance $d$ between atoms of types $a$ and $b$ over all native configurations in $C$.
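The frequency estimate can be sketched as a counting pass over the native set. This is an illustrative sketch only: the function name `estimate_native_probabilities`, the representation of a structure as a list of `(type_a, type_b, distance)` tuples, and the 1.0 bin width are assumptions, and no distance cutoff or pseudocounts are applied.

```python
from collections import Counter, defaultdict


def estimate_native_probabilities(native_structures, bin_width=1.0):
    """Estimate P(d_ab | C) as N(d_ab) / sum_d N(d_ab).

    native_structures : list of structures, each a list of
                        (type_a, type_b, distance) tuples (hypothetical layout)
    Returns a dict (type_a, type_b, bin) -> probability, with types sorted.
    """
    counts = Counter()
    for structure in native_structures:
        for type_a, type_b, dist in structure:
            pair = (min(type_a, type_b), max(type_a, type_b))
            counts[(pair, int(dist // bin_width))] += 1

    # Total number of observations for each pair of types, summed over bins.
    totals = defaultdict(int)
    for (pair, _d_bin), n in counts.items():
        totals[pair] += n

    return {
        (pair[0], pair[1], d_bin): n / totals[pair]
        for (pair, d_bin), n in counts.items()
    }


toy_natives = [
    [("C", "N", 3.2), ("C", "N", 3.7), ("N", "C", 5.1)],
    [("C", "N", 3.9)],
]
toy_probs = estimate_native_probabilities(toy_natives)
```

In the toy data, three of the four C-N distances fall in bin 3 and one in bin 5, so the estimates are 0.75 and 0.25.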
The unconditional probability $P(d_{ab})$ is the frequency of distance $d$ between atoms of types $a$ and $b$ over all structures, not just those in $C$.
However, they approximate this further. They assume that $P(d_{ab})$ is independent of $a$ and $b$ (I guess because, over all structures, there is more randomness), and that they can estimate it as the average over all $a$ and $b$ and over the structures in $C$ (I guess because we don't have the whole set of possible structures):

$$P(d_{ab}) \approx P(d) = \frac{\sum_{ab} N(d_{ab})}{\sum_{d} \sum_{ab} N(d_{ab})}$$

with the counts $N(d_{ab})$ taken over structures in $C$.
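The pooled reference distribution can be sketched in the same way, by counting distance bins while ignoring the atom types. As before, the function name `estimate_reference_probabilities`, the tuple layout, and the bin width are assumptions made for this illustration.

```python
from collections import Counter


def estimate_reference_probabilities(native_structures, bin_width=1.0):
    """Estimate P(d) by pooling counts over all pairs of types a and b.

    native_structures : list of structures, each a list of
                        (type_a, type_b, distance) tuples (hypothetical layout)
    Returns a dict bin -> probability.
    """
    counts = Counter()
    for structure in native_structures:
        for _type_a, _type_b, dist in structure:
            counts[int(dist // bin_width)] += 1  # types are ignored here

    total = sum(counts.values())
    return {d_bin: n / total for d_bin, n in counts.items()}


toy_structures = [
    [("C", "N", 3.2), ("C", "O", 3.7), ("N", "O", 5.1)],
    [("C", "C", 5.9)],
]
toy_reference = estimate_reference_probabilities(toy_structures)
```

Because the types are pooled, the two distances in bin 3 and the two in bin 5 each get probability 0.5, whichever pair of types produced them.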