(See context at the Arrival of the frequent).
Neutral spaces can be astronomically large, much bigger than even the largest viral or bacterial populations (see this paper). In that case, the local neighborhood of the population may not be fully representative of the neighborhood of the entire space.
This scenario can most easily be understood in the monomorphic limit: when mutants are rare,
Now, the (average) rate of neutral mutations (per individual) is , as is the probability that a mutation is neutral.
See more in the Monomorphic limit (Wright-Fisher model) tiddler.
Furthermore, Kimura showed two things relating to fixation (see Population genetics):
Now, when mutations are rare enough (that the same mutation occurring twice simultaneously is very unlikely), a mutation will initially just have a frequency . This fact, combined with the above results, implies two things:
The second point means that we are in a situation where the population fixes to a particular genotype in , in the relatively fast time-scale , and stays there during the much longer time , before it fixes to a new genotype.
+++++++(...)++++++
Short-term correlations refer to the following: p-type individuals are sampled from the same set (the set of p-types in the 1-neighbourhood of the currently fixed q-type genotype, which most of the population has) throughout the time that the population is fixed to a particular genotype. When the population (relatively quickly) transfers to a new genotype, the p-types produced are sampled from a new set, but still all of them from the same set. Because they are sampled from the same set during inter-refixation times (tau_f), they have correlations that last tau_f on average ("short-term").
If fixations occur well before the set of p-types in the 1-neighbourhood is explored, these correlations are no longer observed.
As our evolutionary process is a Markov process, the first discovery time of a neighbour genotype, as well as the arrival time of the neutral mutant "destined" to be fixed, are distributed geometrically (or exponentially in a model with continuous time). Thus the mean of each of these times is (approximately, for the geometric case; exactly, for the exponential case) equal to its standard deviation, and we have large fluctuations.
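A quick numerical check of the "mean equals standard deviation" statement, as a minimal sketch. It uses the standard formulas for a geometric first-success time with per-generation success probability `p` (mean `1/p`, standard deviation `sqrt(1-p)/p`); the values of `p` below are arbitrary illustrative choices, not taken from these notes.

```python
import math

def geometric_mean_and_std(p):
    """Mean and standard deviation of the generation of first success,
    for a per-generation success probability p (geometric distribution)."""
    mean = 1.0 / p
    std = math.sqrt(1.0 - p) / p
    return mean, std

# Illustrative (hypothetical) success probabilities.
for p in (0.1, 0.01, 0.001):
    mean, std = geometric_mean_and_std(p)
    print(f"p={p}: mean={mean:.1f}, std={std:.1f}, std/mean={std / mean:.4f}")
```

The ratio `std/mean` is `sqrt(1-p)`, so it approaches 1 as `p` becomes small, which is the regime of rare events assumed here.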
The geometric distribution comes about because the Markov property implies that one can define a per-generation probability for each of the two events above ({discovery of all neighbour genotypes} and {arrival of the neutral mutant "destined" to be fixed}); each generation then corresponds to a Bernoulli trial, and first arrival times follow a geometric distribution. For example, the probability of {arrival of the neutral mutant "destined" to be fixed} is approximately (valid when , which we assume). This ultimately comes from the fact that {when the probability of an event is small, the average number of times it occurs in a set of trials is approximately the same as {the probability of it occurring at least once}}. Essentially when . See Probability theory too.
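The small-probability approximation used above can be checked numerically. This is only a sketch: `p` (probability of the event in one trial) and `n` (number of independent trials) are hypothetical names and values chosen for illustration.

```python
def prob_at_least_once(p, n):
    """Probability that an event of per-trial probability p occurs
    at least once in n independent trials."""
    return 1.0 - (1.0 - p) ** n

def expected_count(p, n):
    """Average number of times the event occurs in n independent trials."""
    return n * p

# When n*p is small the two quantities nearly coincide;
# when n*p is not small the approximation breaks down.
for p, n in ((1e-4, 100), (1e-3, 100), (0.5, 10)):
    print(p, n, prob_at_least_once(p, n), expected_count(p, n))
```

The difference between the two is of second order in `n*p`, which is why the approximation is only used when the event is rare.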
The continuous-time approximation: the mean {generation of first success, } is fixed to (where is the probability of success in each Bernoulli trial). We rescale the time variable as , and the mean is , where is the reciprocal of the time step (i.e. the time we define that a generation lasts). The geometric distribution becomes .
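To see how good the continuous-time approximation is, one can compare the geometric survival probability with its exponential counterpart. This is a sketch under the assumption that the exponential rate is simply `p` per generation (time measured in generations); the rescaling used in these notes may differ by the choice of time step.

```python
import math

def geometric_survival(p, k):
    """P(first success occurs after generation k) for per-generation probability p."""
    return (1.0 - p) ** k

def exponential_survival(p, k):
    """Continuous-time counterpart with rate p per generation: P(T > k) = exp(-p*k)."""
    return math.exp(-p * k)

# Hypothetical values: a rare event observed over roughly one mean waiting time.
p, k = 0.001, 1000
print(geometric_survival(p, k), exponential_survival(p, k))
```

The two agree closely when `p` is small, and the agreement degrades as `p` grows.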
Now, is the time scale to find all the 1-neighbour genotypes. If is the number of mutations that can take to a , then is the time-scale to get a mutant from . This is because is the probability that {a mutation from leads to }. The mean will be of this same order (and, I think, actually equal). Therefore the time to {first get {a mutation from leads to }}, , is distributed according to , where is a normalization constant. Therefore, the {probability to get {a mutation from leads to } in a time (the time between two consecutive fixations)} is . Integrating over the distribution of , we have the probability that phenotype is discovered before the next neutral fixation:
For (large population limit):
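The integration over the inter-fixation time described above can be illustrated with a small Monte Carlo sketch. It assumes, as stated earlier for the continuous-time model, that both the discovery time and the time to the next neutral fixation are exponentially distributed and independent. The names `t_discovery` and `t_fixation` (the two mean times) and their values are hypothetical, and the closed form below is the standard result for two competing exponential clocks, which may be written differently in the original derivation.

```python
import random

def prob_discovery_before_fixation(t_discovery, t_fixation, samples=200_000, seed=0):
    """Monte Carlo estimate of P(discovery time < fixation time) for two
    independent exponential waiting times with the given means."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # random.expovariate takes the rate, i.e. 1 / mean.
        discovery = rng.expovariate(1.0 / t_discovery)
        fixation = rng.expovariate(1.0 / t_fixation)
        if discovery < fixation:
            hits += 1
    return hits / samples

def closed_form(t_discovery, t_fixation):
    """Integral of (1 - exp(-t / t_discovery)) over an exponential
    distribution of t with mean t_fixation."""
    return t_fixation / (t_fixation + t_discovery)

print(prob_discovery_before_fixation(5.0, 1.0), closed_form(5.0, 1.0))
```

When the mean discovery time is much longer than the mean inter-fixation time, the closed form reduces to roughly `t_fixation / t_discovery`, i.e. the average number of discoveries per fixation interval.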
We can apply a mean-field approximation to the monomorphic limit. Let be the probability that a genotype in has the given value of . Then , if we assume for . Then , where is the average of .
For (large genome limit), . In particular, . Then .
Finally, is {the probability that phenotype is discovered before the next neutral fixation}, i.e. the probability that the {number of times {[phenotype ] appears} before the next neutral fixation} is greater than , which is approximately the same as {the average number of times [it] appears}, if {{the probability that {[it] appears in one generation}} is small}, which is the case as {in the monomorphic limit, mutants are rare, }. Then, is the average of this quantity, which we use in the mean-field approximation.
Then, following the same derivation as in Polymorphic limit (Wright-Fisher model), we have
where is the (mean) duration of each "step" (corresponding to going from being fixated to one genotype to another). Now, {the average number of mutations from a genotype in leading to phenotype } can be expressed as , as is the mean probability that {a single-point mutation from a genotype in leads to phenotype }, and is the number of single-point mutations. Now, we can find at the two limits of interest: