Monomorphic limit (Wright-Fisher model): Cosmos — All that is, or was, or ever will be

Monomorphic limit (Wright-Fisher model)

guillefix 4th November 2016 at 2:43pm

(See context at the Arrival of the frequent).

Neutral spaces can be astronomically large, much bigger than even the largest viral or bacterial populations (see this paper). In that case, the local neighborhood of the population may not be fully representative of the neighborhood of the entire space.

This scenario can most easily understood in the monomorphic limit: when mutants are rare, $NL \mu \ll 1$

Now, the (average) rate of neutral mutations (per individual) is $\nu = L \mu \rho$ , as $\rho$ is the probability that a mutation is neutral.

See more in the Monomorphic limit (Wright-Fisher model) tiddler.

Furthermore, Kimura showed two things relating to fixation (see Population genetics):

Probability of fixation. In a population following the Wright-Fisher model in a neutral space (no relative fitnessses), with no mutations, a single allele will eventually fix, and the probability for a particular allele to be the one that fixes is equal to its initial frequency. See the derivation here or here (page 15). For the generalization to non-neutral space see here (page 201). See here too.
Mean fixation time. In a population following the Wright-Fisher model in a neutral space (no relative fitnessses), with no mutations, the average time that it takes for a particular allele to fix, given that it fixes, is $\bar \tau(p)=-4N\left(\frac{1-p}{p}\right)\ln(1-p)$ , where $p$ is the initial frequency of the allele. For $p$ small ( $\rightarrow 0$ ), $\ln(1-p) \rightarrow -p$ , and $\frac{1-p}{p} \rightarrow \frac{1}{p}$ , and so $\bar \tau(p) \rightarrow 4N$ . See this SX question, the original paper by Kimura, here. For another way of deriving it, for the related Moran model see here (page 57).

Now, when mutations are rare enough (that the same mutation occurring twice simultaneously is very unlikely), a mutation will initially just have a frequency $p = 1/N$ . This fact, combined with the above results imply two things:

The rate of fixations is equal to the rate of (neutral) mutations of an individual. The average rate of mutations in the total population is $N\nu = N L \mu \rho$ . As their initial frequency is $p = 1/N$ , then they have a probability of fixation $1/N$ . Then {the rate of {mutations that fix}} is $\text{rate at which they appear} \times \text{probability they fix}$ $=$ $N L \mu \rho (1/N) = L \mu \rho$ , where $\rho$ is probability that a mutation is neutral (otherwise it can't fix as we assume non-neutral mutants have $0$ fitness).
{The mean fixation time of {a mutation that fixes}} is much smaller than {the mean time to get {a mutation that fixes}}, which we write mathematically as, $$ $\bar \tau(p) \ll \tau_m$ . If $N$ is large, $p = 1/N \ll 1$ , and so $\bar \tau(p) \approx 4N$ . Also the time scale of getting {a mutation that fixes} ({the mean time to get {a mutation that fixes}} would be of the same order, of course) is $1/\text{rate} = 1/(L \mu \rho)$ . Their ratio is $\frac{\bar \tau(p)}{\tau_m} = 4N L \mu \rho \ll 1$ , by the defining assumptions of the monomorphic limit, and noting that $\rho$ , being a probability is $<1$ . See here or here

The second point means that we are in a situation where the population fixes to a particular genotype in $\mathcal{N}_q$ , in the relatively fast time-scale $4N$ , and stays there during the much longer time $1/(L \mu \rho)$ , before it fixes to a new genotype.

+++++++(...)++++++

Large population limit
Large genome limit

Short term correlations refer to: p-type individuals are being sampled from the same set (the set of p-types in the 1-neighbourhood of the currently fixed q-type genotype which most of the population has) throughout the time that the population is fixed to a particular genotype. When the population (relativelt quickly) transfers to a new genotype, the p-types produced are now sampled from a new set, but still all of them from the same set. The fact that they are sampled from the same set in inter-refixation times (tau_f), means they have correlations that last tau_f in average ("short-term")

If fixations occur much before the set of p-types in 1-neighbourhood is explored, these correlations are no longer observed.

As our evolutionary process is a Markov process, the first discovery time of a neighbour genotype as well as the arrival time of the neutral mutant ‘‘destined’’ to be fixed, are distributed geometrically (or exponentially in a model with continuous time). Thus the mean of these times are equal to the respective standard deviation, and we have large fluctuations.

The geometrical distribution comes about because the Markov property implies that one can define a probability for each of the two events above ({discovery of all neighbour genotype}, and {arrival of the neutral mutant ‘‘destined’’ to be fixed}), and then, each generation corresponds to a Bernoulli trial, and first arrival times follow a geometric distribution. For example, the probability of {arrival of the neutral mutant ‘‘destined’’ to be fixed}, is approximately $L \mu \rho$ (valid when $L \mu \rho \ll 1$ , which we assume. This ultimately comes from the fact that {when the probability of an event is small the average number of times it occurs on a set of trials, is approximately the same as {the probability of it occurring any number of times}}. Essentially $1 - (1 - p)^N \approx = Np$ when $p \ll 1$ . See Probability theory too).

The continuous time approximation: the mean {generation of first success, $k$ } is fixed to $\bar{k}= 1/p$ (where $p$ is the prob. of success in Bernoulli trial). We rescale the time variable as $\tau = k/N$ , and the mean is $\tau_f = 1/(pN)$ , where $N$ is the reciprocal of the time step (i.e. the time we define that a generation lasts). The geometric distribution becomes $\lim_{N \rightarrow \infty} p(1-p)^{k-1} = \frac{1}{\tau_f N}(1-\frac{1}{\tau_f N})^{\tau N-1} = e^{-\tau / \tau_f}$ .

Now, $\tau_e = (K-1)L/(NL \mu)$ is the time scale to find all the 1-neighbour genotypes. If $n_p^g$ is the number of mutations that can take $g$ to a $p$ . Then, $\tau_e/n_p^g = \frac{1}{\left(\frac{n_p^g}{(K-1)L}\right)NL \mu}$ is the time-scale to get a $p$ mutant from $g$ . This is because, $\left(\frac{n_p^g}{(K-1)L}\right)$ is the probability that {a mutation from $g$ leads to $p$ }. The mean will be of this same order (and I think equal actually). Therefore the time to {first get {a mutation from $g$ leads to $p$ }}}, $\tau_1$ , is distributed according to $Q(t_1) = \mathcal{N} e^{n_p^g t_1/ \tau_e}$ , where $\mathcal{N}$ is a normalization constant. Therefore, the {probability to get {a mutation from $g$ leads to $p$ }} in the a time $\tau$ (the time between two consecutive fixations)} is $\int_{t_1=0}^{t_1=\tau} Q(t_1) dt_1 = 1 - e^{n_p^g \tau/ \tau_e}$ . Integrating over the distribution of $\tau$ , we have the probability $P(n_g^p)$ that phenotype $p$ is discovered before the next neutral fixation:

$\xi =\frac{\tau_f}{\tau_e} = \frac{N}{(K-1)L\rho} \approx \frac{N}{L}$

For $\xi \gg 1$ (large population limit):

If $n_p^g \neq 0$ , $P(n_g^p) \approx 1$
If $n_p^g = 0$ , $P(0) \approx 0$

We can apply a mean-field approximation to the monomorphic limit. Let $p(n_p^g)$ be the probability that a genotype $g$ in $\mathcal{N}_q$ has the given value of $n_p^g$ . Then $\bar{P}(n_g^p) \approx p(0) P(0) + p(1) P(1)$ , if we assume $p(n_p^g) \ll p(1)$ for $n_p^g > 1$ . Then $\bar{P}(n_g^p) \approx (p(0) \cdot 0 + p(1) \cdot 1) P(1) \approx \bar{n}_{pq} P(1)$ , where $\bar{n}_{pq}$ is the average of $n_p^g$ .

For $\xi \ll 1$ (large genome limit), $P(n_g^p) \approx n_g^p \xi$ . In particular, $P(1) \approx \xi$ . Then $\bar{P}(n_g^p) = \bar{n}_g^p \xi = \bar{n}_{pq} P(1)$ .

Finally, $P(n_g^p)$ is {the probability that phenotype $p$ is discovered before the next neutral fixation}, i.e. the probability that the {number of times {[phenotype $p$ ] appears} before the next neutral fixation} is greater than $0$ , which is approximately the same as {the average number of times [it] appears}, if {{the probability that {[it] appears in one generation}} is small}, which is the case as {in the monomorphic limit, mutants are rare, $N L \mu \ll 1$ }. Then, $\bar{P}(n_g^p)$ is the average of this quantity, which we use in the mean-field approximation.

Then, following the same derivation as in Polymorphic limit (Wright-Fisher model), we have

$T(\alpha) = \frac{\tau_f log(1-\alpha)}{\bar{P}(n_g^p)} = \frac{\tau_f log(1-\alpha)}{\bar{n}_{pq} P(1)}$

where $\tau_f$ is the (mean) duration of each "step" (corresponding to going from being fixated to one genotype to another). Now, {the average number of mutations from a genotype in $\mathcal{N}_q$ leading to phenotype $p$ } can be expressed as $\bar{n}_{pq} = (K-1)L\phi_{pq}$ , as $\phi_{pq}$ is the mean probability that {a single-point mutation from a genotype in $\mathcal{N}_q$ leads to phenotype $p$ }, and $(K-1)L$ is the number of single-point mutations. Now, we can find $T(\alpha)$ at the two limits of interest: