Monomorphic limit (Wright-Fisher model)

guillefix 4th November 2016 at 2:43pm

(See context at the Arrival of the frequent).

Neutral spaces can be astronomically large, much bigger than even the largest viral or bacterial populations (see this paper). In that case, the local neighborhood of the population may not be fully representative of the neighborhood of the entire space.

This scenario can most easily be understood in the monomorphic limit, in which mutants are rare: $NL\mu \ll 1$.

Now, the (average) rate of neutral mutations (per individual) is $\nu = L\mu\rho$, as $\rho$ is the probability that a mutation is neutral.

See more in the Monomorphic limit (Wright-Fisher model) tiddler.

Furthermore, Kimura showed two things relating to fixation (see Population genetics):

  • Probability of fixation. In a population following the Wright-Fisher model in a neutral space (no relative fitnesses), with no mutations, a single allele will eventually fix, and the probability for a particular allele to be the one that fixes is equal to its initial frequency. See the derivation here or here (page 15). For the generalization to non-neutral space see here (page 201). See here too.
  • Mean fixation time. In a population following the Wright-Fisher model in a neutral space (no relative fitnesses), with no mutations, the average time that it takes for a particular allele to fix, given that it fixes, is $\bar\tau(p) = -4N\left(\frac{1-p}{p}\right)\ln(1-p)$, where $p$ is the initial frequency of the allele. For small $p$ ($p \rightarrow 0$), $\ln(1-p) \rightarrow -p$ and $\frac{1-p}{p} \rightarrow \frac{1}{p}$, so $\bar\tau(p) \rightarrow 4N$. See this SX question, the original paper by Kimura, here. For another way of deriving it, for the related Moran model, see here (page 57).
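
Both results can be checked numerically. The following is a minimal sketch, assuming a haploid Wright-Fisher population in which each individual of the next generation picks its parent uniformly at random. The function names (`fixes`, `fixation_fraction`, `mean_fixation_time`), the population size, and the replicate count are illustrative choices, not taken from the references above. The simulation only tests the fixation probability; the fixation time is evaluated from the quoted formula rather than simulated.

```python
import math
import random


def fixes(n_individuals, n_copies, rng):
    """Run one neutral Wright-Fisher trajectory with no mutation.

    Returns True if the tracked allele reaches frequency 1, False if it is lost.
    """
    count = n_copies
    while 0 < count < n_individuals:
        freq = count / n_individuals
        # Each offspring inherits the tracked allele with probability freq.
        count = sum(rng.random() < freq for _ in range(n_individuals))
    return count == n_individuals


def fixation_fraction(n_individuals, n_copies, replicates, seed=0):
    """Estimate the fixation probability as the fraction of runs that fix."""
    rng = random.Random(seed)
    hits = sum(fixes(n_individuals, n_copies, rng) for _ in range(replicates))
    return hits / replicates


def mean_fixation_time(p, n):
    """Conditional mean fixation time, -4N ((1-p)/p) ln(1-p), as quoted above."""
    return -4 * n * ((1 - p) / p) * math.log1p(-p)


if __name__ == "__main__":
    N, copies = 20, 5  # initial frequency p = 5/20
    estimate = fixation_fraction(N, copies, replicates=2000)
    print(f"initial frequency {copies / N:.3f}, estimated fixation probability {estimate:.3f}")
    for p in (0.5, 0.1, 0.001):
        print(f"p = {p}: mean fixation time / 4N = {mean_fixation_time(p, N) / (4 * N):.4f}")
```

The estimated fixation fraction should scatter around the initial frequency, and the ratio printed in the loop should approach 1 as $p$ decreases.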

Now, when mutations are rare enough (that the same mutation occurring twice simultaneously is very unlikely), a mutation will initially just have a frequency $p = 1/N$. This fact, combined with the above results, implies two things:

  • The rate of fixations is equal to the rate of (neutral) mutations of an individual. The average rate of neutral mutations in the total population is $N\nu = NL\mu\rho$. As their initial frequency is $p = 1/N$, they have a probability of fixation $1/N$. Then {the rate of {mutations that fix}} is $\text{rate at which they appear} \times \text{probability they fix} = NL\mu\rho\,(1/N) = L\mu\rho$, where $\rho$ is the probability that a mutation is neutral (otherwise it can't fix, as we assume non-neutral mutants have $0$ fitness).
  • {The mean fixation time of {a mutation that fixes}} is much smaller than {the mean time to get {a mutation that fixes}}, which we write mathematically as $\bar\tau(p) \ll \tau_m$. If $N$ is large, $p = 1/N \ll 1$, and so $\bar\tau(p) \approx 4N$. Also, the time scale of getting {a mutation that fixes} ({the mean time to get {a mutation that fixes}} would be of the same order, of course) is $1/\text{rate} = 1/(L\mu\rho)$. Their ratio is $\frac{\bar\tau(p)}{\tau_m} = 4NL\mu\rho \ll 1$, by the defining assumption of the monomorphic limit ($NL\mu \ll 1$), and noting that $\rho$, being a probability, is $< 1$. See here or here.
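
The two points above amount to a few lines of arithmetic. Here is a minimal sketch; the function name `monomorphic_time_scales` and the parameter values in the example are hypothetical, chosen only so that $NL\mu \ll 1$ holds.

```python
def monomorphic_time_scales(n, length, mu, rho):
    """Return (fixation_rate, tau_fix, tau_m, ratio) in the monomorphic limit.

    n: population size N, length: genome length L,
    mu: per-site mutation rate, rho: probability that a mutation is neutral.
    """
    neutral_rate = length * mu * rho            # nu = L mu rho, per individual
    appearance_rate = n * neutral_rate          # neutral mutants appearing in the population
    fixation_rate = appearance_rate * (1 / n)   # each new mutant fixes with probability 1/N
    tau_fix = 4 * n                             # small-p limit of the mean fixation time
    tau_m = 1 / fixation_rate                   # waiting time for a mutation that fixes
    return fixation_rate, tau_fix, tau_m, tau_fix / tau_m


if __name__ == "__main__":
    # Illustrative values only: N L mu = 0.02, well inside the monomorphic regime.
    rate, tau_fix, tau_m, ratio = monomorphic_time_scales(n=1000, length=20, mu=1e-6, rho=0.5)
    print(f"fixation rate = {rate:.3g} per generation")
    print(f"tau_fix = {tau_fix}, tau_m = {tau_m:.3g}, ratio = {ratio:.3g}")
```

The returned ratio equals $4NL\mu\rho$, so it is small whenever $NL\mu \ll 1$.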

The second point means that we are in a situation where the population fixes to a particular genotype in $\mathcal{N}_q$ on the relatively fast time scale $4N$, and stays there during the much longer time $1/(L\mu\rho)$ before it fixes to a new genotype.


+++++++(...)++++++

  • Large population limit
  • Large genome limit

Short-term correlations refer to the following: p-type individuals are being sampled from the same set (the set of p-types in the 1-neighbourhood of the currently fixed q-type genotype, which most of the population has) throughout the time that the population is fixed to a particular genotype. When the population (relatively quickly) transfers to a new genotype, the p-types produced are sampled from a new set, but still all of them from the same set. The fact that they are sampled from the same set during inter-refixation times ($\tau_f$) means they have correlations that last $\tau_f$ on average ("short-term").

If fixations occur long before the set of p-types in the 1-neighbourhood is explored, these correlations are no longer observed.

As our evolutionary process is a Markov process, the first discovery time of a neighbour genotype, as well as the arrival time of the neutral mutant "destined" to be fixed, are distributed geometrically (or exponentially in a model with continuous time). Thus the mean of each of these times is equal to its standard deviation (exactly in the exponential case, approximately in the geometric case when the per-generation probability is small), and we have large fluctuations.

The geometric distribution comes about because the Markov property implies that one can define a per-generation probability for each of the two events above ({discovery of a neighbour genotype} and {arrival of the neutral mutant "destined" to be fixed}); each generation then corresponds to a Bernoulli trial, and first arrival times follow a geometric distribution. For example, the probability of {arrival of the neutral mutant "destined" to be fixed} is approximately $L\mu\rho$ (valid when $L\mu\rho \ll 1$, which we assume). This ultimately comes from the fact that {when the probability of an event is small, the average number of times it occurs in a set of trials is approximately the same as {the probability of it occurring at least once}}. Essentially, $1 - (1-p)^N \approx Np$ when $Np \ll 1$. See Probability theory too.
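
Both statements about the geometric distribution are easy to check numerically. A minimal sketch, using the standard mean $1/p$ and standard deviation $\sqrt{1-p}/p$ of a geometric waiting time; the function names and the numbers in the example are illustrative.

```python
import math


def geometric_mean_and_sd(p):
    """Mean and standard deviation of the first-success generation for success probability p."""
    mean = 1 / p
    sd = math.sqrt(1 - p) / p
    return mean, sd


def prob_at_least_once(p, trials):
    """Probability that an event of per-trial probability p occurs at least once in `trials` trials."""
    return 1 - (1 - p) ** trials


if __name__ == "__main__":
    for p in (0.5, 0.01, 1e-4):
        mean, sd = geometric_mean_and_sd(p)
        print(f"p = {p}: mean = {mean:.1f}, sd = {sd:.1f}, sd / mean = {sd / mean:.5f}")
    p, trials = 1e-6, 100
    print(f"exact = {prob_at_least_once(p, trials):.6e}, approximation N p = {trials * p:.6e}")
```

For small $p$ the ratio of standard deviation to mean approaches 1, which is the sense in which the fluctuations are as large as the mean itself.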

The continuous-time approximation: the mean {generation of first success, $k$} is $\bar{k} = 1/p$ (where $p$ is the probability of success in a Bernoulli trial). We rescale the time variable as $\tau = k/N$, so that the mean is $\tau_f = 1/(pN)$, where $N$ is the reciprocal of the time step (i.e. of the time we define a generation to last; here $N$ is not the population size). Holding $\tau_f$ fixed and dividing the geometric probability by the step width $1/N$ to obtain a probability density in $\tau$, the geometric distribution becomes $\lim_{N \rightarrow \infty} N p(1-p)^{k-1} = \lim_{N \rightarrow \infty} \frac{1}{\tau_f}\left(1 - \frac{1}{\tau_f N}\right)^{\tau N - 1} = \frac{1}{\tau_f} e^{-\tau/\tau_f}$.
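
The limit can be checked numerically by comparing the rescaled geometric probability with the exponential density. A minimal sketch, assuming the geometric probability is divided by the step width $1/N$ so that it becomes a density in $\tau$; the function names and the values of $\tau$ and $\tau_f$ are illustrative.

```python
import math


def rescaled_geometric_density(tau, tau_f, n_steps):
    """Geometric pmf at k = tau * n_steps, divided by the step width 1 / n_steps."""
    p = 1 / (tau_f * n_steps)      # per-step success probability giving mean time tau_f
    k = round(tau * n_steps)       # number of steps corresponding to rescaled time tau
    return n_steps * p * (1 - p) ** (k - 1)


def exponential_density(tau, tau_f):
    """Exponential density with mean tau_f."""
    return math.exp(-tau / tau_f) / tau_f


if __name__ == "__main__":
    tau, tau_f = 3.0, 2.0
    target = exponential_density(tau, tau_f)
    for n_steps in (10, 1000, 10**6):
        approx = rescaled_geometric_density(tau, tau_f, n_steps)
        print(f"n_steps = {n_steps}: relative error = {abs(approx - target) / target:.2e}")
```

The relative error shrinks as the number of steps per unit time grows.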

Now, $\tau_e = (K-1)L/(NL\mu)$ is the time scale to find all the 1-neighbour genotypes. Let $n_p^g$ be the number of mutations that take $g$ to a genotype with phenotype $p$. Then $\tau_e/n_p^g = \frac{1}{\left(\frac{n_p^g}{(K-1)L}\right) NL\mu}$ is the time scale to get a $p$ mutant from $g$. This is because $\frac{n_p^g}{(K-1)L}$ is the probability that {a mutation from $g$ leads to $p$}. The mean will be of this same order (and I think equal, actually). Therefore the time to {first get {a mutation from $g$ that leads to $p$}}, $t_1$, is distributed according to $Q(t_1) = \mathcal{N} e^{-n_p^g t_1/\tau_e}$, where $\mathcal{N}$ is a normalization constant. Therefore, {the probability to get {a mutation from $g$ that leads to $p$} in a time $\tau$ (the time between two consecutive fixations)} is $\int_{t_1=0}^{t_1=\tau} Q(t_1)\,dt_1 = 1 - e^{-n_p^g \tau/\tau_e}$. Integrating over the distribution of $\tau$, we have the probability $P(n_p^g)$ that phenotype $p$ is discovered before the next neutral fixation, expressed in terms of the ratio of the two time scales:

$$\xi = \frac{\tau_f}{\tau_e} = \frac{N}{(K-1)L\rho} \approx \frac{N}{L}$$
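
The integration over $\tau$ can be carried out explicitly. The following is a sketch, assuming that the time $\tau$ between consecutive fixations is exponentially distributed with mean $\tau_f$ (the continuous-time approximation) and writing $n$ for $n_p^g$. The closed form is derived here from those assumptions rather than quoted from a reference.

```latex
\begin{aligned}
P(n) &= \int_0^\infty \frac{1}{\tau_f}\, e^{-\tau/\tau_f} \left(1 - e^{-n\tau/\tau_e}\right) d\tau \\
     &= 1 - \frac{1/\tau_f}{1/\tau_f + n/\tau_e} \\
     &= \frac{n\,\xi}{1 + n\,\xi}, \qquad \xi = \frac{\tau_f}{\tau_e}.
\end{aligned}
```

This form gives $P(0) = 0$, tends to $1$ for $n\xi \gg 1$, and reduces to $n\xi$ for $n\xi \ll 1$.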

For $\xi \gg 1$ (large population limit):

  • If $n_p^g \neq 0$, $P(n_p^g) \approx 1$
  • If $n_p^g = 0$, $P(0) \approx 0$

We can apply a mean-field approximation to the monomorphic limit. Let $p(n_p^g)$ be the probability that a genotype $g$ in $\mathcal{N}_q$ has the given value of $n_p^g$. Then $\bar{P}(n_p^g) \approx p(0)P(0) + p(1)P(1)$, if we assume $p(n_p^g) \ll p(1)$ for $n_p^g > 1$. Since $P(0) \approx 0$, and under the same assumption $\bar{n}_{pq} \approx p(0) \cdot 0 + p(1) \cdot 1$, we get $\bar{P}(n_p^g) \approx (p(0) \cdot 0 + p(1) \cdot 1)P(1) \approx \bar{n}_{pq}P(1)$, where $\bar{n}_{pq}$ is the average of $n_p^g$.

For $\xi \ll 1$ (large genome limit), $P(n_p^g) \approx n_p^g \xi$. In particular, $P(1) \approx \xi$. Then $\bar{P}(n_p^g) \approx \bar{n}_{pq}\,\xi \approx \bar{n}_{pq}P(1)$.
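
The two limits can be illustrated numerically. This is a minimal sketch, assuming the closed form $P(n) = n\xi/(1 + n\xi)$, which follows from integrating $1 - e^{-n\tau/\tau_e}$ over an exponential distribution of $\tau$ with mean $\tau_f$. The function names and parameter values are hypothetical.

```python
def xi_ratio(n_pop, alphabet_size, length, rho):
    """xi = tau_f / tau_e = N / ((K - 1) L rho)."""
    return n_pop / ((alphabet_size - 1) * length * rho)


def discovery_probability(n, xi):
    """Assumed closed form P(n) = n xi / (1 + n xi) for discovery before the next fixation."""
    return n * xi / (1 + n * xi)


if __name__ == "__main__":
    # Large population limit (xi >> 1): P is close to 1 for any n > 0 and is 0 for n = 0.
    xi_large = xi_ratio(n_pop=10**6, alphabet_size=4, length=20, rho=0.5)
    print(f"xi = {xi_large:.3g}: P(0) = {discovery_probability(0, xi_large)}, "
          f"P(1) = {discovery_probability(1, xi_large):.5f}")

    # Large genome limit (xi << 1): P(n) is close to n * xi.
    xi_small = xi_ratio(n_pop=10, alphabet_size=4, length=10**5, rho=0.5)
    for n in (1, 3):
        print(f"xi = {xi_small:.3g}: P({n}) = {discovery_probability(n, xi_small):.6e}, "
              f"n * xi = {n * xi_small:.6e}")
```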

Finally, $P(n_p^g)$ is {the probability that phenotype $p$ is discovered before the next neutral fixation}, i.e. the probability that the {number of times {[phenotype $p$] appears} before the next neutral fixation} is greater than $0$, which is approximately the same as {the average number of times [it] appears}, if {{the probability that {[it] appears in one generation}} is small}, which is the case as {in the monomorphic limit, mutants are rare, $NL\mu \ll 1$}. Then, $\bar{P}(n_p^g)$ is the average of this quantity, which we use in the mean-field approximation.

Then, following the same derivation as in Polymorphic limit (Wright-Fisher model), we have

$$T(\alpha) = -\frac{\tau_f \log(1-\alpha)}{\bar{P}(n_p^g)} = -\frac{\tau_f \log(1-\alpha)}{\bar{n}_{pq}P(1)}$$

where $\tau_f$ is the (mean) duration of each "step" (corresponding to going from being fixed at one genotype to being fixed at another). Now, {the average number of mutations from a genotype in $\mathcal{N}_q$ leading to phenotype $p$} can be expressed as $\bar{n}_{pq} = (K-1)L\phi_{pq}$, as $\phi_{pq}$ is the mean probability that {a single-point mutation from a genotype in $\mathcal{N}_q$ leads to phenotype $p$}, and $(K-1)L$ is the number of single-point mutations. Now, we can find $T(\alpha)$ at the two limits of interest: