Mean field approximation to average number of phenotypes discovered in Wright-Fisher model

guillefix 4th November 2016 at 2:43pm

(See Arrival of the frequent for context)

See also Wright-Fisher model

The Hamming distance $d$ (i.e. the number of differing letters, or mutations) is then distributed binomially, for genotype length $L$ and per-site mutation rate $\mu$:

$$h(d) = \binom{L}{d} \mu^{d} (1-\mu)^{L-d}$$
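As a quick numerical illustration of this distribution, here is a minimal Python sketch. The function name `hamming_distribution` and the values of `L` and `mu` are assumptions chosen for illustration, not taken from the paper.

```python
from math import comb

def hamming_distribution(L, mu):
    """Return [h(0), ..., h(L)]: binomial probabilities of d mutations
    in a genotype of length L with per-site mutation rate mu."""
    return [comb(L, d) * mu**d * (1 - mu)**(L - d) for d in range(L + 1)]

# Hypothetical parameter values, chosen so that L * mu is small.
h = hamming_distribution(L=20, mu=0.001)

print(sum(h))        # the probabilities sum to 1 (up to rounding)
print(h[1], h[2])    # h(2) is much smaller than h(1) here
```

With these values $h(1)$ is close to $L\mu = 0.02$, while $h(2)$ is roughly a hundred times smaller.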

The expected number of individuals with phenotype $p$ that arise at generation $t$ can be written as:

$$m_p(t) = \sum_{i=1}^N \sum_{d=1}^L h(d) \Phi_p(g_i, s_i, d) = \sum_{i=1}^N \Phi_p'(g_i, s_i) \qquad \text{(Eq. 1)}$$

where $\Phi_p(g_i, s_i, d)$ is the probability that a $d$-fold mutation of genotype $g_i$ (selected for reproduction according to fitness $1+s_i$) generates an individual with phenotype $p$. It takes into account the genotype-phenotype map. $g_i$ is the genotype of the $i$th member of the population, which has a total of $N$ members. See the derivation of this below:

As the number is distributed binomially, the average number is $m_p = N \times (\text{probability for a single offspring to get phenotype } p)$. Then we define $\tilde{\Phi_p}(g_i, s_i)$ as the probability for a single offspring to get phenotype $p$, given that it inherits a mutated version of parent $i$. Furthermore,

$$(\text{probability for a single offspring to get phenotype } p) = \sum_{i=1}^N (\text{probability of a single offspring to get phenotype } p \text{ through parent } i)$$

$$= \sum_{i=1}^N \tilde{\Phi_p}(g_i, s_i) \times (\text{probability to inherit from parent } i) = \sum_{i=1}^N \tilde{\Phi_p}(g_i, s_i) \frac{(1+s_i)}{\sum_{j=1}^N (1+s_j)}$$

Finally,

$$m_p = N \times (\text{probability for a single offspring to get phenotype } p) = \sum_{i=1}^N \tilde{\Phi_p}(g_i, s_i) \frac{N(1+s_i)}{\sum_{j=1}^N (1+s_j)} \equiv \sum_{i=1}^N \Phi_p'(g_i, s_i)$$

By fine-graining the transitions from $g_i$ to a phenotype-$p$ genotype into transitions with particular mutation numbers $d$, each weighted by its probability $h(d)$, we can write $\Phi_p'(g_i, s_i) \equiv \sum_{d=1}^L h(d) \Phi_p(g_i, s_i, d)$, recovering Eq. 1.
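The fitness weighting in the derivation above can be sketched in a few lines of Python. The function name and the example values of $\tilde{\Phi_p}$ and $s_i$ are hypothetical, chosen only to show how selection reweights each parent's contribution.

```python
def expected_offspring_with_phenotype(tilde_phi, s):
    """m_p = sum_i tilde_phi[i] * N * (1 + s[i]) / sum_j (1 + s[j]).

    tilde_phi[i]: probability that an offspring of parent i has phenotype p
    s[i]:         selection coefficient of parent i (fitness is 1 + s[i])
    """
    N = len(s)
    total_fitness = sum(1 + s_j for s_j in s)
    return sum(phi * N * (1 + s_i) / total_fitness
               for phi, s_i in zip(tilde_phi, s))

# Hypothetical population of N = 4 parents.
tilde_phi = [0.1, 0.2, 0.0, 0.3]

m_neutral = expected_offspring_with_phenotype(tilde_phi, [0, 0, 0, 0])
m_selected = expected_offspring_with_phenotype(tilde_phi, [0, 1, 0, 0])
print(m_neutral, m_selected)
```

When every $s_i = 0$ the weighting factor is 1 and $m_p$ reduces to the plain sum of the $\tilde{\Phi_p}$ values. Giving the second parent a fitness advantage increases its share of the sum.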


The actual number of individuals with phenotype $p$ will follow a binomial distribution (as explained for a simple case in Wright-Fisher model), with probability $m_p(t)/N$ and number of trials $N$. The probability of none of the offspring having phenotype $p$ is $(1-m_p(t)/N)^N \approx e^{-m_p(t)}$. The approximation holds for large $N$, and may be seen as approximating the binomial distribution by a Poisson distribution.
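To see how quickly the Poisson approximation becomes accurate, the following sketch compares the exact binomial expression with $e^{-m_p}$. The function names and the values of `m_p` and `N` are illustrative assumptions.

```python
from math import exp

def prob_none_exact(m_p, N):
    """Binomial probability that none of N offspring has phenotype p."""
    return (1 - m_p / N) ** N

def prob_none_poisson(m_p):
    """Poisson (large-N) approximation to the same probability."""
    return exp(-m_p)

m_p = 0.5  # hypothetical expected number of phenotype-p offspring
for N in (10, 1000, 10**6):
    print(N, prob_none_exact(m_p, N), prob_none_poisson(m_p))
```

The exact value approaches $e^{-m_p}$ from below as $N$ grows, with a relative error of order $m_p^2/(2N)$.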

If we assume that $L\mu \ll 1$, i.e. the average number of mutations per genotype is very small, then $h(d) \ll h(1)$ for all $d>1$, and $h(1) \approx L\mu$ ($h(0) \approx 1$, while $h(0) < 1$ of course).
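A short worked check of these two statements, using only the binomial form of $h(d)$ given earlier (a sketch added for clarity, not part of the original derivation). The ratio of successive terms is

$$\frac{h(d+1)}{h(d)} = \frac{L-d}{d+1} \, \frac{\mu}{1-\mu} \le \frac{(L-1)\mu}{2(1-\mu)} \quad \text{for } d \ge 1,$$

which is much smaller than 1 when $L\mu \ll 1$, so each additional mutation suppresses $h(d)$ by a further small factor. For the single-mutation term,

$$h(1) = L\mu (1-\mu)^{L-1} = L\mu \left[ 1 - (L-1)\mu + O\big((L\mu)^2\big) \right] \approx L\mu .$$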

With the above assumption that $L\mu \ll 1$, $\Phi_p'(g_i, s_i) = \sum_{d=1}^L h(d) \Phi_p(g_i, s_i, d) \approx \Phi_p(g_i, s_i, 1) L\mu$. (Including the unmutated case $d=0$ would add a term $\Phi_p(g_i, s_i, 0)$, but $\Phi_p(g_i, s_i, 0) = 0$ if $p \neq q$, where $q$ is the phenotype of $g_i$.) Next, if we assume $s_i = 0$ for all $i$ with $g_i$ mapping to phenotype $q$ (i.e. in the neutral space $\mathcal{N}_q$), and that the whole population starts within $\mathcal{N}_q$, we have

$$m_p(t) = \sum_{i=1}^N \Phi_p'(g_i, s_i) \approx \sum_{i=1}^N \Phi_p(g_i, 0, 1) L\mu \qquad \text{(Eq. 2)}$$

We can also define the average of $\Phi_p(g_i, 0, 1)$ over all $g_i$ in $\mathcal{N}_q$, i.e. the average of the expected number of offspring with phenotype $p$ at one generation that inherited from genotype $g_i$ at the previous generation via a single mutation. We will abuse notation and use the label $i$ in $g_i$ to label a genotype in $\mathcal{N}_q$, so that $i = 1, 2, \ldots, N_q$, where $N_q$ is the number of genotypes in $\mathcal{N}_q$. The average is then:

$$\Phi_{pq} = \frac{1}{N_q}\sum_{i=1}^{N_q} \Phi_p(g_i, 0, 1)$$
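The average above can be computed by brute force on a small example. The sketch below uses a made-up genotype-phenotype map on binary genotypes (the `phenotype` function is purely hypothetical), and assumes that a single-point mutation picks one of the $L$ sites uniformly at random, so that $\Phi_p(g_i, 0, 1)$ is the fraction of 1-mutant neighbours of $g_i$ with phenotype $p$.

```python
from itertools import product

L = 4  # hypothetical genotype length

def phenotype(g):
    """Hypothetical toy genotype-phenotype map (not from the paper)."""
    return sum(g) // 2

def one_mutant_neighbours(g):
    """All genotypes that differ from g at exactly one site."""
    return [g[:k] + (1 - g[k],) + g[k + 1:] for k in range(len(g))]

def phi_single(g, p):
    """Fraction of 1-mutant neighbours of g that have phenotype p."""
    nbrs = one_mutant_neighbours(g)
    return sum(phenotype(n) == p for n in nbrs) / len(nbrs)

def phi_pq(p, q):
    """Average of phi_single(g, p) over the neutral set of phenotype q."""
    neutral_set = [g for g in product((0, 1), repeat=L) if phenotype(g) == q]
    return sum(phi_single(g, p) for g in neutral_set) / len(neutral_set)

print(phi_pq(1, 0))  # 0.6: average chance that one mutation leads from q=0 to p=1
print(phi_pq(0, 0))  # 0.4: average chance that one mutation stays in q=0
```

In this toy map the neutral set of phenotype 0 has five genotypes, and the two averages sum to 1 because phenotype 2 cannot be reached from it by a single mutation.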

Furthermore, we should note that $\Phi_p'(g_i, s_i) = \tilde{\Phi_p}(g_i, s_i) \frac{N(1+s_i)}{\sum_{j=1}^N (1+s_j)}$ (with a similar expression for the $d$-dependent quantities). When $s_i = 0$ for all $i$, as assumed above, we find $\Phi_p'(g_i, s_i) = \tilde{\Phi_p}(g_i, s_i)$, and also, for example, that $\Phi_p(g_i, 0, 1) = \tilde{\Phi_p}(g_i, 0, 1)$, where $\tilde{\Phi_p}(g_i, s_i, 1)$ is the probability for a single offspring to get phenotype $p$, given that it inherits a mutated version of parent $i$ via a single-point mutation ($d=1$). Thus $\Phi_{pq}$ is the average of this probability.

We also define the robustness $\rho$ of phenotype $q$ as the average, over all of $\mathcal{N}_q$, of the probability of a neutral mutation (i.e. one from $\mathcal{N}_q$ to $\mathcal{N}_q$). Under the approximate assumptions above, $\Phi_{qq} \approx \rho$. If we also assume that the population is large enough (more precisely, that we are in the Polymorphic limit (Wright-Fisher model)), we can use a mean field approximation: approximate $\Phi_p(g_i, 0, 1)$ by $\Phi_{pq}$. This approximation works best if the population is large enough that most of the neutral space $\mathcal{N}_q$ is populated (or, in the words of the paper's authors, the "1-mutant neighbourhood of the population is similar to that of the whole neutral space"). Using this in Eq. 2:

$$m_p(t) \approx L\mu \sum_{i=1}^N \Phi_{pq} = N L\mu \Phi_{pq} \qquad \text{(Eq. 3)}$$
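Eq. 3 can be combined with the Poisson approximation from earlier in this note to estimate the chance that phenotype $p$ appears at least once in a given generation, $1 - e^{-m_p}$. A minimal sketch follows; the function names and all parameter values are hypothetical.

```python
from math import exp

def mean_field_m_p(N, L, mu, phi_pq):
    """Mean field estimate of m_p from Eq. 3: N * L * mu * Phi_pq."""
    return N * L * mu * phi_pq

def prob_discovered_per_generation(N, L, mu, phi_pq):
    """Probability that at least one offspring has phenotype p in one
    generation, using the Poisson approximation 1 - exp(-m_p)."""
    return 1 - exp(-mean_field_m_p(N, L, mu, phi_pq))

# Hypothetical values: population 1000, genotype length 20,
# per-site mutation rate 1e-4, Phi_pq = 0.05.
m = mean_field_m_p(1000, 20, 1e-4, 0.05)
p_found = prob_discovered_per_generation(1000, 20, 1e-4, 0.05)
print(m, p_found)
```

For these values $m_p = 0.1$, so phenotype $p$ is produced in slightly under one generation in ten. When $m_p \ll 1$ the discovery probability is approximately $m_p$ itself.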