The structure of the genotype-phenotype map strongly constrains the evolution of non-coding RNA

cosmos 11th December 2017 at 6:44pm
Genotype-phenotype map

See MMathPhys oral presentation

The structure of the genotype–phenotype map strongly constrains the evolution of non-coding RNA

Non-coding RNA (ncRNA) is RNA that is not translated into protein. Its function may instead be structural or catalytic, for instance, and is most often determined by its secondary structure (SS), which is then the phenotype of interest.

The distribution of properties found in natural ncRNA (from the fRNAdb database) closely follows that obtained by G-sampling (uniform sampling over genotypes). Because the GP map is biased, meaning that the neutral set (NS) size Ω, the number of genotypes mapping to a phenotype, varies widely between phenotypes, G-sampling gives very different results from P-sampling (uniform sampling over phenotypes). The strong bias makes certain structures appear much more often, which has been called convergent evolution in Evolution (part of the general phenomenon of homoplasy). An example is the ubiquity of the hammerhead ribozyme across all the kingdoms of life.
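A minimal sketch of the difference between the two sampling schemes, assuming a hypothetical toy map described only by its neutral set sizes (the phenotype labels and numbers are made up for illustration and are not taken from the paper):

```python
from collections import Counter


def p_sampled(ns_sizes):
    """Distribution of NS size when phenotypes are drawn uniformly."""
    n_phenotypes = len(ns_sizes)
    counts = Counter(ns_sizes.values())
    return {omega: c / n_phenotypes for omega, c in counts.items()}


def g_sampled(ns_sizes):
    """Distribution of NS size when genotypes are drawn uniformly.

    A phenotype with NS size omega is hit with probability omega / total.
    """
    total = sum(ns_sizes.values())
    weights = Counter()
    for omega in ns_sizes.values():
        weights[omega] += omega
    return {omega: w / total for omega, w in weights.items()}


# Hypothetical toy map: 4 phenotypes covering 100 genotypes (assumption).
toy_ns_sizes = {"A": 85, "B": 5, "C": 5, "D": 5}
print("P-sampled:", p_sampled(toy_ns_sizes))
print("G-sampled:", g_sampled(toy_ns_sizes))
```

In this toy case most phenotypes are small under P-sampling, while under G-sampling most draws land on the single large phenotype, which is the qualitative effect described above.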

Figure 2. Comparison of P-sampled and G-sampled distributions to natural data for L = 20 RNA. The P-sampled $P_P(\Omega)$ (red diamonds) measures the probability distribution for a phenotype to have a given NS size $\Omega$. It differs markedly from the G-sampled $P_G(\Omega)$ (blue circles), generated by random sampling over genotypes. Error bars arise from binning data. The black and cyan lines are theoretical approximations to $P_P(\Omega)$ and $P_G(\Omega)$, respectively (see Methods). The probability distribution of $\Omega$ for the SSs of all 7327 (non-trivial) L = 20 sequences for Drosophila melanogaster from the fRNAdb database [21] (green squares) is much closer to the G-sampled $P_G(\Omega)$ than to the P-sampled $P_P(\Omega)$. Inset: all 11 218 SS phenotypes (purple triangles) ranked by NS size $\Omega$. There is strong bias: just 5% of phenotypes take up 58% of all genotypes. The 7327 natural data points (green squares) are clustered at lower rank (larger $\Omega$).
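A statement of the form "5% of phenotypes take up 58% of all genotypes" can be computed directly from a list of NS sizes. A rough sketch, where the helper name `coverage_of_top` and the sizes are hypothetical:

```python
def coverage_of_top(ns_sizes, top_fraction):
    """Fraction of genotypes covered by the largest `top_fraction` of phenotypes."""
    ranked = sorted(ns_sizes, reverse=True)  # rank 1 = largest NS size
    # Number of phenotypes in the top group, at least one.
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Made-up NS sizes for 5 phenotypes (assumption, not data from the paper).
toy_sizes = [50, 30, 10, 5, 5]
print(coverage_of_top(toy_sizes, 0.4))
```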

The number of 'relevant structures' can be estimated from the entropy $H$ of the G-sampled distribution of features (for instance, belonging to a certain binned interval of neutral space size, or having a given number of stacks, i.e. sets of contiguous base pairs) as $2^H$. One can define the bias ratio as the ratio of $2^H$ to the total number of phenotypes.
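As an illustration of the $2^H$ estimate, here is a small sketch. It assumes the simplest possible choice of feature, where each phenotype is its own feature and its G-sampled probability is its NS size divided by the total number of genotypes; the paper's binned features would need a different grouping step. Function names and numbers are hypothetical.

```python
import math


def effective_number(probabilities):
    """Return 2**H, with H the Shannon entropy in bits."""
    h = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** h


def bias_ratio(ns_sizes):
    """Ratio of 2**H (G-sampled) to the total number of phenotypes.

    Assumption: one feature per phenotype, probability = NS size / total.
    """
    total = sum(ns_sizes)
    g_probs = [omega / total for omega in ns_sizes]
    return effective_number(g_probs) / len(ns_sizes)


print(bias_ratio([25, 25, 25, 25]))  # unbiased toy map
print(bias_ratio([97, 1, 1, 1]))     # strongly biased toy map
```

An unbiased map gives a ratio of 1, and the ratio shrinks towards 1/(number of phenotypes) as a single phenotype comes to dominate.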

Among the relevant structures that arrive during evolution, natural selection still acts; this can be seen, for example, in the higher stability of natural RNAs compared with random G-sampled RNAs. We find that natural RNAs have slightly more bonds than G-sampled structures. The bias towards larger $\Omega$ also leads to structures with larger mutational robustness (see Robustness and Evolvability in Living Systems and From sequences to shapes and back: a case study in RNA secondary structures). Larger robustness is considered to be advantageous [6], so that, in this important way, phenotype bias facilitates evolution. The high robustness, however, is found in both G- and P-sampling because of the high genetic correlations (genotypes tend to be close in the mutational network to other genotypes that produce the same phenotype). The genetic correlations are high enough to produce giant connected components (see Natural Selection and the Concept of a Protein Space).
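Mutational robustness is usually measured as the fraction of single-point mutants that keep the same phenotype. A minimal sketch under that definition follows; `toy_map` is a hypothetical stand-in for a real folding algorithm and is not how the paper computes secondary structures.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def toy_map(seq):
    """Hypothetical GP map: 'paired' if the two end bases can pair, else 'open'."""
    return "paired" if (seq[0], seq[-1]) in WATSON_CRICK else "open"


def robustness(genotype, gp_map, alphabet="ACGU"):
    """Fraction of one-point mutants with the same phenotype as `genotype`."""
    reference = gp_map(genotype)
    same = 0
    total = 0
    for i, base in enumerate(genotype):
        for alt in alphabet:
            if alt == base:
                continue
            mutant = genotype[:i] + alt + genotype[i + 1:]
            total += 1
            if gp_map(mutant) == reference:
                same += 1
    return same / total


print(robustness("GAAC", toy_map))
print(robustness("AAAA", toy_map))
```

In this toy map the "open" phenotype has the larger neutral set and its genotypes are also the more robust ones, in line with the trend described above.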

"Bias means that it will be difficult for evolution to find L = 55 structures with a large number of stacks, again raising the question of what kind of functionality is possible in principle that cannot be reached by evolution because of such phenotype bias constraints?"

Understanding tip: The line in figure 4 is flat where there are a lot of phenotypes because many phenotypes share the same $\Omega$, and the phenotypes are equally spaced along the $x$ axis in a rank plot.

The finding that G-sampling reproduces the distributions seen in the database indicates that some property similar to ergodicity may be at play. G-sampling is an ensemble average, and the database shows a kind of time average over evolutionary trajectories. However, the process cannot be totally ergodic because evolution is a non-equilibrium process, and effects like long waiting times and the Arrival of the frequent are examples of non-ergodic, non-equilibrium effects.

The GP map bias is an example of how biases in development or other internal processes can strongly affect evolutionary outcomes. Such biases have been controversial; however, RNA SS provides perhaps the clearest and most unambiguous evidence for the importance of bias in shaping evolutionary outcomes.

See Homoplasy for discussion of the relation to convergent and parallel evolution. Our ability to make detailed predictions about evolutionary outcomes as well as counterfactuals for RNA may also shed light on Mayr’s famous distinction between proximate and ultimate causes in biology (see Cause and effect in biology and Proximate and ultimate causation). Not sure about this, or whether I understand it.

The GP mapping constraint has some resemblance to classical morphogenetic constraints, which also bias the arrival of variation [47]. But it also differs, because the latter are conceptualized at the level of phenotypes and developmental processes, and may have been shaped by prior selection, whereas the former is a fundamental property of the mapping from genotypes to phenotypes and was not selected for (except perhaps at the origin of life itself; still, maybe most possible GP maps have this property anyway, see experiments with transducers).

Finally, strong phenotype bias is also found in:

suggesting that some of the results discussed in this paper for RNA may hold more widely in biology.

See also Evolving automata

Paper with several examples of GP maps, including cellular automata map: An investigation of redundant genotype-phenotype mappings and their role in evolutionary search

It would be interesting to devise artificial methods to search for such undiscovered ribozymes (those that are very improbable for evolution to find), some of which may be more fit than those that Nature has found.

For this see:

Exploring the repertoire of RNA secondary motifs using graph theory; implications for RNA design. Uses tree graphs to describe RNA tree motifs and more general (dual) graphs to describe both RNA tree and pseudoknot motifs. "Our graph theory approach to RNA structures has implications for RNA genomics, structure analysis and design."

Experimental fitness landscapes to understand the molecular evolution of RNA-based life. In evolutionary biology, the relationship between genotype and Darwinian fitness is known as a fitness landscape. These landscapes underlie natural selection, so understanding them would greatly improve quantitative prediction of evolutionary outcomes, guiding the development of synthetic living systems. However, the structure of fitness landscapes is essentially unknown. Our ability to experimentally probe these landscapes is physically limited by the number of different sequences that can be identified. This number has increased dramatically in the last several years, leading to qualitatively new investigations. Several approaches to illuminate fitness landscapes are possible, ranging from tight focus on a single peak to random speckling or even comprehensive coverage of an entire landscape. We discuss recent experimental studies of fitness landscapes, with a special focus on functional RNA, an important system for both synthetic cells and the origin of life.


Methods