Identifying and explaining large-scale genome sequence convergence
Abstract
Recently it has been shown that convergent sequence evolution can happen in nature at unexpectedly large scales, systematically misleading methods of phylogenetic reconstruction. For this reason, among others, there has been growing interest in sequence convergence in recent years. Although various techniques for detecting sequence convergence, such as site-specific log-likelihood support and ancestral sequence reconstruction, have been used, there does not yet exist a general statistical procedure for reliably distinguishing between random convergence and convergence resulting from parallel selective pressures or time-heterogeneous evolutionary processes.
Here, I intend to further our understanding of sequence convergence by creating a new algorithm for detecting, quantifying and understanding non-random sequence convergence in a principled and unbiased manner. I design and implement a new approach for detecting convergence across entire phylogenies, making it amenable to a wider variety of datasets than was previously possible. Finally, I investigate the role of effective population size in contributing to sequence convergence, and show for the first time that time-heterogeneity in effective population sizes can be sufficient to cause large-scale episodes of convergent sequence evolution. This surprising finding suggests an apparently non-adaptive mechanistic explanation, since such convergence can occur without changes to the underlying fitness landscape and is instead driven by lineages with increased effective population sizes being able to climb higher on the same adaptive peaks. As a result, I believe this phenomenon to be of adaptive significance even though it does not require adaptation to a changing environment per se. These findings suggest that time-heterogeneous evolutionary processes must be integrated into the models used for phylogenomic reconstruction and in comparative genomics more broadly.
Keywords
Bioinformatics, Genetics
Citation
Bryans, N. (2017). Identifying and explaining large-scale genome sequence convergence (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/26425