Detectability of Non-Equilibrium Molecular Evolution Caused by Fitness Shift and Drift

Abstract
One of the key interests of computational molecular evolution is the inference of the strength and direction of natural selection in protein-coding genes. The non-synonymous to synonymous rate ratio (dN/dS) is widely used to evaluate the effect of natural selection on genes, lineages, and sites. When dN/dS is inferred to be greater than one along a particular branch and at a specific site, this is often taken as evidence of episodic positive selection and adaptive change in function. Despite the simplicity and widespread use of dN/dS approaches, they are fundamentally unable to differentiate between fit and unfit states, and the stationary distributions in all widely used approaches are (unrealistically) identical across sites. To address these shortcomings, the mutation-selection framework, a class of codon substitution models that allows a mechanistic relationship between fitness and sequence, has been proposed. Recently, owing to developments in Markov chain Monte Carlo (MCMC) methods and penalized maximum likelihood approaches, computationally tractable models have been implemented that enable inference under site-heterogeneous mutation-selection models, though substantial computational barriers to using such methods on large datasets persist.
In this thesis, I introduce time-heterogeneous mutation-selection models as an idealized representation of how episodic adaptation occurs. Using these models, I study how the true dN/dS changes over time following a wide variety of fitness shifts (when the fitness profile at a site is completely replaced with a new fitness profile) and fitness drift scenarios (when the fitnesses of the two most favorable states are swapped). Both simulation and direct (simulation-free) analysis are used to characterize non-equilibrium molecular evolution under time-heterogeneous mutation-selection models of codon substitution.
Additionally, I evaluate the performance of existing branch-site type methods in distinguishing a fitness shift from a relaxation of constraints at a small number of sites. In general, I find that the more different the starting and ending fitness profiles are, the more reliably an adaptive burst is produced, which is potentially detectable using dN/dS approaches. Although all existing methods I considered in the simulations performed poorly and have very low power to detect fitness shifts, I find that covariate information that helps indicate which sites might be targets of positive selection can restore high power of dN/dS type methods to detect modest to strong fitness shifts.
My aim in this project has been to improve our understanding of non-equilibrium molecular evolution under mechanistic models of adaptive change in function and to illuminate how well relatively simple statistical approaches perform in inference tasks. I hope this body of work will broaden the horizon for more realistic, mechanistic, and tractable models of non-equilibrium molecular evolution.
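As a rough illustration of why a fitness shift can push the true rate ratio above one, the following minimal Python sketch uses a toy mutation-selection model. It is not the thesis's implementation. It assumes a single site with four abstract states (not codons), uniform symmetric mutation, scaled fitness values chosen arbitrarily, and the standard relative fixation factor h(S) = S / (1 - exp(-S)). The function names and fitness profiles are hypothetical.

```python
import math

def fixation_factor(s):
    """Fixation rate relative to neutral, h(S) = S / (1 - exp(-S)); h(0) = 1."""
    if abs(s) < 1e-9:
        return 1.0
    return s / (1.0 - math.exp(-s))

def stationary(fitness):
    """Stationary frequencies under symmetric mutation: pi_i proportional to exp(F_i)."""
    top = max(fitness)
    weights = [math.exp(f - top) for f in fitness]
    total = sum(weights)
    return [w / total for w in weights]

def rate_ratio(freqs, fitness):
    """Expected substitution rate relative to neutral, given state frequencies."""
    k = len(fitness)
    ratio = 0.0
    for i, p in enumerate(freqs):
        mean_h = sum(
            fixation_factor(fitness[j] - fitness[i]) for j in range(k) if j != i
        ) / (k - 1)
        ratio += p * mean_h
    return ratio

# Hypothetical scaled fitness profiles (assumed values, for illustration only).
old_profile = [4.0, 0.0, 0.0, 0.0]
new_profile = [0.0, 0.0, 0.0, 4.0]  # fitness shift: the profile is replaced

# At equilibrium the site mostly occupies its fittest state.
omega_equilibrium = rate_ratio(stationary(old_profile), old_profile)

# Just after the shift the site still has the old frequencies,
# but substitutions are now governed by the new profile.
omega_after_shift = rate_ratio(stationary(old_profile), new_profile)

print(f"equilibrium ratio: {omega_equilibrium:.3f}")
print(f"ratio just after shift: {omega_after_shift:.3f}")
```

In this toy setting the equilibrium ratio is below one, while the ratio immediately after the shift is above one, because the site is stranded in a state that has become unfit. A real analysis would use a 61-state codon model with a realistic mutation process and would track the ratio as it decays back toward its new equilibrium.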
Keywords
Molecular evolution, dN/dS, Positive selection, Burst of amino acid substitutions, Branch-site methods, Power estimation, Mutation-selection model, Phylogenetic methods, Evolutionary modeling, Protein evolution
Citation
Kazemi Mehrabadi, M. (2022). Detectability of Non-Equilibrium Molecular Evolution Caused by Fitness Shift and Drift (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.