Parallelization of Bayesian Phylogenetics to Greatly Improve Run Times
dc.contributor.advisor | Zhang, Qingrun | |
dc.contributor.advisor | Gordon, Paul | |
dc.contributor.author | Yang, David | |
dc.contributor.committeemember | Liao, Wenyuan | |
dc.contributor.committeemember | van der Meer, Franciscus Johannes | |
dc.date | 2024-05 | |
dc.date.accessioned | 2024-03-27T16:29:32Z | |
dc.date.available | 2024-03-27T16:29:32Z | |
dc.date.issued | 2024-03-24 | |
dc.description.abstract | Phylogenetic analyses are invaluable for understanding the transmission of viruses, especially during disease outbreaks. In particular, Bayesian phylogenetics has great potential for modeling viral transmission because of the numerous phylogenetic models that can be incorporated. Currently, the availability of user-friendly software and the accessibility of sequence data make phylogenetic analyses easy to perform. However, Bayesian phylogenetic analyses are still limited by long computational run times, which are especially unfavorable during ongoing and evolving disease outbreaks that demand real-time phylogeny results. Current optimization methods for Bayesian phylogenetic analysis mainly focus on iteration-level parallelization and mostly overlook the potential of larger-scale parallelization approaches. In this thesis, we provide an in-depth overview of topics including phylogenetic analysis, relevant biological information, and phylogenetic analysis optimization methods. We also propose a novel parallelized Markov Chain Monte Carlo (MCMC) method that greatly improves Bayesian phylogenetic run times, and we integrate the approach into a data pipeline to allow for the direct analysis of viral samples. We demonstrate the validity of our methods by performing phylogenetic analyses on two sets of HIV simulation data and one set of real-world SARS-CoV-2 data. Our results suggest that parallelizing MCMC in Bayesian phylogenetic analyses reduces run times 29-fold without causing significant deviations in parameter estimates or predicted phylogenetic trees. | |
dc.identifier.citation | Yang, D. (2024). Parallelization of Bayesian phylogenetics to greatly improve run times (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. | |
dc.identifier.uri | https://hdl.handle.net/1880/118331 | |
dc.identifier.uri | https://doi.org/10.11575/PRISM/43174 | |
dc.language.iso | en | |
dc.publisher.faculty | Graduate Studies | |
dc.publisher.institution | University of Calgary | |
dc.rights | University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. | |
dc.subject | Markov Chain Monte Carlo | |
dc.subject | SARS-CoV-2 | |
dc.subject | Simulation | |
dc.subject | Bayesian | |
dc.subject | Phylogenetic analysis | |
dc.subject | Parallelization | |
dc.subject.classification | Biostatistics | |
dc.subject.classification | Statistics | |
dc.subject.classification | Epidemiology | |
dc.subject.classification | Bioinformatics | |
dc.title | Parallelization of Bayesian Phylogenetics to Greatly Improve Run Times | |
dc.type | master thesis | |
thesis.degree.discipline | Mathematics & Statistics | |
thesis.degree.grantor | University of Calgary | |
thesis.degree.name | Master of Science (MSc) | |
ucalgary.thesis.accesssetbystudent | I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible. |