Parallelization of Bayesian Phylogenetics to Greatly Improve Run Times

dc.contributor.advisor: Zhang, Qingrun
dc.contributor.advisor: Gordon, Paul
dc.contributor.author: Yang, David
dc.contributor.committeemember: Liao, Wenyuan
dc.contributor.committeemember: van der Meer, Franciscus Johannes
dc.date: 2024-05
dc.date.accessioned: 2024-03-27T16:29:32Z
dc.date.available: 2024-03-27T16:29:32Z
dc.date.issued: 2024-03-24
dc.description.abstract: Phylogenetic analyses are invaluable to understanding the transmission of viruses, especially during disease outbreaks. In particular, Bayesian phylogenetics has great potential in modeling viral transmission due to the numerous phylogenetic models that can be incorporated. Currently, the availability of user-friendly software and the accessibility of sequence data make phylogenetic analyses easy to perform. However, to date, Bayesian phylogenetic analyses are still limited by long computational run times, which are especially unfavorable during ongoing and evolving disease outbreaks that demand real-time phylogeny results. Current optimization methods for Bayesian phylogenetic analysis mainly focus on iteration-level parallelization and mostly overlook the potential of larger-scale parallelization approaches. In this thesis, we provide an in-depth overview of topics including phylogenetic analysis, relevant biological information, and phylogenetic analysis optimization methods. We also propose a novel parallelized Markov Chain Monte Carlo (MCMC) method that greatly improves Bayesian phylogenetic run times, and we integrate the approach into a data pipeline to allow for the direct analysis of viral samples. We demonstrate the validity of our methods by performing phylogenetic analyses on two sets of HIV simulation data and one set of real-world SARS-CoV-2 data. Our results suggest that the parallelization of MCMC in Bayesian phylogenetic analyses reduces run times 29-fold without causing significant deviations in parameter estimates and predicted phylogenetic trees.
dc.identifier.citation: Yang, D. (2024). Parallelization of Bayesian phylogenetics to greatly improve run times (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/118331
dc.language.iso: en
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Markov Chain Monte Carlo
dc.subject: SARS-CoV-2
dc.subject: Simulation
dc.subject: Bayesian
dc.subject: Phylogenetic analysis
dc.subject: Parallelization
dc.subject.classification: Biostatistics
dc.subject.classification: Statistics
dc.subject.classification: Epidemiology
dc.subject.classification: Bioinformatics
dc.title: Parallelization of Bayesian Phylogenetics to Greatly Improve Run Times
dc.type: master thesis
thesis.degree.discipline: Mathematics & Statistics
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.thesis.accesssetbystudent: I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.
Files
Original bundle
Name: ucalgary_2024_yang_david.pdf
Size: 6.55 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission