Rapid Large-Scale Inference of Genome-Wide Mutational Heterogeneity

de Koning, A.P. Jason
Mathankeri, Aaron
2016-10-17
http://hdl.handle.net/11023/3432

Tumours arise by mutation and natural selection among cellular lineages. Understanding and modelling mutation is thus a central aspect of cancer research. Genes that confer a selective advantage on their cell lineage when mutated are known as drivers and are usually identified by statistical enrichment of mutations. Current approaches to detecting drivers make several simplifying assumptions when modelling mutation, sacrificing biological realism for computational speed. The main novel technical contribution of this thesis is a principled mathematical framework for mutational analysis in genomic data that we term "Mut-HMM". Calculations required for large-scale inference were parallelized to take advantage of many-core CPU clusters. Based on this work, I present a new software package that can be orders of magnitude faster than previous state-of-the-art methods for analysis of genome-wide mutation patterns. I then present an exploratory analysis of chromosome 22 germline mutation data, the results of which highlight the need for more complex and sophisticated mutation models in cancer and human genomics.

Language: eng

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.

Keywords: Bioinformatics; Hidden Markov Models; Parallel Computing; Continuous Time Markov Chains; Genomics; Large-Scale Inference; Mutation

Type: master thesis
DOI: 10.11575/PRISM/27529