Wasmuth, James
Lesack, Kyle James
2023-08-10
2023-08
Lesack, K. J. (2023). Structural variation in the Caenorhabditis elegans genome: challenges and quality assurance strategies for reliable variant calling (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
https://hdl.handle.net/1880/116853
https://dx.doi.org/10.11575/PRISM/41695
Obtaining an accurate and comprehensive representation of structural variation is crucial for understanding how large alterations in chromosome structure contribute to phenotypic diversity and drive genome evolution. Despite continuous efforts to improve methods for identifying structural variation from whole-genome sequencing data, accurate variant calling remains challenging. The barriers to progress in this area are complex and multifactorial, but the technical limitations of short-read sequencing technologies and the limited availability of suitable benchmarking resources for non-human species feature prominently. This thesis includes an in-depth evaluation of several commonly used tools for identifying structural variants from short- and long-read DNA sequencing data from natural Caenorhabditis elegans strains. These comparisons, described in detail in Chapter 2, revealed that popular tools yield considerably different results. A major aim of this project was to identify sources of error and variability that tool developers could address in the future. Surprisingly, the order of reads in PacBio FASTQ files was found to affect the predicted structural variants. Chapter 3 describes these results and demonstrates how alignment sorting algorithms contribute to the problem. Chapter 4 describes an analysis of structural variation in 14 natural C. elegans strains.
Importantly, this work demonstrates how long-read DNA sequencing data can be successfully used to identify structural variants at the population level.
en
University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
structural variation; Caenorhabditis elegans; genomics; bioinformatics
Bioinformatics
Structural Variation in the Caenorhabditis elegans Genome: Challenges and Quality Assurance Strategies for Reliable Variant Calling
doctoral thesis