Improved Basecalling and Base Modification Detection Through Signal-level Analysis of Nanopore Direct RNA Data

dc.contributor.advisor: Long, Quan
dc.contributor.advisor: Gordon, Paul
dc.contributor.author: Wang, Scott
dc.contributor.committeemember: Smith, Mike
dc.contributor.committeemember: Anderson, David
dc.date: 2023-11
dc.date.accessioned: 2023-09-22T15:05:43Z
dc.date.available: 2023-09-22T15:05:43Z
dc.date.issued: 2023-09-14
dc.description.abstract: Genome sequencing technologies have emerged as essential tools for addressing the challenges presented by the natural biological complexity of organisms. Unlike traditional next-generation sequencing (NGS) methods, which yield short reads, third-generation sequencing (TGS) methods can sequence transcripts and complete genomes in single contiguous reads, providing innovative means to address practical topics surrounding viral transmission, evolution, and pathogenesis. TGS alleviates the computational challenges of assembling consensus genomes or constructing transcripts from fragmented reads, as is required with NGS libraries. Despite these advantages, as an emerging technology, TGS faces many technical challenges. High error rates make it difficult to distinguish machine errors from low-frequency mutations in the genome. Some of the best-known and most pervasive diseases in society originate from viruses with ribonucleic acid (RNA) genomes; these include, but are not limited to, influenza viruses and coronaviruses. Progress towards a comprehensive understanding of RNA viruses has been hindered by their unique biology and high levels of diversity, along with their rapid replication and high mutation rates, which give rise to important evolutionary signals in individual viral copies. Part of the high basecalling error rate in TGS can be attributed to the presence of unmodeled signal, e.g. calling just the four canonical nucleobases (A, C, G, T/U) when methylation and other nucleobase modifications are also contributing to the signal. Accurately identifying the location of such nucleobase modifications (i.e. modeling their signal) would naturally lead to better nucleobase calling and provide insights into RNA virus biology. The few extant tools in this area for TGS are based on deep-learning AI methods, chosen for computational tractability, and are demonstrably biased.
In contrast to such opaque methods, this work deploys new, efficient implementations of theoretically optimal (“dynamic programming”) methods for Oxford Nanopore Technologies (ONT) TGS raw signal segmentation, alignment, clustering, and consensus. Combined with follow-on statistical analyses of signal deviations within those results, these implementations define a minimally biased, statistically grounded procedure for detecting unmodeled signal (i.e. putative nucleobase modifications or mutations), as demonstrated using multiple publicly available raw ONT direct RNA sequencing viral datasets.
dc.identifier.citation: Wang, S. (2023). Improved basecalling and base modification detection through signal-level analysis of nanopore direct RNA data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.uri: https://hdl.handle.net/1880/117102
dc.language.iso: en
dc.publisher.faculty: Graduate Studies
dc.publisher.institution: University of Calgary
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subject: Nanopore
dc.subject: Basecalling
dc.subject: SARS-CoV-2
dc.subject: RNA virus
dc.subject.classification: Bioinformatics
dc.title: Improved Basecalling and Base Modification Detection Through Signal-level Analysis of Nanopore Direct RNA Data
dc.type: master thesis
thesis.degree.discipline: Medicine – Biochemistry and Molecular Biology
thesis.degree.grantor: University of Calgary
thesis.degree.name: Master of Science (MSc)
ucalgary.thesis.accesssetbystudent: I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.
Files
Original bundle
Name: ucalgary_2023_wang_scott.pdf
Size: 5.17 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission