Characterizing genetic basis of complex diseases by integrating data-bridge and genomics

Date
2023-06
Abstract
With advances in high-throughput sequencing and genotyping technology, genomic projects now generate large amounts of multi-omics data. Such data lie between genotype and phenotype and may therefore serve as data-bridges that aid statistical genetic analyses. How to integrate these data-bridges effectively presents both challenges and opportunities for statistical geneticists: for instance, the problem of statistical overfitting, the question of seamlessly integrating biological priors with high-dimensional data, and the interpretation of statistical results in the context of biology. The work in this thesis focuses on integrating such data-bridges to characterize the genetic basis of complex diseases while addressing these challenges. I have developed novel statistical models for analyzing multi-omics data from four perspectives: (Q1) how to integrate biological priors such as transcription factors with statistical models; (Q2) how to utilize trans-regulatory variants while keeping the model robust despite the large number of possible candidates; (Q3) how to utilize data-bridges to improve the modeling of rare genetic variants; and (Q4) how to utilize brain imaging data in genetic association mapping. These efforts led to four novel statistical models and their implementations: (M1) sTF-TWAS, which integrates prior knowledge of transcription factors (TFs) into association studies; (M2) transTF-TWAS, which uses Group Lasso to incorporate TF-linked trans-located variants; (M3) rvTWAS, which leverages transcriptome-directed feature selection for rare variants; and (M4) IMAS, which uses borrowed brain images to conduct image-directed feature selection and aggregation. All four methods are verified by comprehensive simulations based on known genetic architectures and heritability models.
Using large-scale omics data accessed through dbGaP and UK Biobank, as well as large cohorts from our collaborator, I have applied these methods to cancers and neuropsychiatric disorders, leading to the discovery of additional genes underlying complex traits. I have also thoroughly validated the methods by checking the discoveries against the existing biological literature and databases. The development of these methods opens the door to integrating data-bridges such as transcriptomes and imaging data in genetic mapping. The novel findings provide additional insights into the genetic basis of cancers and brain disorders.
Citation
He, J. (2023). Characterizing genetic basis of complex diseases by integrating data-bridge and genomics (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.