Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data

Date
2015-06-10
Authors
Ruan, J.
Abstract
Clustering of RNA-seq data is used to characterize environment-induced (e.g., treatment-induced) differences in gene expression profiles by separating genes into clusters based on their expression patterns. Wang et al. [2013] recently adopted the bi-Poisson distribution, obtained via the trivariate reduction method, as a model for clustering bivariate RNA-seq data. We discuss the inadequacy of the bi-Poisson distribution in modelling the correlation between dependent Poisson counts, and its impact on clustering such data. We introduce an alternative Gaussian copula model that incorporates a flexible dependence structure for the counts, report simulation results comparing the performance of the Gaussian copula and bi-Poisson models, and investigate how misspecified dependence structures affect the clustering of Poisson counts. We illustrate our methodology on a lung cancer RNA-seq data set.
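As a rough illustration of the trivariate reduction construction named in the abstract, the sketch below builds a bi-Poisson pair as X1 = Y1 + Y0 and X2 = Y2 + Y0 from three independent Poisson variables. Under this construction the covariance equals the rate of the shared component Y0, so the correlation cannot be negative. That is one commonly cited limitation and is assumed here to be related to the inadequacy the abstract refers to; the thesis itself may argue the point differently. All function names and rate values are hypothetical and are not taken from the thesis or from Wang et al. [2013].

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw one Poisson(rate) variate with Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_bipoisson(rate1, rate2, shared_rate, rng):
    """Trivariate reduction: both counts share the same Poisson component."""
    shared = sample_poisson(shared_rate, rng)
    x1 = sample_poisson(rate1, rng) + shared
    x2 = sample_poisson(rate2, rng) + shared
    return x1, x2


def bipoisson_correlation(rate1, rate2, shared_rate):
    """Theoretical correlation: cov = shared_rate, var_i = rate_i + shared_rate."""
    return shared_rate / math.sqrt((rate1 + shared_rate) * (rate2 + shared_rate))


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    # Illustrative rates only (assumed values, not from the thesis).
    pairs = [sample_bipoisson(2.0, 3.0, 1.0, rng) for _ in range(20000)]
    xs, ys = zip(*pairs)
    print("theoretical correlation:", bipoisson_correlation(2.0, 3.0, 1.0))
    print("empirical correlation:  ", pearson(xs, ys))
```

Because the shared rate is non-negative, the theoretical correlation lies between 0 and 1 for any choice of rates. A copula-based model, such as the Gaussian copula proposed in the abstract, does not have this built-in sign restriction.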
Keywords
Statistics
Citation
Ruan, J. (2015). Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25338