Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data

Ruan, Ji

Files

ucalgary_2015_ruan_ji.pdf(1.04 MB)

Date

2015-06-10

Authors

Ruan, Ji

Abstract

Clustering of RNA-seq data is used to characterize environment-induced (e.g., treatment-induced) differences in gene expression profiles by separating genes into clusters based on their expression patterns. Wang et al. [2013] recently adopted the bi-Poisson distribution, obtained via the trivariate reduction method, as a model for clustering bivariate RNA-seq data. We discuss the inadequacy of the bi-Poisson distribution in modelling the correlation between dependent Poisson counts, and the impact of this inadequacy on clustering such data. We introduce an alternative Gaussian copula model that incorporates a flexible dependence structure for the counts, report simulation results comparing the performance of the Gaussian copula and bi-Poisson models, and investigate how misspecified dependence structures affect the clustering of Poisson counts. We illustrate our methodology on a lung cancer RNA-seq data set.
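
The contrast between the two dependence models named in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the thesis's implementation: the function names, rate parameters, and sample size are all illustrative assumptions. It relies on one well-known property of trivariate reduction, namely that the shared Poisson component can only induce non-negative correlation, whereas a Gaussian copula with Poisson margins can also produce negative dependence.

```python
import numpy as np
from scipy.stats import norm, poisson


def sample_bipoisson(lam1, lam2, lam0, size, rng):
    """Trivariate reduction: X = Y1 + Y0 and Y = Y2 + Y0, with
    Y0, Y1, Y2 independent Poisson variables (hypothetical helper)."""
    y0 = rng.poisson(lam0, size)          # shared component drives the dependence
    x = rng.poisson(lam1, size) + y0
    y = rng.poisson(lam2, size) + y0
    return x, y


def sample_copula_poisson(mu1, mu2, rho, size, rng):
    """Gaussian copula with Poisson margins (hypothetical helper):
    correlated normals -> uniforms -> Poisson quantiles."""
    cov = [[1.0, rho], [rho, 1.0]]
    z = rng.multivariate_normal([0.0, 0.0], cov, size)
    # Clip away from 0 and 1 so the Poisson quantile function stays finite.
    u = np.clip(norm.cdf(z), 1e-12, 1.0 - 1e-12)
    x = poisson.ppf(u[:, 0], mu1).astype(int)
    y = poisson.ppf(u[:, 1], mu2).astype(int)
    return x, y


rng = np.random.default_rng(0)

# Bi-Poisson: theoretical correlation is lam0 / sqrt((lam1+lam0)*(lam2+lam0)) >= 0.
bx, by = sample_bipoisson(3.0, 3.0, 2.0, 20000, rng)
corr_bipoisson = np.corrcoef(bx, by)[0, 1]

# Gaussian copula: a negative latent correlation yields negatively correlated counts.
cx, cy = sample_copula_poisson(5.0, 5.0, -0.7, 20000, rng)
corr_copula = np.corrcoef(cx, cy)[0, 1]

print(f"bi-Poisson sample correlation:      {corr_bipoisson:.3f}")
print(f"Gaussian copula sample correlation: {corr_copula:.3f}")
```

Both samplers give Poisson margins with mean 5 under these assumed settings, so the difference in sample correlation reflects only the dependence structure.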

Keywords

Statistics

Citation

Ruan, J. (2015). Cluster Analysis of Gene Expression Profiles via Flexible Count Models for RNA-seq Data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25338

URI

http://hdl.handle.net/11023/2293

Collections

Open Theses and Dissertations