Analysis of Metabolomics Data via Mixed Models

dc.contributor.advisor: de Leon, Alexander R.
dc.contributor.advisor: Kopciuk, Karen Arlene
dc.contributor.author: Ren, Austin Mu Qing
dc.contributor.committeemember: Vogel, Hans J.
dc.contributor.committeemember: Sajobi, Tolulope T.
dc.date: 2020-11
dc.date.accessioned: 2020-08-17T22:03:01Z
dc.date.available: 2020-08-17T22:03:01Z
dc.date.issued: 2020-08
dc.description.abstract: Generalized linear mixed models have been widely studied and used in many disciplines, yet they have seen very little application in metabolomics data analysis. Traditional methods of cancer classification used to determine disease severity, such as biopsies, can be harmful to patients' health. Classification based on metabolomics data analysis has a key advantage: it requires only non-invasive procedures, such as drawing a small amount of blood from patients. However, data analysis in cancer research often requires handling multiple correlated measurements of disease severity. The methods most commonly used with metabolomics data, such as partial least squares discriminant analysis, were traditionally designed to handle univariate data only and can be very challenging to apply to data with multiple correlated outcomes. Therefore, different methods should be considered for metabolomics data analysis in cancer classification. In this thesis, we proposed bivariate generalized linear mixed models with binary outcomes, using the probit link function, for the analysis of metabolomics data. The models were specifically designed to handle multiple correlated outcomes via the inclusion of subject-specific random intercepts. Random slopes were not included, to reduce model complexity. We designed three settings for the random intercepts: shared, independent, and correlated between the outcomes. Extensive simulations were carried out to evaluate the models across a range of settings, including the standard deviation and correlation of the distribution of the random intercepts, the correlation between the covariates, the correlation between the covariates and the outcomes, the proportion of data missing among the covariates, a misspecified distribution of the random intercepts, and a misspecified conditional correlation between the outcomes. In addition, we incorporated the nearest neighbors algorithm as a missing-value imputation method and LASSO as a feature selection method into our mixed models, in order to handle the common issues of high-dimensional covariates and missing values in metabolomics data. Finally, our proposed mixed models were applied to a real dataset of prostate cancer patients to evaluate their performance in predicting outcomes. [en_US]
dc.identifier.citation: Ren, A. M. (2020). Analysis of metabolomics data via mixed models (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/38084
dc.identifier.uri: http://hdl.handle.net/1880/112397
dc.language.iso: eng [en_US]
dc.publisher.faculty: Science [en_US]
dc.publisher.institution: University of Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. [en_US]
dc.subject: Mixed Models [en_US]
dc.subject: Metabolomics [en_US]
dc.subject: NMR [en_US]
dc.subject: Prostate Cancer [en_US]
dc.subject: LASSO [en_US]
dc.subject: PLS [en_US]
dc.subject: KNN [en_US]
dc.subject: Cancer Diagnostic [en_US]
dc.subject.classification: Education--Health [en_US]
dc.subject.classification: Education--Sciences [en_US]
dc.subject.classification: Oncology [en_US]
dc.subject.classification: Biochemistry [en_US]
dc.subject.classification: Statistics [en_US]
dc.title: Analysis of Metabolomics Data via Mixed Models [en_US]
dc.type: master thesis [en_US]
thesis.degree.discipline: Mathematics & Statistics [en_US]
thesis.degree.grantor: University of Calgary [en_US]
thesis.degree.name: Master of Science (MSc) [en_US]
ucalgary.item.requestcopy: true [en_US]
Files
Original bundle
Name: ucalgary_2020_ren_muqing.pdf
Size: 702.14 KB
Format: Adobe Portable Document Format
Description: Main article
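Note on the abstract: the three random-intercept settings it names (shared, independent, and correlated) can be written out as below. This is only a sketch under assumed notation, not the thesis's own formulation. The symbols $Y_{ij}$, $\mathbf{x}_i$, $\boldsymbol{\beta}_j$, $b_{ij}$, $\sigma_j$, and $\rho$ are introduced here for illustration, and the two outcomes are assumed to be conditionally independent given the random intercepts.

```latex
% Assumed notation (hypothetical, not taken from the thesis):
%   i = 1, ..., n indexes subjects; j = 1, 2 indexes the two binary outcomes;
%   x_i is the covariate vector, beta_j the outcome-specific coefficients,
%   b_{ij} the subject-specific random intercept, Phi the standard normal CDF.
\[
  \Pr\bigl(Y_{ij} = 1 \mid b_{ij}\bigr)
    = \Phi\bigl(\mathbf{x}_i^{\top}\boldsymbol{\beta}_j + b_{ij}\bigr),
  \qquad j = 1, 2.
\]

% Setting 1 (shared): one intercept per subject, common to both outcomes.
\[
  b_{i1} = b_{i2} = b_i, \qquad b_i \sim N(0, \sigma^2).
\]

% Setting 2 (independent): one intercept per outcome, mutually independent.
\[
  b_{i1} \sim N(0, \sigma_1^2), \qquad
  b_{i2} \sim N(0, \sigma_2^2), \qquad
  b_{i1} \perp b_{i2}.
\]

% Setting 3 (correlated): one intercept per outcome, jointly normal with correlation rho.
\[
  \begin{pmatrix} b_{i1} \\ b_{i2} \end{pmatrix}
  \sim N_2\!\left(
    \begin{pmatrix} 0 \\ 0 \end{pmatrix},
    \begin{pmatrix}
      \sigma_1^2 & \rho\,\sigma_1\sigma_2 \\
      \rho\,\sigma_1\sigma_2 & \sigma_2^2
    \end{pmatrix}
  \right).
\]
```

In this notation, the independent setting is the special case $\rho = 0$ of the correlated one. The simulation factors listed in the abstract (standard deviation and correlation of the random intercepts) would correspond to $\sigma_j$ and $\rho$.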