Analysis of Metabolomics Data via Mixed Models
dc.contributor.advisor | de Leon, Alexander R. | |
dc.contributor.advisor | Kopciuk, Karen Arlene | |
dc.contributor.author | Ren, Austin Mu Qing | |
dc.contributor.committeemember | Vogel, Hans J. | |
dc.contributor.committeemember | Sajobi, Tolulope T. | |
dc.date | 2020-11 | |
dc.date.accessioned | 2020-08-17T22:03:01Z | |
dc.date.available | 2020-08-17T22:03:01Z | |
dc.date.issued | 2020-08 | |
dc.description.abstract | Generalized linear mixed models have been widely studied and used in many disciplines, yet they have seen very little application in metabolomics data analysis. Traditional methods of cancer classification used to determine disease severity, such as biopsies, can be harmful to patients' health. Classification based on metabolomics data has a major advantage: it requires only non-invasive procedures, such as drawing a small amount of blood from patients. However, data analysis in cancer research often requires handling multiple correlated measurements of disease severity. The methods most commonly used with metabolomics data, such as partial least squares discriminant analysis, were traditionally designed to handle a single outcome, and can be very challenging to apply to data with multiple correlated outcomes. Therefore, different methods should be considered for metabolomics data analysis in cancer classification. In this thesis, we proposed bivariate generalized linear mixed models with binary outcomes, using the probit link function, for the analysis of metabolomics data. The models were specifically designed to handle multiple correlated outcomes via the inclusion of subject-specific random intercepts. Random slopes were not included, in order to reduce model complexity. We designed three settings for the random intercepts: shared, independent, and correlated between the outcomes. Extensive simulations were carried out to test our models' parameters, including the standard deviation and correlation of the distribution of the random intercepts, the correlation between the covariates, the correlation between the covariates and the outcomes, the proportion of missing data among the covariates, misspecification of the distribution of the random intercepts, and misspecification of the conditional correlation between the outcomes. 
In addition, we incorporated the nearest neighbors algorithm as a missing-value imputation method and LASSO as a feature selection method into our mixed models, in order to handle the common issues of high-dimensional covariates and missing values in metabolomics data. Finally, our proposed mixed models were applied to a real dataset of prostate cancer patients to evaluate the models' performance in outcome prediction. | en_US |
dc.identifier.citation | Ren, A. M. (2020). Analysis of metabolomics data via mixed models (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. | en_US |
dc.identifier.doi | http://dx.doi.org/10.11575/PRISM/38084 | |
dc.identifier.uri | http://hdl.handle.net/1880/112397 | |
dc.language.iso | eng | en_US |
dc.publisher.faculty | Science | en_US |
dc.publisher.institution | University of Calgary | en |
dc.rights | University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. | en_US |
dc.subject | Mixed Models | en_US |
dc.subject | Metabolomics | en_US |
dc.subject | NMR | en_US |
dc.subject | Prostate Cancer | en_US |
dc.subject | LASSO | en_US |
dc.subject | PLS | en_US |
dc.subject | KNN | en_US |
dc.subject | Cancer Diagnostic | en_US |
dc.subject.classification | Education--Health | en_US |
dc.subject.classification | Education--Sciences | en_US |
dc.subject.classification | Oncology | en_US |
dc.subject.classification | Biochemistry | en_US |
dc.subject.classification | Statistics | en_US |
dc.title | Analysis of Metabolomics Data via Mixed Models | en_US |
dc.type | master thesis | en_US |
thesis.degree.discipline | Mathematics & Statistics | en_US |
thesis.degree.grantor | University of Calgary | en_US |
thesis.degree.name | Master of Science (MSc) | en_US |
ucalgary.item.requestcopy | true | en_US |
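The three random-intercept settings named in the abstract (shared, independent, correlated) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis's implementation: the function name `simulate_bivariate_probit`, its parameters (`sigma`, `rho`, `seed`), the normal distribution of the intercepts, and the latent-variable form of the probit model are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_bivariate_probit(X, beta1, beta2, sigma=1.0, rho=0.5,
                              setting="correlated", seed=0):
    """Simulate two binary outcomes per subject from a probit model
    with subject-specific random intercepts (illustrative sketch only).

    setting: "shared"      -> one intercept used by both outcomes
             "independent" -> two independent intercepts
             "correlated"  -> two intercepts with correlation rho
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    if setting == "shared":
        b_common = rng.normal(0.0, sigma, size=n)
        b = np.column_stack([b_common, b_common])
    elif setting == "independent":
        b = rng.normal(0.0, sigma, size=(n, 2))
    elif setting == "correlated":
        cov = sigma ** 2 * np.array([[1.0, rho], [rho, 1.0]])
        b = rng.multivariate_normal(np.zeros(2), cov, size=n)
    else:
        raise ValueError(f"unknown setting: {setting!r}")

    # Linear predictor for each outcome: fixed effects plus random intercept.
    eta = np.column_stack([X @ beta1, X @ beta2]) + b

    # Probit link via latent variables: Y = 1 when eta + N(0, 1) noise > 0,
    # which gives P(Y = 1 | b) = Phi(eta).
    eps = rng.standard_normal((n, 2))
    y = (eta + eps > 0).astype(int)
    return y, b
```

For example, `simulate_bivariate_probit(X, beta1, beta2, setting="shared")` returns an `n x 2` array of binary outcomes together with the intercepts that generated them. In the shared setting both intercept columns are identical, so all dependence between the two outcomes comes from the single subject-level effect.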
Files
Original bundle
- Name: ucalgary_2020_ren_muqing.pdf
- Size: 702.14 KB
- Format: Adobe Portable Document Format
- Description: Main article
License bundle
- Name: license.txt
- Size: 2.62 KB
- Format: Item-specific license agreed upon to submission