Personalized survival prediction of cardiovascular disease among hypertensive patients: a machine learning approach based on health administrative data

dc.contributor.advisorQuan, Hude
dc.contributor.advisorWalker, Robin L.
dc.contributor.authorFeng, Yuanchao
dc.contributor.committeememberLeung, Alexander
dc.contributor.committeememberLu, Xuewen
dc.date2021-02
dc.date.accessioned2021-01-07T21:42:57Z
dc.date.available2021-01-07T21:42:57Z
dc.date.issued2020-12-22
dc.description.abstractBackground: Cardiovascular disease (CVD) kills approximately 17 million people globally every year, mainly through myocardial infarction, heart failure, and stroke. Hypertension is the leading risk factor for premature death from CVD. Routinely collected administrative health data, including demographic features, comorbidities, clinical laboratory test values, and medication usage, can be used for biostatistical analysis to reveal patterns and correlations that clinicians could not otherwise detect. Machine learning algorithms can predict patients' survival from the information recorded in their medical records. Objective: To compare the performance of four machine learning approaches on personalized survival prediction of CVD outcomes among newly diagnosed hypertensive patients. Method: The hypertension cohort, CVD outcomes, and covariates were defined using validated case definitions applied to inpatient and outpatient administrative health databases. We analyzed 11,863 CVD events among 259,873 patients who were newly diagnosed with hypertension between April 1, 2009 and March 31, 2015 and had at least one year of follow-up. We applied linear multi-task logistic regression (LMTLR), neural multi-task logistic regression (NMTLR), random survival forest (RSF), and Cox proportional hazards (CoxPH) models to predict both the number of CVD outcomes at each survival time point and the individual survival probability curve. Predictive performance was evaluated by root mean squared error (RMSE), mean absolute error (MAE), concordance index (C-index), and Brier score. Results: The RSF model had the lowest RMSE (33.94) and the lowest MAE (28.37), indicating the best performance in predicting the number of CVD events at any time point during the follow-up period. The NMTLR model had the highest C-index (0.8149) and the lowest Brier score (0.0242) for individual survival prediction.
Conclusions: This is the first personalized survival prediction study for CVD among hypertensive patients using administrative data. The four models tested in this analysis (LMTLR, NMTLR, RSF, CoxPH) showed similar discrimination and calibration in predicting the survival of hypertensive patients. In the test dataset, RSF performed better for population-based survival prediction, while NMTLR had better discrimination and calibration for individual-based survival prediction.en_US
dc.identifier.citationFeng, Y. (2020). Personalized survival prediction of cardiovascular disease among hypertensive patients: a machine learning approach based on health administrative data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.en_US
dc.identifier.doihttp://dx.doi.org/10.11575/PRISM/38536
dc.identifier.urihttp://hdl.handle.net/1880/112944
dc.language.isoengen_US
dc.publisher.facultyCumming School of Medicineen_US
dc.publisher.institutionUniversity of Calgaryen
dc.rightsUniversity of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.en_US
dc.subjectAdministrative health dataen_US
dc.subjectMachine learningen_US
dc.subjectPersonalized survival predictionen_US
dc.subjectHypertensionen_US
dc.subjectCardiovascular diseasesen_US
dc.subject.classificationBiostatisticsen_US
dc.subject.classificationEpidemiologyen_US
dc.subject.classificationPublic Healthen_US
dc.titlePersonalized survival prediction of cardiovascular disease among hypertensive patients: a machine learning approach based on health administrative dataen_US
dc.typemaster thesisen_US
thesis.degree.disciplineMedicine – Community Health Sciencesen_US
thesis.degree.grantorUniversity of Calgaryen_US
thesis.degree.nameMaster of Science (MSc)en_US
ucalgary.item.requestcopytrueen_US
Files
Original bundle
Name:
ucalgary_2020_feng_yuanchao.pdf
Size:
1.01 MB
Format:
Adobe Portable Document Format