Personalized survival prediction of cardiovascular disease among hypertensive patients: a machine learning approach based on health administrative data
Date
2020-12-22
Abstract
Background: Cardiovascular disease (CVD) kills approximately 17 million people globally every year, mainly through myocardial infarction, heart failure and stroke. Hypertension is the leading risk factor for premature death from CVD. Routinely collected administrative health data, which include demographic features, comorbidities, clinical laboratory test values and medication usage, can support biostatistical analyses that reveal patterns and correlations otherwise undetectable by medical doctors. Machine learning algorithms can predict patients’ survival from the information recorded in their medical records.
Objective: To compare the performance of four machine learning approaches for personalized survival prediction of CVD outcomes among newly diagnosed hypertensive patients.
Method: The hypertension cohort, CVD outcomes and covariates were defined using validated case definitions applied to inpatient and outpatient administrative health databases. We analyzed 11,863 CVD events in a cohort of 259,873 patients who were newly diagnosed with hypertension between April 1, 2009 and March 31, 2015 and had at least one year of follow-up. We applied linear multi-task logistic regression (LMTLR), neural multi-task logistic regression (NMTLR), random survival forest (RSF) and Cox proportional hazard (CoxPH) models to predict both the number of CVD outcomes at each survival time point and the individual survival probability curves. Predictive performance was evaluated by root mean squared error (RMSE), mean absolute error (MAE), concordance index (C-index) and Brier score.
Results: The RSF model had the lowest RMSE (33.94) and the lowest MAE (28.37), indicating the best performance in predicting the number of CVD events at any time point during the follow-up period. The NMTLR model had the highest C-index (0.8149) and the lowest Brier score (0.0242) for individual survival prediction.
Conclusions: This is the first personalized survival prediction for CVD among hypertensive patients using administrative data. The four models tested in this analysis (LMTLR, NMTLR, RSF, CoxPH) exhibited similar discrimination and calibration ability in predicting the survival of hypertensive patients. In the test dataset, RSF performed better for population-based survival prediction, while NMTLR had better discrimination and calibration for individual-based survival prediction.
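The four evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names and the toy inputs are hypothetical, the C-index is Harrell's pairwise version, and the Brier score is a simplified unweighted form that skips patients censored before the evaluation time instead of applying inverse-probability-of-censoring weights.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted event counts."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted event counts."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def concordance_index(times, events, risk_scores):
    """Harrell's C-index.

    A pair (i, j) is comparable when patient i had the event and did so
    strictly before patient j's observed time. The pair is concordant when
    patient i also has the higher predicted risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


def brier_score(times, events, surv_probs, t):
    """Simplified Brier score at time t (assumption: no censoring weights).

    surv_probs[k] is the predicted probability that patient k survives
    beyond t. Patients censored at or before t have unknown status and are
    skipped.
    """
    squared_errors = []
    for time, event, prob in zip(times, events, surv_probs):
        if time > t:
            squared_errors.append((1.0 - prob) ** 2)  # still event-free at t
        elif event == 1:
            squared_errors.append((0.0 - prob) ** 2)  # event occurred by t
    return sum(squared_errors) / len(squared_errors)


# Hypothetical toy data: follow-up times, event indicators (1 = CVD event,
# 0 = censored), predicted risk scores, and predicted survival at t = 5.
toy_times = [2, 4, 6, 8]
toy_events = [1, 0, 1, 0]
toy_risk = [0.9, 0.3, 0.2, 0.4]
toy_surv_at_5 = [0.2, 0.7, 0.8, 0.9]

c_index = concordance_index(toy_times, toy_events, toy_risk)
brier = brier_score(toy_times, toy_events, toy_surv_at_5, t=5)
count_rmse = rmse([10, 20, 30], [12, 18, 33])
count_mae = mae([10, 20, 30], [12, 18, 33])
```

A higher C-index indicates better discrimination, while lower values of the Brier score, RMSE and MAE indicate smaller prediction error, which is the direction in which the results above are reported.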
Keywords
Administrative health data, Machine learning, Personalized survival prediction, Hypertension, Cardiovascular diseases
Citation
Feng, Y. (2020). Personalized survival prediction of cardiovascular disease among hypertensive patients: a machine learning approach based on health administrative data (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.