Develop a comprehensive hypertension prediction model and risk score in population-based data applying conventional statistical and machine learning approaches

Date
2021-04-01
Abstract
Hypertension is a common medical condition and a significant risk factor for heart attack, stroke, kidney disease, and mortality. A risk prediction model for hypertension incidence that incorporates known risk factors can help identify high-risk individuals who should be targeted for healthy behavioral changes or medical treatment to prevent hypertension onset. This research aims to develop a robust hypertension prediction model for the general population. More specifically, we aimed to 1) conduct a comprehensive systematic review to identify risk factors and prediction models for hypertension incidence and perform a meta-analysis to evaluate the predictive performance of existing models; 2) develop a risk prediction model for incident hypertension in a Canadian cohort using a traditional modeling approach; and 3) develop machine learning algorithms to predict hypertension incidence and compare their predictive performance with that of a traditional statistical model. We systematically searched MEDLINE, EMBASE, Web of Science, Scopus, and the grey literature for studies predicting the risk of hypertension among the general adult population. We identified 52 studies that presented 117 models, of which 75 were developed using traditional regression-based modeling and 42 using machine learning algorithms. None of the studies developed or validated a hypertension prediction model in Canada. Meta-analysis showed overall pooled C-statistics of 0.75 [0.73–0.77] for the traditional regression-based models and 0.76 [0.72–0.79] for the machine learning-based models. The lack of a hypertension prediction model in a Canadian context motivated us to develop a new model. We used data on 29 candidate variables from 18,322 participants in the large Alberta’s Tomorrow Project (ATP) cohort to develop a traditional Cox proportional hazards (PH) model.
Age, sex, body mass index (BMI), systolic blood pressure (SBP), diabetes, total physical activity time, and cardiovascular disease were included in the model as significant risk factors. Our model showed good discrimination (Harrell’s C-statistic 0.77) and calibration (Grønnesby and Borgan test, χ^2 statistic = 8.75, p = 0.07; calibration slope 1.006). A risk score table for estimating hypertension risk at 2, 3, 5, and 6 years was derived from the model to support its clinical implementation and usability. Five machine learning algorithms were also developed to predict hypertension incidence: three penalized regression methods (Ridge, Lasso, and Elastic Net (EN)), random survival forest (RSF), and gradient boosting (GB). The performance of the machine learning algorithms was similar to that of the traditional Cox PH model. Average C-indexes were 0.78, 0.78, 0.78, 0.76, and 0.76 for Ridge, Lasso, Elastic Net, RSF, and GB, respectively. The important features associated with each machine learning algorithm were also presented. We developed a simple yet practical prediction model to estimate the risk of incident hypertension for the Canadian population that relies on readily available variables. Our results showed little difference in predictive performance between the machine learning algorithms and the traditional Cox PH model in predicting hypertension incidence. Our newly developed model may help clinicians and the general population assess their risk of new-onset hypertension and facilitate discussions on preventing it more effectively.
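For readers unfamiliar with the discrimination measure reported above, the following is a minimal sketch of how a Harrell-type C-statistic can be computed for right-censored follow-up data. It is not the thesis code: the function name `harrell_c`, the toy data, and the simplification of skipping pairs with tied follow-up times are all assumptions made for illustration.

```python
from itertools import combinations


def harrell_c(times, events, risks):
    """Harrell-type concordance index for right-censored data (illustrative).

    times  : follow-up time for each participant
    events : 1 if hypertension onset was observed, 0 if censored
    risks  : predicted risk score (higher means earlier expected onset)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that participant i has the shorter follow-up.
        if times[j] < times[i]:
            i, j = j, i
        # A pair is comparable only if the shorter follow-up ended in an
        # observed event. Tied times are skipped (a simplifying assumption).
        if times[i] == times[j] or not events[i]:
            continue
        comparable += 1
        if risks[i] > risks[j]:
            concordant += 1.0   # higher risk had the earlier event
        elif risks[i] == risks[j]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four participants, one of them censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.8, 0.1]
toy_c = harrell_c(toy_times, toy_events, toy_risks)  # 4 of 5 comparable pairs agree: 0.8
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which gives context for the values of 0.76 to 0.78 reported in the abstract.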
Keywords
Prediction modeling
Citation
Chowdhury, M. Z. I. (2021). Develop a comprehensive hypertension prediction model and risk score in population-based data applying conventional statistical and machine learning approaches (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.