Machine learning models for functional impairment risk prediction in ischemic stroke patients

Date
2020-09-03
Abstract
Background: Stroke-related functional impairment risk scores are commonly used to estimate the patient-specific risk of functional impairment in acute care settings. However, these scores have been developed primarily with regression models, which might not provide optimal predictive accuracy, especially when validated in an external cohort.
Purpose: First, to evaluate the predictive accuracy of machine-learning (ML) models for predicting functional impairment risk in acute ischemic stroke patients. Second, to compare the predictive accuracy of ML and regression-based models using computer simulations.
Methods: Data came from the Precise and Rapid Assessment of Collaterals with Multi-phase CT Angiography (PROVE-IT) study. The Modified Rankin Scale (mRS) score was used to assess 90-day functional impairment status. Machine-learning models, including random forest (RF), classification and regression tree (CART), support vector machine (SVM), C5.0 decision tree (DT), adaptive boost machine (ABM), and least absolute shrinkage and selection operator (LASSO) logistic regression, along with logistic regression (LR), were used to predict the patient-specific risk of 90-day functional impairment. Area under the receiver operating characteristic curve (AUC), sensitivity, specificity, Matthews correlation coefficient (MCC), and Brier score were used to assess the predictive accuracy of these models via internal cross-validation and external validation in the Identifying New Approaches to Optimize Thrombus Characterization for Predicting Early Recanalization and Reperfusion with IVtPA Using Serial CT Angiography (INTERSSeCT) cohort study. Monte Carlo methods were used to develop recommendations for selecting machine-learning models under a variety of data characteristics.
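For readers unfamiliar with the accuracy measures named in the Methods, the following is a minimal sketch of how AUC, sensitivity, specificity, MCC, and Brier score can be computed for a binary outcome. It is not the thesis code: the function names, the toy labels and probabilities, and the 0.5 classification threshold are all assumptions made for illustration, and the label 1 simply stands for "functionally impaired" without implying any particular mRS cutoff.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Proportion of true positives among all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of true negatives among all actual negatives."""
    return tn / (tn + fp)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def auc(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly,
    counting ties as half. Requires at least one case of each class."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not from PROVE-IT or INTERSSeCT).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
threshold = 0.5  # assumed cutoff for turning probabilities into classes
y_pred = [1 if p >= threshold else 0 for p in y_prob]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print("sensitivity:", round(sensitivity(tp, fn), 3))
print("specificity:", round(specificity(tn, fp), 3))
print("MCC:", round(mcc(tp, tn, fp, fn), 3))
print("Brier:", round(brier_score(y_true, y_prob), 3))
print("AUC:", round(auc(y_true, y_prob), 3))
```

AUC and Brier score are computed from the predicted probabilities, whereas sensitivity, specificity, and MCC depend on the chosen threshold, so the two groups of measures can rank models differently.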
Results: Logistic regression and machine-learning models had comparable predictive accuracy when validated internally (AUC range = [0.65 – 0.72]; MCC range = [0.29 – 0.42]) and externally (AUC range = [0.66 – 0.71]; MCC range = [0.34 – 0.42]). However, regression-based models were somewhat better calibrated than the ML models. Our simulation study showed that ML and regression-based models are not equally robust to a variety of data analytic characteristics. LR models exhibited higher AUC in studies with a small or moderate set of predictors, while RF had about 15% higher discrimination in studies with a high-dimensional set of predictors. ML models may be less accurate for predicting outcomes in studies with a small set of predictors or when there is a large class imbalance in the data.
Conclusions: ML and regression-based algorithms are not equally sensitive to data analytic conditions, even though our data analysis revealed no significant differences between the two classes of models. ML might offer some discriminative advantages over regression-based models, depending on the size and type of study predictors. We recommend that the choice between these classes of models be guided by data characteristics, study design, and the purpose for which the models are being developed.
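The simulation findings above turn on data characteristics such as the number of predictors and the degree of class imbalance. The abstract does not describe the simulation design, so the following is only a hedged sketch of how synthetic datasets with those characteristics might be generated under a logistic model. The function `simulate_dataset`, the effect size of 0.5, the sample sizes, and the scenario grid are all hypothetical choices, not the settings used in the thesis.

```python
import math
import random


def simulate_dataset(n_samples, n_predictors, n_informative, intercept, seed=None):
    """Draw one synthetic dataset from a logistic model (illustrative only).

    The first n_informative predictors carry an assumed effect of 0.5 each;
    the remaining predictors are pure noise. A more negative intercept
    lowers the outcome prevalence, i.e. increases class imbalance.
    """
    rng = random.Random(seed)
    coefs = [0.5] * n_informative + [0.0] * (n_predictors - n_informative)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_predictors)]
        linear = intercept + sum(b * x for b, x in zip(coefs, row))
        prob = 1.0 / (1.0 + math.exp(-linear))
        X.append(row)
        y.append(1 if rng.random() < prob else 0)
    return X, y


# Hypothetical grid of scenarios: few vs. many predictors, balanced vs. imbalanced.
scenarios = [
    {"n_predictors": 5, "intercept": 0.0},
    {"n_predictors": 5, "intercept": -3.0},
    {"n_predictors": 50, "intercept": 0.0},
]

for s in scenarios:
    X, y = simulate_dataset(500, s["n_predictors"], 3, s["intercept"], seed=1)
    prevalence = sum(y) / len(y)
    print(s, "prevalence:", round(prevalence, 3))
```

In a full Monte Carlo study, each scenario would be replicated many times, the candidate models fitted to every replicate, and accuracy measures averaged across replicates; those steps are omitted here.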
Keywords
Machine learning, Functional outcome, Risk prediction, Modified Rankin Scale
Citation
Alaka, S. A. (2020). Machine learning models for functional impairment risk prediction in ischemic stroke patients (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.