Creating a Frailty Case Definition for Primary Care EMR Using Machine Learning

Williamson, Tyler; Lee, Joon; Aponte-Hao, Zhi Yun (Sylvia)

2021-05-10; 2021-05-10; 2021-05-04

Aponte-Hao, Z. Y. (2021). Creating a Frailty Case Definition for Primary Care EMR Using Machine Learning (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.

http://hdl.handle.net/1880/113392

Background: Frailty is a geriatric syndrome characterized by increased vulnerability and increased risk of adverse events. The Clinical Frailty Scale (CFS) is a judgement-based scale used to identify frailty in senior populations (over the age of 65). Primary care electronic medical records (EMRs) contain routinely collected medical data and can be used for frailty screening. There is currently no method to detect frailty automatically using primary care electronic medical records that aligns with the CFS definition.

Purpose: To create a machine learning based algorithm for the identification of frailty in routinely collected primary care electronic medical records.

Methods: Primary care physicians within the Canadian Primary Care Sentinel Surveillance Network retrospectively identified frailty in 5466 senior patients from their own practice using the CFS, and the corresponding patient EMR data were extracted and processed as features. The patient data were split 30-70, with 30% being the hold-out set used for final testing and 70% for the training set. A collection of machine learning algorithms was created using the training dataset, including regularized logistic regression models, support vector machines, random forests, k-nearest neighbours, classification and regression trees, feedforward neural networks, Naïve Bayes, and XGBoost. A balanced training dataset was also created by oversampling. Sensitivity analyses were also performed using two alternative dichotomization cut-offs of frailty. Final model performance was assessed using the hold-out dataset, and reported using ROC, accuracy, F1-score, sensitivity, specificity, positive and negative predictive values.

Results: 18.4% of patients were classified as frail based on a CFS score of 5 and above. Of the 8 models developed, an XGBoost model had the best classification performance, with sensitivity of 78.14% and specificity of 74.41%. Neither the balanced training dataset, nor the sensitivity analyses using two alternative cut-offs resulted in improved performance.

Conclusion: Supervised machine learning was able to distinguish between frail and non-frail patients with good performance. Future work may wish to develop a protocol for standardized assignment of the CFS, use all available unstructured and structured data, and supplement with additional geriatric-specific data.

eng

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.

Supervised Machine Learning; Frailty; Electronic Medical Records; Epidemiology; Case Definition

Biostatistics; Epidemiology; Public Health

master thesis

DOI: 10.11575/PRISM/38846
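The Methods in the abstract mention a 70/30 train and hold-out split, a balanced training set created by oversampling, and evaluation by sensitivity, specificity, predictive values, accuracy, and F1-score. The following is a minimal sketch of those three steps in plain Python, not the thesis's actual code. The function names are hypothetical, and two details are assumptions the abstract does not confirm: that the split was stratified by frailty label, and that the oversampling was simple random duplication of minority-class rows.

```python
import random
from collections import Counter


def stratified_split(labels, test_fraction=0.30, seed=0):
    """Return (train_idx, test_idx), holding out test_fraction of each class.

    Assumption: stratifying by label; the abstract only states a 30-70 split.
    """
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def oversample_minority(train_idx, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced.

    Assumption: random oversampling; the thesis may have used another method.
    Only training indices are resampled, so the hold-out set stays untouched.
    """
    rng = random.Random(seed)
    counts = Counter(labels[i] for i in train_idx)
    target = max(counts.values())
    balanced = list(train_idx)
    for cls, n in counts.items():
        pool = [i for i in train_idx if labels[i] == cls]
        balanced.extend(rng.choices(pool, k=target - n))
    return balanced


def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix metrics, treating label 1 (frail) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Resampling only the training indices keeps the hold-out set at its original class prevalence, so the reported predictive values reflect the population the model would be applied to.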