Williamson, Tyler S.
Sajobi, Tolulope T.
Lethebe, Brendan Cord

2018-04-25
2018-04-25
2018-04-23

Lethebe, B. C. (2018). Using machine learning methods to improve chronic disease case definitions in primary care electronic medical records (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/31824

http://hdl.handle.net/1880/106538

Background: Chronic disease surveillance at the primary care level is becoming more feasible with the increased use of electronic medical records (EMRs). However, the quality of surveillance information depends directly on the quality of the case definitions that identify the conditions of interest.

Purpose: To determine whether machine learning algorithms can produce chronic disease case definitions comparable to committee-created case definitions in a primary care EMR setting.

Methods: A chart review was conducted for the presence of hypertension, diabetes, osteoarthritis, and depression in a cohort of 1920 patients from the Canadian Primary Care Sentinel Surveillance Network database. The results of this chart review were used as training data. The C5.0, Classification and Regression Tree, and Chi-Squared Automated Interaction Detection decision trees, Forward Stepwise logistic regression, and Least Absolute Shrinkage and Selection Operator penalized logistic regression were compared using 10-fold cross validation. Sensitivity, specificity, positive predictive value, and negative predictive value were estimated and compared for the four chronic conditions of interest.

Results: Validity measures were similar across algorithms. For hypertension, sensitivity ranged from 93.1% to 96.7% and specificity from 88.8% to 93.2%. For diabetes, sensitivity ranged from 93.5% to 96.3% and specificity from 97.1% to 99.0%. For osteoarthritis, sensitivity ranged from 82.0% to 84.4% and specificity from 92.7% to 94.0%. For depression, sensitivity ranged from 81.4% to 88.3% and specificity from 93.4% to 94.9%. Compared with the committee-created case definitions, the machine learning methods produced equivalent or better values on these metrics.

Conclusions: Machine learning algorithms produced accurate case definitions comparable to committee-created case definitions. It is possible to use machine learning techniques to develop high-quality case definitions from EMR data.

eng

University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.

Machine Learning; Prediction; Statistics; Case Definition; Surveillance; Chronic Disease; Biostatistics; Public Health; Computer Science

Using machine learning methods to improve chronic disease case definitions in primary care electronic medical records

master thesis

http://dx.doi.org/10.11575/PRISM/31824
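
The four validity measures named in the abstract (sensitivity, specificity, positive predictive value, negative predictive value) all follow from the counts of a two-by-two comparison between an algorithm's classification and the chart-review reference standard. The sketch below is illustrative only: the function name `validity_measures` and the example counts are assumptions made for this note, not code or results from the thesis.

```python
def validity_measures(tp, fp, fn, tn):
    """Compute validity measures from confusion-matrix counts.

    tp: cases flagged by the algorithm and confirmed by chart review
    fp: cases flagged by the algorithm but not confirmed by chart review
    fn: cases confirmed by chart review but missed by the algorithm
    tn: cases neither flagged nor confirmed
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly excluded
        "ppv": tp / (tp + fp),          # share of flagged cases that are true cases
        "npv": tn / (tn + fn),          # share of unflagged cases that are true non-cases
    }


# Hypothetical counts for illustration only; these are not thesis results.
measures = validity_measures(tp=90, fp=5, fn=10, tn=95)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

In a 10-fold cross-validation design such as the one the abstract describes, counts like these would typically be taken from held-out folds instead of the data used to fit the model.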