Browsing by Author "Walker, Robin L."
Now showing 1 - 7 of 7
- Item (Open Access): Authors’ opinions on publication in relation to annual performance assessment (BioMed Central, 2010-03-09) Walker, Robin L.; Sykes, Lindsay; Hemmelgarn, Brenda; Quan, Hude
- Item (Open Access): Cerebrovascular disease case identification in inpatient electronic medical record data using natural language processing (2023-09-02) Pan, Jie; Zhang, Zilong; Peters, Steven R.; Vatanpour, Shabnam; Walker, Robin L.; Lee, Seungwon; Martin, Elliot A.; Quan, Hude. Abstract. Background: Abstracting cerebrovascular disease (CeVD) from inpatient electronic medical records (EMRs) through natural language processing (NLP) is pivotal for automated disease surveillance and improving patient outcomes. Existing methods rely on coders’ abstraction, which has time delays and under-coding issues. This study sought to develop an NLP-based method to detect CeVD using EMR clinical notes. Methods: CeVD status was confirmed through a chart review of randomly selected hospitalized patients who were 18 years or older and discharged from 3 hospitals in Calgary, Alberta, Canada, between January 1 and June 30, 2015. These patients’ chart data were linked to administrative discharge abstract database (DAD) and Sunrise™ Clinical Manager (SCM) EMR database records by Personal Health Number (a unique lifetime identifier) and admission date. We trained multiple NLP predictive models by combining two clinical concept extraction methods and two supervised machine learning (ML) methods: random forest and XGBoost. Using chart review as the reference standard, we compared the performance of the models with that of the commonly applied International Classification of Diseases (ICD-10-CA) codes on sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). Results: Of the study sample (n = 3036), the prevalence of CeVD was 11.8% (n = 360); the median patient age was 63; and females accounted for 50.3% (n = 1528) based on chart data.
Among 49 extracted clinical documents from the EMR, four document types were identified as the most influential text sources for identifying CeVD (“nursing transfer report,” “discharge summary,” “nursing notes,” and “inpatient consultation”). The best performing NLP model was XGBoost, combining the Unified Medical Language System concepts extracted by cTAKES (e.g., top-ranked concepts “Cerebrovascular accident” and “Transient ischemic attack”) and the term frequency-inverse document frequency vectorizer. Compared with ICD codes, the model achieved higher validity overall (ICD codes vs. model): sensitivity 25.0% vs. 70.0%, specificity 99.3% vs. 99.1%, PPV 82.6% vs. 87.8%, and NPV 90.8% vs. 97.1%. Conclusion: The NLP algorithm developed in this study performed better than the ICD code algorithm in detecting CeVD. The NLP models could result in an automated EMR tool for identifying CeVD cases and be applied in future work such as surveillance and longitudinal studies.
- Item (Open Access): Development of International Indicators for Assessing the Quality of ICD-coded Administrative Health Data (2020-12-22) Otero Varela, Lucia; Quan, Hude; Eastwood, Cathy A.; Walker, Robin L.; Leal, Jenine R. Introduction: Health data are generated at each patient encounter with the healthcare system worldwide, then collected and stored as administrative health data. As an example, inpatient data are coded in the hospital morbidity database using the International Classification of Diseases (ICD), which is a reference standard for reporting diseases and health conditions globally. The quality of ICD-coded data is affected by multiple factors, such as worldwide variations in ICD use and its meta-features across countries, which can hinder meaningful comparisons of morbidity data. Assessing data quality is therefore essential for the ultimate goal of improving it. Given the current lack of an international approach for doing so, we aimed to develop a standardized method for assessing hospital morbidity data quality. Methods: First, we conducted an international online questionnaire to better understand the differences in coding practices and hospital data collection systems across countries. Second, through the combination of a comprehensive environmental scan and a Delphi consensus process, we developed a set of global data quality indicators (DQIs) for the hospital morbidity database. Results: The international questionnaire revealed variation in all aspects of ICD data collection features, including the maximum number of coding fields allowed for diagnoses and interventions, the definition of main condition, and the data fields that are mandatory to capture in the hospital morbidity database.
The Delphi exercise resulted in 24 DQIs encompassing five dimensions of data quality (Relevance, Accuracy and reliability, Comparability and coherence, Timeliness, and Accessibility and clarity). The DQIs can be used to assess data quality using the same standard across countries and to highlight areas in need of improvement. Conclusion: Emphasis should be placed on standardizing ICD data collection systems and enhancing the quality of ICD-coded data. These findings could facilitate international comparisons of health data and data quality, and could serve as guidance for policy- and decision-makers worldwide.
- Item (Open Access): Evaluating the coding accuracy of type 2 diabetes mellitus among patients with non-alcoholic fatty liver disease (2024-02-16) Lee, Seungwon; Shaheen, Abdel A.; Campbell, David J. T.; Naugler, Christopher; Jiang, Jason; Walker, Robin L.; Quan, Hude; Lee, Joon. Abstract. Background: Non-alcoholic fatty liver disease (NAFLD) describes a spectrum of chronic fattening of the liver that can lead to fibrosis and cirrhosis. Diabetes has been identified as a major comorbidity that contributes to NAFLD progression. Health systems around the world make use of administrative data to conduct population-based prevalence studies. To that end, we sought to assess the accuracy of diabetes International Classification of Diseases (ICD) coding in administrative databases among a cohort of confirmed NAFLD patients in Calgary, Alberta, Canada. Methods: The Calgary NAFLD Pathway Database was linked to the following databases: Physician Claims, Discharge Abstract Database, National Ambulatory Care Reporting System, Pharmaceutical Information Network database, Laboratory, and Electronic Medical Records. Hemoglobin A1c and diabetes medication details were used to classify patients into four diabetes groups: absent, prediabetes, meeting glycemic targets, and not meeting glycemic targets. The performance of ICD codes among these groups was compared to this standard. Within each group, the total numbers of true positives, false positives, false negatives, and true negatives were calculated. Descriptive statistics and bivariate analysis were conducted on identified covariates, including demographics and the types of physicians patients interacted with. Results: A total of 12,012 NAFLD patients were registered through the Calgary NAFLD Pathway Database and 100% were successfully linked to the administrative databases. Overall, diabetes coding showed a sensitivity of 0.81 and a positive predictive value of 0.87.
False negative rates in the absent and not meeting glycemic control groups were 4.5% and 6.4%, respectively, whereas the meeting glycemic control group had a 42.2% coding error. Visits to primary and outpatient services were associated with most encounters. Conclusion: Diabetes ICD coding in administrative databases can accurately detect true diabetic cases. However, patients with diabetes who meet glycemic control targets are less likely to be coded in administrative databases. A detailed understanding of the clinical context will require additional data linkage from primary care settings.
- Item (Open Access): Personalized prediction of incident hospitalization for cardiovascular disease in patients with hypertension using machine learning (2022-12-17) Feng, Yuanchao; Leung, Alexander A.; Lu, Xuewen; Liang, Zhiying; Quan, Hude; Walker, Robin L. Abstract. Background: Prognostic information for patients with hypertension is largely based on population averages. The purpose of this study was to compare the performance of four machine learning approaches for personalized prediction of incident hospitalization for cardiovascular disease among newly diagnosed hypertensive patients. Methods: Using province-wide linked administrative health data in Alberta, we analyzed a cohort of 259,873 newly diagnosed hypertensive patients from 2009 to 2015 who collectively had 11,863 incident hospitalizations for heart failure, myocardial infarction, and stroke. Linear multi-task logistic regression, neural multi-task logistic regression, random survival forest, and Cox proportional hazard models were used to determine the number of event-free survivors at each time point and to construct individual event-free survival probability curves. The predictive performance was evaluated by root mean squared error, mean absolute error, concordance index, and the Brier score. Results: The random survival forest model had the lowest root mean squared error at 33.94 and the lowest mean absolute error at 28.37. Machine learning methods provided similar discrimination and calibration in the personalized survival prediction of hospitalizations for cardiovascular events in patients with hypertension. The neural multi-task logistic regression model had the highest concordance index at 0.8149 and the lowest Brier score at 0.0242 for the personalized survival prediction. Conclusions: This is the first personalized survival prediction for cardiovascular diseases among hypertensive patients using administrative data.
The four models tested in this analysis exhibited similar discrimination and calibration ability in personalized survival prediction for patients with hypertension.
- Item (Open Access): Personalized survival prediction of cardiovascular disease among hypertensive patients: a machine learning approach based on health administrative data (2020-12-22) Feng, Yuanchao; Quan, Hude; Walker, Robin L.; Leung, Alexander; Lu, Xuewen. Background: Cardiovascular disease (CVD) kills approximately 17 million people globally every year, mainly through myocardial infarction, heart failure, and stroke. Hypertension is the leading risk factor for premature death from CVD. Routinely collected administrative health data, including demographic features, comorbidity information, clinical laboratory test values, and medication usage, can be used for biostatistical analyses that highlight patterns and correlations otherwise undetectable by medical doctors. Machine learning algorithms can predict patients’ survival by using information recorded in their medical records. Objective: To compare the performance of four machine learning approaches on personalized survival prediction of CVD outcomes among newly diagnosed hypertensive patients. Method: The hypertension cohort, CVD outcomes, and covariates were defined using validated case definitions applied to inpatient and outpatient administrative health databases. We analyzed 11,863 CVD events in a cohort of 259,873 patients who were newly diagnosed with hypertension between April 1, 2009 and March 31, 2015 and had at least one year of follow-up. We applied linear multi-task logistic regression (LMTLR), neural multi-task logistic regression (NMTLR), random survival forest (RSF), and Cox proportional hazard (CoxPH) models to predict both the number of CVD outcomes at each survival time point and individual survival probability curves. The predictive performance was evaluated by root mean squared error (RMSE), mean absolute error (MAE), concordance index (C-index), and Brier score.
Results: The RSF model had the lowest RMSE at 33.94 and the lowest MAE at 28.37, indicating the best performance in predicting the number of CVD events at any time point during the follow-up period. The NMTLR model had the highest C-index at 0.8149 and the lowest Brier score at 0.0242 for individual survival prediction. Conclusions: This is the first personalized survival prediction for CVD among hypertensive patients using administrative data. The four models tested in this analysis (LMTLR, NMTLR, RSF, CoxPH) exhibited similar discrimination and calibration ability in predicting the survival of patients with hypertension. In the test dataset, RSF performed better for population-based survival prediction, while NMTLR had better discrimination and calibration for individual-based survival prediction.
- Item (Open Access): Post-colorectal cancer screening: knowledge and understanding (2007) Walker, Robin L.; Hilsden, Robert; McGregor, S. Elizabeth
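
Several of the abstracts listed above report sensitivity, specificity, PPV, and NPV against a reference standard such as chart review. As a reading aid, the sketch below shows how those four measures follow from a 2x2 table of true/false positives and negatives. It is not taken from any of the listed studies: the class name `ConfusionCounts`, the function `validity_metrics`, and the counts in the example are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical 2x2 table of an algorithm versus a reference standard."""
    tp: int  # algorithm positive, reference positive
    fp: int  # algorithm positive, reference negative
    fn: int  # algorithm negative, reference positive
    tn: int  # algorithm negative, reference negative


def validity_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the four standard validity measures as proportions in [0, 1]."""
    return {
        # Share of reference-positive cases that the algorithm flags
        "sensitivity": c.tp / (c.tp + c.fn),
        # Share of reference-negative cases that the algorithm leaves unflagged
        "specificity": c.tn / (c.tn + c.fp),
        # Share of flagged cases that are reference-positive
        "ppv": c.tp / (c.tp + c.fp),
        # Share of unflagged cases that are reference-negative
        "npv": c.tn / (c.tn + c.fn),
    }


# Illustrative counts only; these are NOT data from any listed study.
example = ConfusionCounts(tp=70, fp=10, fn=30, tn=890)
metrics = validity_metrics(example)

for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity are conditioned on the reference standard, whereas PPV and NPV are conditioned on the algorithm's output, so PPV and NPV shift with disease prevalence even when sensitivity and specificity stay fixed.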