Visual Representation of Electronic Health Record’s Tabular Data for Predicting Sudden Cardiac Arrest
Abstract
Computer-aided diagnosis in healthcare involves collecting comprehensive health-related information about patients from Electronic Health Records (EHRs) and modelling the tabular data with appropriate data-driven approaches. By harnessing EHRs, decision support systems can be developed to assist healthcare providers in making informed decisions, optimizing resource allocation, and ultimately reducing the cost of treatment. However, the laborious pre-processing steps applied to EHR data, such as missing-value imputation, dimensionality reduction, and domain-specific feature engineering, hinder the model's transparency and interpretability. For example, making assumptions or using statistical methods to estimate missing data points introduces uncertainty into how the model was designed and makes it harder to understand how it arrived at its conclusions. In addition, performing feature engineering without domain knowledge may lead to the loss of relevant features, limiting the model's ability to provide meaningful insights about the domain or to make reliable predictions. To alleviate these challenges, this thesis proposes a method to represent tabular EHR data as 2D images without any pre-processing or data cleaning, which results in a generalized visual representation of EHR tabular data. The proposed method is assessed by predicting the cardiovascular disease Sudden Cardiac Arrest (SCA) using deep convolutional neural networks (VGG-19, ResNet-50, DenseNet-121, and Inception-V3) to demonstrate its effectiveness compared to existing diagnosis techniques. The EHR data associated with the disease are collected from the publicly available MIMIC-III database, which consists of de-identified EHR data of over 40,000 patients who stayed in the ICU.
Evaluation on this dataset shows that the best performance was obtained by integrating an attention module with a pre-trained ResNet-50 model, which reached a test accuracy of 82% with precision = 0.86, recall = 0.88, F1 score = 0.87, and Area Under the Curve (AUC) = 0.81. These results indicate that the proposed method can perform well in SCA prediction without missing-value imputation, while offering greater comprehensibility with less intervention from human experts.
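As a quick consistency check on the reported metrics, the F1 score can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_from_precision_recall` is hypothetical and is not taken from the thesis, and the inputs are simply the rounded values quoted in the abstract.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Return the F1 score, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted in the abstract (ResNet-50 with attention module).
reported_precision = 0.86
reported_recall = 0.88

f1 = f1_from_precision_recall(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.87
```

The recomputed value agrees with the reported F1 score of 0.87. The reported accuracy and AUC cannot be checked this way, since they depend on the class balance and on the full set of predicted scores, neither of which is given in the abstract.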