Peeking Through the Windows: Hyperparameters, Administrative Data, and Selective Windowing

Taib, Musa Jabbar

Date

2023-10-17

Abstract

With the growth of administrative data driven by increased storage capacity and digitization, there is a need for effective ways to process and analyze this information. This study examines the temporal nature of such data and argues that the size of the time window is a crucial choice that should be tuned like any other hyperparameter of a machine learning model. It also introduces the Time series Analysis to Investigate Binning (TAIB) algorithm, which identifies the parts of the data that may benefit from being examined at different time granularities, with the aim of making better use of temporal data in machine learning. Two main datasets were used: records from The Calgary Drop-In and Rehab Centre (DI), covering 1991 to 2020, to categorize shelter users by their usage patterns; and the MIMIC-III (Medical Information Mart for Intensive Care III) database, which describes ICU stays from 2001 to 2012, to predict a patient's length of stay. Both datasets were transformed into temporal matrices. The TAIB algorithm was used to identify the features best suited to temporal analysis, and these were grouped into feature sets for various window lengths. Testing relied primarily on basic implementations of several machine learning models, with preference given to deep learning models because of their capacity to handle large volumes of data. Three experiments were conducted, varying model complexity, the features used, and the model type. The findings suggest that treating windowing as a hyperparameter improves the performance of machine learning models. Moreover, feature matrices offer an efficient alternative to timestep-based inputs in environments with limited system resources, simplifying the handling of temporal data.
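The idea of treating the window size as a hyperparameter can be illustrated with a small sketch. This is not the thesis's TAIB algorithm, whose details are not given in this record. It assumes a hypothetical event schema of (entity_id, day, feature) tuples, bins events into fixed-width windows to form a per-entity temporal matrix (windows x features), flattens each matrix into a feature vector, and scores several candidate window sizes. All names (`build_temporal_matrix`, `select_window`, `loo_nearest_centroid_accuracy`) and the toy data are hypothetical, and a leave-one-out nearest-centroid classifier stands in for the deep learning models used in the study.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical event schema: (entity_id, day offset from start, feature name).
Event = Tuple[str, int, str]
Matrix = List[List[int]]


def build_temporal_matrix(events: Sequence[Event], features: Sequence[str],
                          horizon: int, window: int) -> Dict[str, Matrix]:
    """Bin events into windows of `window` days over [0, horizon).

    Returns {entity_id: matrix} where matrix[w][f] counts events of
    feature f that fall in window w. Out-of-range days are ignored.
    """
    n_windows = -(-horizon // window)  # ceiling division
    col = {f: i for i, f in enumerate(features)}
    out: Dict[str, Matrix] = defaultdict(
        lambda: [[0] * len(features) for _ in range(n_windows)])
    for entity, day, feat in events:
        if 0 <= day < horizon and feat in col:
            out[entity][day // window][col[feat]] += 1
    return dict(out)


def flatten(matrix: Matrix) -> List[int]:
    """Windows x features -> one flat feature vector (no timestep axis)."""
    return [v for row in matrix for v in row]


def loo_nearest_centroid_accuracy(X: Dict[str, List[int]],
                                  labels: Dict[str, str]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in model)."""
    ids = sorted(X)
    correct = 0
    for held in ids:
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for e in ids:
            if e == held:
                continue
            lab = labels[e]
            prev = sums.get(lab, [0.0] * len(X[e]))
            sums[lab] = [a + b for a, b in zip(prev, X[e])]
            counts[lab] += 1

        def dist(lab: str) -> float:
            centroid = [s / counts[lab] for s in sums[lab]]
            return sum((a - b) ** 2 for a, b in zip(X[held], centroid))

        pred = min(sorted(sums), key=dist)  # ties go to the alphabetically first label
        correct += pred == labels[held]
    return correct / len(ids)


def select_window(events: Sequence[Event], features: Sequence[str], horizon: int,
                  candidates: Sequence[int],
                  score: Callable[[Dict[str, List[int]]], float]):
    """Treat the window size as a hyperparameter: score each candidate, keep the best.

    Ties go to the earliest candidate, so list coarser windows first to prefer
    smaller matrices.
    """
    results: Dict[int, float] = {}
    for w in candidates:
        mats = build_temporal_matrix(events, features, horizon, w)
        X = {e: flatten(m) for e, m in mats.items()}
        results[w] = score(X)
    best = max(results, key=results.get)
    return best, results


# Toy data (invented for illustration): "early" users visit on days 0-1,
# "late" users on days 2-3, so only windows finer than the 4-day horizon separate them.
events: List[Event] = []
labels: Dict[str, str] = {}
for i in range(3):
    for uid, days, lab in ((f"a{i}", (0, 1), "early"), (f"b{i}", (2, 3), "late")):
        labels[uid] = lab
        events += [(uid, d, "visit") for d in days]

best, results = select_window(events, ["visit"], horizon=4, candidates=[4, 2, 1],
                              score=lambda X: loo_nearest_centroid_accuracy(X, labels))
print(best, results)  # 2 {4: 0.5, 2: 1.0, 1: 1.0}
```

On this toy data a single 4-day window hides the difference between the two groups, while 2-day and 1-day windows separate them perfectly; the sketch keeps the coarser of the tied windows. Flattening the matrix mirrors the feature-matrix input; a sequence model would instead consume each windows x features matrix as timesteps.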

Keywords

Administrative Health Data, Machine Learning, Time Series Analysis, Binning, Windowing

Citation

Taib, M. J. (2023). Peeking through the windows: hyperparameters, administrative data, and selective windowing (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.

URI

https://hdl.handle.net/1880/117427
https://doi.org/10.11575/PRISM/42270

Collections

Open Theses and Dissertations
