Peeking Through the Windows: Hyperparameters, Administrative Data, and Selective Windowing
Abstract
The growth of administrative data, driven by increased storage and digitization, creates a need for effective ways to process and analyze it. This study examines the temporal nature of such data and argues that the choice of time window size is a consequential modelling decision that should be tuned like any other hyperparameter of a machine learning model. It also introduces the Time series Analysis to Investigate Binning (TAIB) algorithm, which determines which parts of the data might benefit from being examined at different window lengths, with the aim of making better use of time-based data in machine learning. Two datasets were used: records from The Calgary Drop-In and Rehab Centre (DI) covering 1991-2020, used to categorize shelter users by their usage patterns; and the MIMIC III (Medical Information Mart for Intensive Care III) database, which describes ICU stays from 2001-2012, used to predict the duration of a patient's stay. Both datasets were transformed into temporal matrices. TAIB was then applied to identify the features best suited to time-based analysis, producing groups of feature sets for various time lengths. Testing relied mainly on basic implementations of several machine learning models, with preference given to deep learning models because of their stronger capacity to handle large volumes of data. Three experiments were conducted, varying the model’s complexity, the features used, and the type of model. The findings suggest that treating windowing as a hyperparameter improves the performance of machine learning models. Moreover, feature matrices offer an efficient alternative to timesteps in environments constrained by system resources and simplify the handling of time-based data.
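As a rough illustration of what treating the window size as a hyperparameter can look like, the sketch below bins per-record event days into fixed-size windows and picks the window length that maximizes a caller-supplied score. It is not the TAIB algorithm, whose details the abstract does not give. The function names (`window_counts`, `build_matrix`, `select_window`), the event-day representation, and the `score` callback are all assumptions made for this example; in practice `score` would be something like the validation performance of a model trained on the resulting matrix.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[int]]


def window_counts(event_days: Sequence[int], window: int, horizon: int) -> List[int]:
    """Count events falling in each fixed-size window over days [0, horizon)."""
    n_bins = -(-horizon // window)  # ceiling division
    counts = [0] * n_bins
    for day in event_days:
        if 0 <= day < horizon:  # ignore events outside the observation period
            counts[day // window] += 1
    return counts


def build_matrix(records: Sequence[Sequence[int]], window: int, horizon: int) -> Matrix:
    """Build a temporal matrix: one row per record, one column per window."""
    return [window_counts(r, window, horizon) for r in records]


def select_window(
    records: Sequence[Sequence[int]],
    candidates: Sequence[int],
    horizon: int,
    score: Callable[[Matrix], float],
) -> Tuple[int, Dict[int, float]]:
    """Try each candidate window size and return the best one with all scores.

    `score` is a hypothetical stand-in for a model's validation score
    computed on the matrix built with that window size.
    """
    scores = {w: score(build_matrix(records, w, horizon)) for w in candidates}
    best = max(scores, key=lambda w: scores[w])
    return best, scores


# Two hypothetical records, each a list of days on which an event occurred.
records = [[0, 1, 2, 10], [5, 6, 20, 21, 22]]
weekly = build_matrix(records, window=7, horizon=28)
# weekly == [[3, 1, 0, 0], [2, 0, 1, 2]]
```

The same records yield matrices of different widths for different window sizes, which is why the window length interacts with model capacity and is worth searching over rather than fixing in advance.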