Variable Selection Using the Method of the Broken Adaptive Ridge Regression

Date
2024-07-08
Abstract
In this thesis, we consider variable selection methods based on Broken Adaptive Ridge (BAR) regression under several model frameworks: joint modelling of recurrent and terminal events, and generalized partly linear models and partly linear Cox proportional hazards models with high-dimensional covariates, low-dimensional categorical covariates, and low-dimensional continuous covariates. With data more readily available than ever in the digital era, it is important that only relevant variables are retained when building a statistical model. In Chapter 2, we implement a novel method that simultaneously performs variable selection and estimation in the joint frailty model of recurrent and terminal events using the BAR penalty. The BAR penalty can be summarized as an iteratively reweighted squared $L_2$-penalized regression, which approximates $L_0$-regularization. Our method allows the number of covariates to diverge with the sample size. Under certain regularity conditions, we prove that the BAR estimator is consistent and asymptotically normally distributed; these are known as the oracle properties in the variable selection literature. In our simulation studies, we compare our proposed method with the Minimum Information Criterion (MIC) method. We apply our method to the Medical Information Mart for Intensive Care (MIMIC-III) database to investigate which variables affect the risks of repeated ICU admissions and death during an ICU stay. In Chapter 3, motivated by the CATHGEN data, we develop a new method for simultaneous variable selection and parameter estimation in generalized partly linear models for data with high-dimensional covariates. As in Chapter 2, the resulting BAR estimator approximates $L_0$-penalized regression by iteratively performing reweighted squared $L_2$-penalized regression.
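The iteratively reweighted ridge idea behind the BAR penalty can be illustrated with a minimal sketch for the ordinary linear model. This is an assumption-laden toy, not the thesis implementation: the thesis works with likelihood-based losses and existing R packages, whereas the code below uses squared-error loss and plain Python. The names `solve`, `bar_linear`, `lam` (the BAR tuning parameter), and `xi` (the ridge parameter of the initial estimator) are hypothetical. Each step solves a ridge problem whose penalty on $\beta_j$ is weighted by $1/(\hat\beta_j^{(k)})^2$, written in an algebraically equivalent form that avoids dividing by coefficients that have shrunk to zero.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def bar_linear(X, y, lam=1.0, xi=1.0, max_iter=200, tol=1e-10):
    """Toy BAR iteration for least squares (illustrative assumption only)."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
           for j in range(p)]
    Xty = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]

    # Initial estimator: ordinary ridge regression with parameter xi.
    ridge = [[XtX[j][k] + (xi if j == k else 0.0) for k in range(p)]
             for j in range(p)]
    beta = solve(ridge, Xty)

    for _ in range(max_iter):
        # Reweighted ridge step (X'X + lam * diag(1 / beta_j^2))^{-1} X'y,
        # rewritten as G (G X'X G + lam I)^{-1} G X'y with G = diag(beta),
        # which stays well defined when some beta_j is exactly zero.
        A = [[beta[j] * XtX[j][k] * beta[k] + (lam if j == k else 0.0)
              for k in range(p)] for j in range(p)]
        u = solve(A, [beta[j] * Xty[j] for j in range(p)])
        new_beta = [beta[j] * u[j] for j in range(p)]
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Toy data with orthogonal columns: strong signal on the first covariate,
# weak signal on the second, none on the third.
X = [[1.0, 1.0, 1.0],
     [1.0, -1.0, -1.0],
     [-1.0, 1.0, -1.0],
     [-1.0, -1.0, 1.0]]
y = [2.1, 1.9, -1.9, -2.1]
beta_hat = bar_linear(X, y, lam=1.0, xi=1.0)
print(beta_hat)
```

In this toy example the first coefficient settles near 1.87, slightly shrunk from its least-squares value of 2, while the weak and null coefficients are driven to zero, which is the sparsity-inducing behaviour that the $L_0$ approximation is meant to deliver.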
The generalized partly linear model extends the generalized linear model by including a non-parametric component, giving a flexible model that accommodates various types of covariates, including linear and non-linear effects in different dimensions. We employ Bernstein polynomials as the sieve space to approximate the non-parametric functions so that our method can be implemented easily using existing R packages. Extensive simulation studies suggest that the proposed method performs better than other commonly used penalty-based variable selection methods. We apply the method to the CATHGEN data, which come from a coronary artery disease study with a binary response and which motivated our research, and obtain new findings for both high-dimensional genetic and low-dimensional non-genetic covariates. In Chapter 4, we implement the BAR penalty under the partly linear Cox proportional hazards model with right-censored data, where our model framework considers three sets of covariates: high-dimensional covariates, low-dimensional categorical covariates, and low-dimensional continuous covariates. The low-dimensional continuous covariates are allowed to have possibly non-linear effects. Our variable selection method can be easily implemented using existing R packages. In our simulation studies, we observe that our method performs better than other existing variable selection methods. We then apply our method to acute respiratory distress syndrome (ARDS) data to discover relevant metabolites that contribute to the risk of dying in the ICU. Finally, we summarize the results from all three projects in Chapter 5.
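As a rough illustration of the Bernstein polynomial sieve mentioned above, the following sketch evaluates the Bernstein basis $B_{k,m}(t) = \binom{m}{k} t^k (1-t)^{m-k}$ and a linear combination of basis functions, which is the kind of object used to approximate an unknown smooth function. It assumes the covariate has already been rescaled to $[0, 1]$; the function names `bernstein_basis` and `bernstein_approx` are hypothetical and are not taken from the thesis or from any R package.

```python
from math import comb


def bernstein_basis(t, degree):
    """Return [B_{0,m}(t), ..., B_{m,m}(t)] for t in [0, 1], m = degree."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]; rescale the covariate first")
    return [comb(degree, k) * t**k * (1.0 - t) ** (degree - k)
            for k in range(degree + 1)]


def bernstein_approx(coefs, t):
    """Evaluate sum_k coefs[k] * B_{k,m}(t), where m = len(coefs) - 1."""
    m = len(coefs) - 1
    return sum(c * b for c, b in zip(coefs, bernstein_basis(t, m)))


# The basis functions are non-negative and sum to one at every t.
print(sum(bernstein_basis(0.3, 5)))

# Coefficients k/m reproduce the identity function, so this evaluates to t.
print(bernstein_approx([0.0, 0.5, 1.0], 0.3))
```

In a partly linear model the basis values would enter the design matrix as extra columns, so the non-parametric component is estimated through its coefficients alongside the linear effects.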
Keywords
Variable Selection, Survival Analysis, Nonlinear Approximation
Citation
Chan, C. Z. Y. (2024). Variable selection using the method of the broken adaptive ridge regression (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.