Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics

Cai, Kaida

dc.contributor.advisor	Lu, Xuewen
dc.contributor.advisor	Shen, Hua
dc.contributor.author	Cai, Kaida
dc.contributor.committeemember	Lu, Xuewen
dc.contributor.committeemember	Shen, Hua
dc.contributor.committeemember	Tekougang, Thierry Chekouo
dc.contributor.committeemember	Deardon, Rob
dc.contributor.committeemember	Long, Quan
dc.contributor.committeemember	Jin, Zhezhen
dc.date	2020-02
dc.date.accessioned	2020-01-03T22:24:42Z
dc.date.available	2020-01-03T22:24:42Z
dc.date.issued	2019-12
dc.description.abstract	For high-dimensional data, the number of covariates can be large and can diverge with the sample size. In many scientific applications, such as biological studies, the predictors or covariates are naturally grouped. In this thesis, we consider bi-level variable selection and dimension-reduction methods in complex lifetime data analytics under various survival models, and study their theoretical properties and finite sample performance under different scenarios. Specifically, in Chapter 2, we focus on the Andersen-Gill regression model for the analysis of recurrent event data with grouped covariates when the number of covariates is fixed. In order to study the effects of the covariates on the occurrence of recurrent events, a bi-level penalized group selection method is introduced to address the group selection problem. A general group-bridge penalty function with varying weights is invoked to achieve this goal. It is shown that the performance of the bi-level selection depends on the weights. In order to select covariates more efficiently, especially for identifying the important covariates in important groups, adaptive weights are required. The asymptotic oracle properties of the proposed method are investigated in the case of a fixed number of covariates. Three methods of tuning parameter selection are proposed. Our simulation studies show that the proposed method performs well in selecting important groups and important individual covariates in these groups simultaneously, and outperforms other popular group selection methods and the traditional unpenalized Wald testing method. In Chapter 3, we extend the proposed method for the recurrent event model to the case of a diverging number of covariates. We demonstrate that the proposed method is selection consistent and that the penalized estimators are asymptotically normal in the case of a diverging number of covariates.
Simulation studies show that the proposed method performs well and the results are consistent with the theoretical properties. We illustrate the method using a real-life data set from medicine. In Chapter 4, by imitating the group variable selection procedure with a bi-level penalty, we propose a new variable selection method for the analysis of multivariate failure time data, with an adaptive bi-level variable selection penalty function. In the regression setting, we treat the coefficients corresponding to the same predictor variable as a natural group, then consider variable selection at the group level and the individual level simultaneously. The proposed adaptive bi-level variable selection method can select a predictor variable at two different levels: the first level is the group level, where the predictor is important to all failure types; the second level is the individual level, where the predictor is only important to some failure types. An algorithm based on cyclic coordinate descent (CCD) is proposed to carry out the proposed method. Based on the simulation results, our method outperforms the classical penalty methods, especially in terms of removing unimportant variables for all failure types. We obtain the asymptotic oracle properties of the proposed variable selection method in the case of a diverging number of covariates. We construct a generalized cross-validation (GCV) method for the tuning parameter selection and assess model performance based on model errors. We also illustrate the proposed method using a real-life data set. Sufficient dimension reduction (SDR) is a powerful tool for dimension reduction in regression and classification problems, which replaces the original covariates with the minimal set of their linear combinations. In Chapter 5, we propose a novel penalty function, called adaptive group composite Lasso (AGCL), for the group sparse sufficient dimension reduction problem.
By incorporating this new penalty with the sufficient dimension reduction method, we propose an adaptive group composite Lasso penalized dimension reduction method to simultaneously achieve sufficient dimension reduction and group variable selection in the case of a diverging number of covariates. We investigate the asymptotic properties of the penalized sufficient dimension reduction estimators when the number of covariates diverges with the sample size. We show that the proposed method can select important groups and individual variables simultaneously. We compare the proposed method with other sparse sufficient dimension reduction methods using simulation studies. The results show that the proposed method outperforms the other methods in terms of removing unimportant covariates, especially in removing the unimportant groups. A real data example is used for illustration.	en_US
dc.identifier.citation	Cai, K. (2019). Bi-level variable selection and dimension-reduction methods in complex lifetime data analytics (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.	en_US
dc.identifier.doi	http://dx.doi.org/10.11575/PRISM/37404
dc.identifier.uri	http://hdl.handle.net/1880/111427
dc.language.iso	eng	en_US
dc.publisher.faculty	Science	en_US
dc.publisher.institution	University of Calgary	en
dc.rights	University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.	en_US
dc.subject	Variable selection	en_US
dc.subject	Group variable selection	en_US
dc.subject	Bi-level penalty	en_US
dc.subject	Dimension reduction	en_US
dc.subject	Recurrent events	en_US
dc.subject	Multivariate failure time data	en_US
dc.subject	Diverging number of covariates	en_US
dc.subject	Oracle property	en_US
dc.subject.classification	Statistics	en_US
dc.title	Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics	en_US
dc.type	doctoral thesis	en_US
thesis.degree.discipline	Mathematics & Statistics	en_US
thesis.degree.grantor	University of Calgary	en_US
thesis.degree.name	Doctor of Philosophy (PhD)	en_US
ucalgary.item.requestcopy	true	en_US