Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics

dc.contributor.advisor: Lu, Xuewen
dc.contributor.advisor: Shen, Hua
dc.contributor.author: Cai, Kaida
dc.contributor.committeemember: Lu, Xuewen
dc.contributor.committeemember: Shen, Hua
dc.contributor.committeemember: Tekougang, Thierry Chekouo
dc.contributor.committeemember: Deardon, Rob
dc.contributor.committeemember: Long, Quan
dc.contributor.committeemember: Jin, Zhezhen
dc.date: 2020-02
dc.date.accessioned: 2020-01-03T22:24:42Z
dc.date.available: 2020-01-03T22:24:42Z
dc.date.issued: 2019-12
dc.description.abstract: In high-dimensional data, the number of covariates can be large and may diverge with the sample size. In many scientific applications, such as biological studies, the predictors or covariates are naturally grouped. In this thesis, we consider bi-level variable selection and dimension-reduction methods in complex lifetime data analytics under various survival models, and study their theoretical properties and finite-sample performance under different scenarios.
Specifically, in Chapter 2, we focus on the Andersen-Gill regression model for the analysis of recurrent event data with grouped covariates when the number of covariates is fixed. To study the effects of the covariates on the occurrence of recurrent events, we introduce a bi-level penalized group selection method that uses a general group-bridge penalty function with varying weights. We show that the performance of the bi-level selection depends on the weights. To select covariates more efficiently, especially to identify the important covariates within important groups, adaptive weights are required. The asymptotic oracle properties of the proposed method are investigated for a fixed number of covariates. Three methods of tuning parameter selection are proposed. Our simulation studies show that the proposed method performs well in simultaneously selecting important groups and the important individual covariates within those groups, and that it outperforms other popular group selection methods and the traditional unpenalized Wald testing method.
In Chapter 3, we extend the proposed method for the recurrent event model to the case of a diverging number of covariates. We demonstrate that the proposed method is selection consistent and that the penalized estimators are asymptotically normal when the number of covariates diverges. Simulation studies show that the proposed method performs well and that the results are consistent with the theoretical properties. We illustrate the method using a real-life medical data set.
In Chapter 4, following the group variable selection procedure with a bi-level penalty, we propose a new variable selection method for the analysis of multivariate failure time data, with an adaptive bi-level variable selection penalty function. In the regression setting, we treat the coefficients corresponding to the same predictor as a natural group, and then consider variable selection at the group level and the individual level simultaneously. The proposed adaptive bi-level variable selection method can select a predictor at two different levels: the first is the group level, where the predictor is important to all failure types; the second is the individual level, where the predictor is important to only some failure types. An algorithm based on cyclic coordinate descent (CCD) is proposed to carry out the method. In our simulations, the method outperforms the classical penalty methods, especially in removing unimportant variables across all failure types. We obtain the asymptotic oracle properties of the proposed variable selection method in the case of a diverging number of covariates. We construct a generalized cross-validation (GCV) method for tuning parameter selection and assess model performance based on model errors. We also illustrate the proposed method using a real-life data set.
Sufficient dimension reduction (SDR) is a powerful tool for dimension reduction in regression and classification problems, which replaces the original covariates with the minimal set of their linear combinations. In Chapter 5, we propose a novel penalty function, called the adaptive group composite Lasso (AGCL), for the group sparse sufficient dimension reduction problem. By incorporating this new penalty into the sufficient dimension reduction method, we obtain an adaptive group composite Lasso penalized dimension reduction method that simultaneously achieves sufficient dimension reduction and group variable selection in the case of a diverging number of covariates. We investigate the asymptotic properties of the penalized sufficient dimension reduction estimators when the number of covariates diverges with the sample size. We show that the proposed method can select important groups and individual variables simultaneously. We compare the proposed method with other sparse sufficient dimension reduction methods using simulation studies. The results show that the proposed method outperforms the other methods in removing unimportant covariates, especially unimportant groups. A real data example is used for illustration. [en_US]
dc.identifier.citation: Cai, K. (2019). Bi-level variable selection and dimension-reduction methods in complex lifetime data analytics (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. [en_US]
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/37404
dc.identifier.uri: http://hdl.handle.net/1880/111427
dc.language.iso: eng [en_US]
dc.publisher.faculty: Science [en_US]
dc.publisher.institution: University of Calgary [en]
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. [en_US]
dc.subject: Variable selection [en_US]
dc.subject: Group variable selection [en_US]
dc.subject: Bi-level penalty [en_US]
dc.subject: Dimension reduction [en_US]
dc.subject: Recurrent events [en_US]
dc.subject: Multivariate failure time data [en_US]
dc.subject: Diverging number of covariates [en_US]
dc.subject: Oracle property [en_US]
dc.subject.classification: Statistics [en_US]
dc.title: Bi-level Variable Selection and Dimension-reduction Methods in Complex Lifetime Data Analytics [en_US]
dc.type: doctoral thesis [en_US]
thesis.degree.discipline: Mathematics & Statistics [en_US]
thesis.degree.grantor: University of Calgary [en_US]
thesis.degree.name: Doctor of Philosophy (PhD) [en_US]
ucalgary.item.requestcopy: true [en_US]
Files
Original bundle
Name: ucalgary_2019_cai_kaida.pdf
Size: 1.11 MB
Format: Adobe Portable Document Format
Description: Thesis
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission
Description: