QDA Classification for Two-Component Mixture with Data of Rare and Weak Signal

Date
2019-12-20
Abstract
This thesis addresses the two-class classification problem for data with rare and weak signals under the modern high-dimensional setting p >> n (large p, small n). For a two-component mixture of Gaussian features whose components have different random mean vectors with rare and weak signals but a common covariance matrix (homoscedastic Gaussian), Fan et al. (2013) studied the optimality of linear discriminant analysis (LDA) and proposed an efficient variable selection and classification procedure. This thesis extends their work by assuming that the two components have different random covariance matrices with rare and weak signals (heteroscedastic Gaussian). As a first step, and for simplicity, we assume that the two population mean vectors are equal, in order to isolate the effect of differing covariance matrices. Because the two classes then differ only in their covariance structure, we propose using quadratic discriminant analysis (QDA) to classify data with rare and weak signals. On the theoretical side, we first derive the detection boundary of QDA at the population level, which separates the region of successful classification from the region of unsuccessful classification in the ideal case where the covariance matrices are known. When the covariance matrices are unknown, we obtain a subregion in which successful classification is impossible for every classifier; this subregion therefore also lies within the unsuccessful classification region of QDA. Since variable selection usually improves the performance of statistical procedures when signals are rare, on the implementation side we propose a variable selection procedure for QDA based on Higher Criticism Thresholding (HCT), which Fan et al. (2013) showed to be efficient for LDA. Finally, we conduct extensive simulation studies to illustrate the successful and unsuccessful classification regions of QDA and to assess the effectiveness of the proposed HCT procedure.
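To make the pipeline described above concrete (screen features with Higher Criticism, then classify with QDA when the class means coincide), here is a minimal Python sketch. It is not the procedure proposed in the thesis. It relies on simplifying assumptions of its own: independent features (diagonal covariance), a per-feature z-test on the log ratio of sample variances as the screening statistic, one standard form of the HC objective, and equal class priors. The function names (`variance_pvalues`, `hc_select`, `fit_diag_qda`, `predict_diag_qda`), the parameter `alpha0`, and the toy simulation settings are all hypothetical and chosen for illustration only.

```python
import math

import numpy as np


def variance_pvalues(x1, x2):
    """Two-sided p-values for H0: equal variances, feature by feature.

    Hypothetical screening statistic (not from the thesis): for Gaussian data,
    log(s2^2 / s1^2) is approximately N(0, 2/(n1-1) + 2/(n2-1)) under H0.
    """
    n1, n2 = x1.shape[0], x2.shape[0]
    s1 = x1.var(axis=0, ddof=1)
    s2 = x2.var(axis=0, ddof=1)
    z = np.log(s2 / s1) / math.sqrt(2.0 / (n1 - 1) + 2.0 / (n2 - 1))
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])


def hc_select(pvals, alpha0=0.5):
    """Keep the k_hat features with the smallest p-values, where k_hat maximizes
    HC(k) = sqrt(p) * (k/p - pi_(k)) / sqrt(k/p * (1 - k/p)) over 1 <= k <= alpha0 * p.
    alpha0 must be < 1 so that the denominator never reaches zero."""
    p = pvals.size
    order = np.argsort(pvals)
    kmax = max(1, int(alpha0 * p))
    frac = np.arange(1, kmax + 1) / p
    hc = math.sqrt(p) * (frac - pvals[order[:kmax]]) / np.sqrt(frac * (1.0 - frac))
    k_hat = int(np.argmax(hc)) + 1
    return np.sort(order[:k_hat])


def fit_diag_qda(x1, x2, features):
    """Per-class sample means and variances on the selected features."""
    return [(x[:, features].mean(axis=0), x[:, features].var(axis=0, ddof=1))
            for x in (x1, x2)]


def predict_diag_qda(params, x, features):
    """Assign each row to the class with the larger Gaussian log-likelihood
    (diagonal covariance, equal priors); returns 0 for class 1, 1 for class 2."""
    xs = x[:, features]
    scores = [-0.5 * np.sum(np.log(var) + (xs - mu) ** 2 / var, axis=1)
              for mu, var in params]
    return np.argmax(np.column_stack(scores), axis=1)


# Toy data (illustrative assumption): both classes have mean zero, and class 2
# has variance 4 instead of 1 on the first s of the p features.
rng = np.random.default_rng(0)
p, n, s = 1000, 100, 20
scale2 = np.ones(p)
scale2[:s] = 2.0  # standard deviation 2, i.e. variance 4


def sample(m):
    """Draw m observations from each class."""
    return rng.normal(0.0, 1.0, size=(m, p)), rng.normal(0.0, scale2, size=(m, p))


x1_train, x2_train = sample(n)
x1_test, x2_test = sample(200)

selected = hc_select(variance_pvalues(x1_train, x2_train))
params = fit_diag_qda(x1_train, x2_train, selected)
x_test = np.vstack([x1_test, x2_test])
y_test = np.repeat([0, 1], 200)
accuracy = float(np.mean(predict_diag_qda(params, x_test, selected) == y_test))
print(f"selected {selected.size} features, test accuracy {accuracy:.3f}")
```

Because the class means are equal in this toy setup, a linear rule has no mean difference to exploit; the quadratic terms `(xs - mu) ** 2 / var` carry all of the discriminating information, which is why screening on variance differences is paired with QDA here.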
Keywords
high dimensional data, quadratic discriminant analysis (QDA), higher criticism, classification
Citation
Chen, H. (2019). QDA Classification for Two-Component Mixture with Data of Rare and Weak Signal (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.