A semiparametric model for marker gene selection and acute leukemia classification

Date
2012-09-25
Abstract
Case-control studies are widely used in epidemiology to identify important indicators and factors. In areas such as marker gene selection and binary disease classification, the dominant approaches are either completely nonparametric, such as the Wilcoxon test, k-nearest neighbors and support vector machines, or based on modified t-type statistics under parametric (mostly Normal) model assumptions, such as ANOVA, logistic regression and linear discriminant analysis. Parametric modeling is easy to implement and to draw inference from, but it is not robust to violations of the model assumptions; a nonparametric approach makes no assumptions about the underlying model but is less interpretable. Semiparametric modeling strikes a compromise between two aims, flexibility and simplicity of statistical procedures, by introducing partially parametric components. In this thesis, a two-sample semiparametric model is proposed for the gene expression levels of acute lymphoblastic leukemia (ALL) patients and acute myeloid leukemia (AML) patients. For a training data set containing 38 acute leukemia patients (27 ALL, 11 AML), both the classical maximum likelihood estimator (MLE) and the minimum Hellinger distance estimator (MHDE) of the semiparametric model are constructed and compared. Based on the MHDE and MLE, Wald tests of significance are carried out to select marker genes. Further, using the idea of a weighted sum of misclassification rates, new classification rules based on the selected marker genes are developed. The proposed classification rules are tested on both the training data and an independent validation leukemia data set (20 ALL, 14 AML).
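The idea of a weighted sum of misclassification rates can be illustrated with a minimal sketch. This is not the thesis's classification rule: it assumes a single numeric score per patient, a simple cut-off rule, and a hypothetical weight `w`; the function names and the toy values are invented purely for illustration.

```python
def weighted_error(threshold, all_scores, aml_scores, w=0.5):
    """Weighted sum of the two misclassification rates for the rule
    'classify as AML when score > threshold' (assumed rule, weight w on ALL errors)."""
    all_err = sum(s > threshold for s in all_scores) / len(all_scores)
    aml_err = sum(s <= threshold for s in aml_scores) / len(aml_scores)
    return w * all_err + (1 - w) * aml_err


def best_threshold(all_scores, aml_scores, w=0.5):
    """Pick the cut-off with the smallest weighted error, searching the
    midpoints between consecutive distinct observed scores."""
    pooled = sorted(set(all_scores) | set(aml_scores))
    candidates = [(a + b) / 2 for a, b in zip(pooled, pooled[1:])]
    return min(candidates, key=lambda t: weighted_error(t, all_scores, aml_scores, w))


# Made-up toy scores, not data from the thesis.
toy_all = [0.2, 0.5, 0.9, 1.1]
toy_aml = [1.0, 1.6, 2.1]
cutoff = best_threshold(toy_all, toy_aml)
print(cutoff, weighted_error(cutoff, toy_all, toy_aml))
```

Changing `w` shifts the cut-off toward protecting one class at the expense of the other, which is the trade-off a weighted criterion is meant to expose.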
Keywords
Biostatistics
Citation
Chen, G. (2012). A semiparametric model for marker gene selection and acute leukemia classification (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. doi:10.11575/PRISM/25006