Bayesian Variable Selection Model with Semicontinuous Response

dc.contributor.advisor: Chekouo, Thierry
dc.contributor.advisor: Sajobi, Tolulope
dc.contributor.author: Babatunde, Samuel
dc.contributor.committeemember: Zhang, Qingrun
dc.contributor.committeemember: Deardon, Robert
dc.contributor.committeemember: Bezdek, Karoly
dc.date: 2022-01
dc.date.accessioned: 2022-01-18T16:02:06Z
dc.date.available: 2022-01-18T16:02:06Z
dc.date.issued: 2022-01-14
dc.description.abstract: We propose a novel Bayesian variable selection approach that identifies a set of features associated with a semicontinuous response. We use a two-part model: a logit model estimates the probability of a zero response, and a log-normal model describes the positive responses. A Stochastic Search Variable Selection (SSVS) procedure samples the indicator variables for variable selection, thereby searching the space of feature subsets and identifying the most promising features. For the logistic model, a data augmentation approach is used to sample from the posterior density. We impose a spike-and-slab prior on the regression effects: unselected covariates have a prior point mass at zero, while selected covariates, including the intercept and clinical covariates, follow a normal distribution. Because the joint posterior density has no closed form, we use Markov chain Monte Carlo (MCMC) techniques to sample from the posterior distribution. Simulation studies assess the performance of the proposed method. We compute the average area under the receiver operating characteristic curve (AUC) to assess variable selection and compare it with that of competing methods. We also assess the convergence of our MCMC algorithm by computing the potential scale reduction factor and the correlations between the marginal posterior probabilities. Finally, we apply our method to coronary artery disease (CAD) data, where the aim is to select important genes associated with the CAD index. These data consist of clinical covariates and gene expression measurements. (en_US)
dc.identifier.doi: http://dx.doi.org/10.11575/PRISM/39519
dc.identifier.uri: http://hdl.handle.net/1880/114304
dc.language.iso: eng (en_US)
dc.publisher.faculty: Science (en_US)
dc.publisher.institution: University of Calgary (en)
dc.rights: University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. (en_US)
dc.subject: Bayesian variable selection (en_US)
dc.subject: coronary artery disease (en_US)
dc.subject: Markov Chain Monte Carlo (en_US)
dc.subject: Stochastic Search Variable Selection (en_US)
dc.subject.classification: Biostatistics (en_US)
dc.subject.classification: Statistics (en_US)
dc.title: Bayesian Variable Selection Model with Semicontinuous Response (en_US)
dc.type: master thesis (en_US)
thesis.degree.discipline: Mathematics & Statistics (en_US)
thesis.degree.grantor: University of Calgary (en_US)
thesis.degree.name: Master of Science (MSc) (en_US)
ucalgary.item.requestcopy: true (en_US)
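
Illustrative sketch (not part of the thesis record): the two-part structure described in the abstract can be made concrete by simulating a semicontinuous response, with a logit model for the probability of a zero and a log-normal model for the positive values. Everything specific here is an assumption chosen for illustration: the function name `simulate_two_part`, the sample size, the intercepts, the noise scale, and the sparse coefficient vectors `alpha` and `beta`. It does not reproduce the thesis's sampler, priors, or data.

```python
import numpy as np


def simulate_two_part(n=500, p=10, seed=0):
    """Simulate a semicontinuous response from a two-part model (illustration only)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))

    # Hypothetical sparse effects: only the first two features are active in each part.
    alpha = np.zeros(p)
    alpha[:2] = [1.0, -1.0]  # logit part: drives the probability of a zero response
    beta = np.zeros(p)
    beta[:2] = [0.5, 0.8]    # log-normal part: drives the size of positive responses

    # Part 1: logit model for the probability that the response is exactly zero.
    p_zero = 1.0 / (1.0 + np.exp(-(-0.5 + X @ alpha)))
    is_zero = rng.random(n) < p_zero

    # Part 2: log-normal model for the positive responses.
    log_y = 1.0 + X @ beta + rng.normal(scale=0.5, size=n)
    y = np.where(is_zero, 0.0, np.exp(log_y))
    return X, y, is_zero


X, y, is_zero = simulate_two_part()
print(f"share of exact zeros: {is_zero.mean():.2f}")
```

In such a simulation, the zero entries of `alpha` and `beta` play the role of the unselected covariates that the spike-and-slab prior places at zero; a variable selection method would aim to recover which entries are nonzero.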
Files
Original bundle
Name: ucalgary_2022_babatunde_samuel.pdf
Size: 1.54 MB
Format: Adobe Portable Document Format
Description: Main article
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission