Covariate Balancing Using Statistical Learning Methods in the Presence of Missingness in Confounders

Mason, Levi James

Files

ucalgary_2019_mason_levi.pdf (590.14 KB)

Date

2019-09-20

Authors

Mason, Levi James

Abstract

In observational studies, researchers do not control treatment assignment. As a result, observed covariates may be imbalanced between the treatment and control groups, because treatment assignment is frequently influenced by those covariates (Austin, 2011a). Directly comparing outcomes between the two groups can therefore give a biased estimate of the treatment effect (D’Agostino, 1998). The propensity score, defined as the probability of treatment assignment conditional on observed covariates, can be used in matching, stratification, and weighting to balance the observed covariates between the treatment and control groups and so estimate the treatment effect more accurately (Rosenbaum and Rubin, 1983). This study examined statistical learning techniques for estimating the propensity score: logistic regression, classification and regression trees, pruned classification and regression trees, bagged classification and regression trees, boosted classification and regression trees, and random forests. The estimated propensity scores were then used in linearized propensity score matching, stratification, and inverse probability of treatment weighting with stabilized weights to estimate the treatment effect. The methods were compared in a simulation study, with both a binary and a continuous outcome analyzed. In addition, a simulation was performed to assess multiple imputation using predictive mean matching when a confounder had data missing at random. In the simulation studies, the most accurate treatment effect estimates came from inverse probability of treatment weighting with stabilized weights where the propensity scores were estimated by logistic regression, random forests, or bagged classification and regression trees. These results were then applied to a retrospective cohort data set with a missing confounder to determine the treatment effect of adjuvant radiation in individuals with breast cancer.
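
As a rough illustration of the weighting approach named in the abstract, the sketch below estimates a propensity score by logistic regression and applies inverse probability of treatment weighting with stabilized weights. It is not the thesis code. The simulated data, the single confounder, the true effect of 2.0, and all function names are assumptions made for this example, and the logistic model is fitted with plain gradient ascent so that only the Python standard library is needed.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n=2000, effect=2.0, seed=0):
    """Hypothetical data: one confounder x drives both treatment and outcome."""
    rng = random.Random(seed)
    xs, ts, ys = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        t = 1 if rng.random() < sigmoid(0.8 * x) else 0
        y = effect * t + 1.5 * x + rng.gauss(0.0, 1.0)
        xs.append(x)
        ts.append(t)
        ys.append(y)
    return xs, ts, ys


def fit_logistic(xs, ts, steps=300, lr=0.5):
    """Fit P(T=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, ts):
            resid = t - sigmoid(b0 + b1 * x)
            g0 += resid
            g1 += resid * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def stabilized_weights(ps, ts):
    """Treated: P(T=1) / e(x). Control: P(T=0) / (1 - e(x))."""
    p_treat = sum(ts) / len(ts)
    return [p_treat / p if t == 1 else (1.0 - p_treat) / (1.0 - p)
            for p, t in zip(ps, ts)]


def weighted_mean_difference(ys, ts, ws):
    """Difference of weighted outcome means, treated minus control."""
    num1 = sum(w * y for y, t, w in zip(ys, ts, ws) if t == 1)
    den1 = sum(w for t, w in zip(ts, ws) if t == 1)
    num0 = sum(w * y for y, t, w in zip(ys, ts, ws) if t == 0)
    den0 = sum(w for t, w in zip(ts, ws) if t == 0)
    return num1 / den1 - num0 / den0


xs, ts, ys = simulate()
b0, b1 = fit_logistic(xs, ts)
ps = [sigmoid(b0 + b1 * x) for x in xs]          # estimated propensity scores
ws = stabilized_weights(ps, ts)

naive = weighted_mean_difference(ys, ts, [1.0] * len(ts))  # unadjusted comparison
iptw = weighted_mean_difference(ys, ts, ws)                # weighted comparison
print(f"naive estimate: {naive:.3f}")
print(f"IPTW estimate:  {iptw:.3f}")
```

In this toy setting the unadjusted comparison is confounded by x, while the weighted comparison should land closer to the simulated effect. Because each group mean here is normalized by its own weight total, the stabilizing factor cancels in the point estimate. Its practical benefit is in keeping the weights near an average of 1, which matters for variance and for weighted outcome models.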

Keywords

Statistical learning, Causal inference

Citation

Mason, L. J. (2019). Covariate Balancing Using Statistical Learning Methods in the Presence of Missingness in Confounders (Master's thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.

URI

http://hdl.handle.net/1880/111057

Collections

Open Theses and Dissertations
