Treatment Effect Models for Subgroup Analysis with Missing Data
The need for subgroup analysis in clinical trials is increasing across various contexts, and data-driven approaches to subgroup identification that are grounded in statistical principles are desired. Among subgroup identification methods, we focus on treatment effect models that estimate the treatment contrast, since these models are intuitive and easy to interpret. We evaluate and address the consequences of missing data when using the Interaction Trees (IT), Qualitative Interaction Trees (QUINT), and Subgroup Identification based on Differential Effect Search (SIDES) methods. Simulation studies are used to assess the accuracy of variable selection and the bias in estimated treatment effects when using complete, incomplete, and imputed data across scenarios that differ in sample size, proportion of missingness, and imputation method. We also apply these methods to a non-small cell lung cancer (NSCLC) dataset obtained from a retrospective study. Our results indicate that the IT and QUINT methods perform comparably well in most situations, while the SIDES results are, in general, less comparable because of the different mechanisms underlying the methods. The treatment effect model should be chosen based on the objective of the study, the sample size, the number of variables containing missing data, and the data structure. When selecting a method for addressing missing data, an assumption about the data structure needs to be made. MissForest is an excellent choice for a dataset with a tree-based structure, while multiple imputation (MI) methods are a good fit for other situations.