Correlated Data Analysis via Variants of EM Algorithm: Application to Data on Physical Activity and Maternal Health
dc.contributor.advisor | De Leon, Alexander | |
dc.contributor.advisor | Li, Haocheng | |
dc.contributor.author | Li, Jia | |
dc.contributor.committeemember | Wu, Jingjing | |
dc.contributor.committeemember | Lu, Xuewen | |
dc.contributor.committeemember | Chu, Man-Wai | |
dc.contributor.committeemember | Sheng, Xiaoming | |
dc.date | 2024-11 | |
dc.date.accessioned | 2024-09-17T21:44:56Z | |
dc.date.available | 2024-09-17T21:44:56Z | |
dc.date.issued | 2024-09-13 | |
dc.description.abstract | This thesis concerns the analysis of correlated data on multiple variables via the EM algorithm and its variants. Specifically, we focus on (cross-sectional) multivariate iid data comprising a disparate mix of binary and non-Gaussian variables (including the special case of multivariate binary data), and on longitudinal data on multiple Gaussian responses in a regression setting. For correlated data on multiple binary variables, and for mixed data on binary and non-Gaussian continuous variables, we introduced the classes of meta-probit models (MPMs) and extended meta-probit models (XMPMs) as generalizations to non-Gaussian settings of the grouped continuous model (GCM) – also known as the multivariate probit model (MVPM) – and its extension to mixed data, the conditional GCM (CGCM). Constructed from Gaussian copula distributions (GCDs), a class of meta-Gaussian distributions based on the Gaussian copula, MPMs and XMPMs broaden the sphere of applications of joint models to settings that involve complex non-standard data on variables with different measurement scales and with marginal distributions, latent and otherwise, from different parametric families. To avoid the computational challenges of maximum likelihood (ML) estimation in MPMs/XMPMs, we adopted the method of inference functions for margins (IFM), a two-part estimation method that first estimates the marginal parameters separately via (marginal) ML estimation, and then estimates the joint parameters (i.e., normal correlations) jointly via profile ML estimation based on the full joint likelihood function, with the marginal parameters fixed at their marginal estimates. The method is especially appropriate for copula models in general, and MPMs/XMPMs in particular, because in copula models the marginal distributions are specified completely independently of the dependence structure. 
For joint estimation of the normal correlations, we adopted a parameter-expanded EM (PX-EM) algorithm to simplify E-step calculations – all done numerically exactly using freely available R packages – and to make possible a closed-form M-step update, allowing us to avoid the complications associated with having to estimate a correlation matrix. We used the standard theory of inference functions to obtain the (joint) asymptotic Gaussian distribution of the resulting maximum pseudo-likelihood estimates (MPLEs). Results of Monte Carlo simulations confirmed the consistency and asymptotic unbiasedness of the MPLEs, with standard errors (SEs) that generally reflected the estimates' true sampling variability. Finally, we generalized the ECME algorithm to the multiple-outcomes setting to implement ML estimation for joint Gaussian linear mixed models (LMMs) with atypically large numbers of random effects. Monte Carlo simulations show that the resulting estimates are consistent, with efficiencies comparable to those obtained by pairwise methods. We further illustrate our methodology with longitudinal survey data on physical activity collected by ActivPAL™ (www.paltech.plus.com). | |
dc.identifier.citation | Li, J. (2024). Correlated data analysis via variants of EM algorithm: application to data on physical activity and maternal health (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca. | |
dc.identifier.uri | https://hdl.handle.net/1880/119743 | |
dc.language.iso | en | |
dc.publisher.faculty | Graduate Studies | |
dc.publisher.institution | University of Calgary | |
dc.rights | University of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission. | |
dc.subject | Correlated data analysis | |
dc.subject | Gaussian Copula | |
dc.subject | PX-EM | |
dc.subject | IFM | |
dc.subject | Mixed-effects model | |
dc.subject | ECME | |
dc.subject.classification | Education--Mathematics | |
dc.subject.classification | Education--Sciences | |
dc.title | Correlated Data Analysis via Variants of EM Algorithm: Application to Data on Physical Activity and Maternal Health | |
dc.type | doctoral thesis | |
thesis.degree.discipline | Mathematics & Statistics | |
thesis.degree.grantor | University of Calgary | |
thesis.degree.name | Doctor of Philosophy (PhD) | |
ucalgary.thesis.accesssetbystudent | I do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible. |
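
The two-stage inference-functions-for-margins (IFM) idea summarized in the abstract can be illustrated with a small sketch. This is not the thesis's method: it assumes a bivariate Gaussian copula with two continuous exponential margins (rather than the binary and mixed margins of MPMs/XMPMs), and it replaces the PX-EM step with a plain grid search over the single normal correlation. All function names (`simulate`, `fit_margins`, `normal_scores`, `fit_rho`) are hypothetical and chosen only for this illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: provides cdf and inv_cdf
EPS = 1e-12                # keeps probabilities strictly inside (0, 1)


def _clip(u):
    return min(max(u, EPS), 1.0 - EPS)


def simulate(n, rates, rho, seed=0):
    """Draw n pairs with exponential margins joined by a Gaussian copula."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Map each latent normal to a uniform, then to an exponential quantile.
        data.append(tuple(-math.log(1.0 - _clip(STD_NORMAL.cdf(z))) / r
                          for z, r in zip((z1, z2), rates)))
    return data


def fit_margins(data):
    """Stage 1: marginal ML. The exponential rate MLE is 1 / sample mean."""
    n = len(data)
    return tuple(n / sum(row[j] for row in data) for j in range(2))


def normal_scores(data, rates):
    """Transform observations to normal scores using the fitted margins."""
    return [tuple(STD_NORMAL.inv_cdf(_clip(1.0 - math.exp(-r * x)))
                  for x, r in zip(row, rates))
            for row in data]


def fit_rho(scores):
    """Stage 2: maximize the Gaussian copula log-likelihood over rho,
    with the marginal parameters held at their stage-1 estimates."""
    n = len(scores)
    s = sum(z1 * z1 + z2 * z2 for z1, z2 in scores)
    p = sum(z1 * z2 for z1, z2 in scores)

    def loglik(rho):
        d = 1.0 - rho * rho
        return -0.5 * n * math.log(d) - (rho * rho * s - 2.0 * rho * p) / (2.0 * d)

    # Grid search on (-1, 1) in steps of 0.001; an assumption made for
    # simplicity, standing in for a proper optimizer or an EM-type update.
    return max((k / 1000.0 for k in range(-999, 1000)), key=loglik)


if __name__ == "__main__":
    sample = simulate(2000, rates=(0.5, 2.0), rho=0.6, seed=1)
    rates_hat = fit_margins(sample)
    rho_hat = fit_rho(normal_scores(sample, rates_hat))
    print("estimated rates:", rates_hat)
    print("estimated normal correlation:", rho_hat)
```

Because the margins are fitted first and then held fixed, the second stage is a one-dimensional problem here; with more variables the second stage would estimate a full correlation matrix, which is where the abstract's PX-EM algorithm comes in.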