Correlated Data Analysis via Variants of EM Algorithm: Application to Data on Physical Activity and Maternal Health

dc.contributor.advisorDe Leon, Alexander
dc.contributor.advisorLi, Haocheng
dc.contributor.authorLi, Jia
dc.contributor.committeememberWu, Jingjing
dc.contributor.committeememberLu, Xuewen
dc.contributor.committeememberChu, Man-Wai
dc.contributor.committeememberSheng, Xiaoming
dc.date2024-11
dc.date.accessioned2024-09-17T21:44:56Z
dc.date.available2024-09-17T21:44:56Z
dc.date.issued2024-09-13
dc.description.abstractThe thesis concerns the analysis of correlated data on multiple variables via the EM algorithm and its variants. Specifically, we focus on (cross-sectional) multivariate iid data comprising a disparate mix of binary and non-Gaussian variables (including the special case of multivariate binary data), and on longitudinal data on multiple Gaussian responses in a regression setting. For the case with correlated data on multiple binary variables and that with mixed data on binary and non-Gaussian continuous variables, we introduced the classes of meta-probit models (MPMs) and extended meta-probit models (XMPMs) as generalizations to non-Gaussian settings of the grouped continuous model (GCM) – also known as the multivariate probit model (MVPM) – and its extension to mixed data, the conditional GCM (CGCM). Constructed from Gaussian copula distributions (GCDs), a class of meta-Gaussian distributions based on the Gaussian copula, MPMs and XMPMs broaden the sphere of applications of joint models to settings that involve complex non-standard data on variables with different measurement scales and with marginal distributions, latent and otherwise, from different parametric families. To avoid the computational challenges of maximum likelihood (ML) estimation in MPMs/XMPMs, we adopted the method of inference functions for margins (IFM), a two-part estimation method that first estimates the marginal parameters separately via (marginal) ML estimation, and then estimates the joint parameters (i.e., normal correlations) via profile ML estimation based on the full joint likelihood function, with the marginal parameters fixed at their marginal estimates. The method is especially appropriate for copula models in general, and for MPMs/XMPMs in particular, because in copula models the marginal distributions are specified completely independently of the dependence structure.
For joint estimation of the normal correlations, we adopted a parameter-expanded EM (PX-EM) algorithm to simplify E-step calculations – all done numerically exactly using freely available R packages – and to make possible a closed-form M-step update, allowing us to avoid the complications associated with having to estimate a correlation matrix. We used the standard theory of inference functions to obtain the (joint) asymptotic Gaussian distribution of the resulting maximum pseudo-likelihood estimates (MPLEs). Results of Monte Carlo simulations confirmed the consistency and asymptotic unbiasedness of MPLEs, with standard errors (SEs) that generally reflected the estimates’ true sampling variability. Finally, we generalized the ECME algorithm to the multiple-outcomes setting to implement ML estimation for joint Gaussian linear mixed models (LMMs) with atypically large numbers of random effects. Monte Carlo simulations show that the resulting estimates are consistent, with efficiencies comparable to those obtained by pairwise methods. We further illustrate our methodology with longitudinal survey data on physical activity collected by ActivPAL™ (www.paltech.plus.com).
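As a rough illustration of the two-part inference-functions-for-margins estimation described in the abstract, the two stages can be sketched as below. The notation is assumed here for illustration and is not taken from the thesis: n observations on p variables, f_j the j-th marginal density or mass function with parameters \theta_j, R the matrix of normal correlations, and L_i the joint likelihood contribution of observation i.

```latex
\begin{align*}
% Stage 1: marginal ML, carried out separately for each margin
\hat{\theta}_j &= \arg\max_{\theta_j} \sum_{i=1}^{n} \log f_j(y_{ij};\, \theta_j),
  \qquad j = 1, \dots, p, \\
% Stage 2: profile ML for the correlations, margins held at their stage-1 estimates
\hat{R} &= \arg\max_{R} \sum_{i=1}^{n} \log L_i\bigl(\hat{\theta}_1, \dots, \hat{\theta}_p,\, R\bigr).
\end{align*}
```

In this sketch the second stage is written as a direct maximization; per the abstract, the thesis carries it out with a PX-EM algorithm instead.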
dc.identifier.citationLi, J. (2024). Correlated data analysis via variants of EM algorithm: application to data on physical activity and maternal health (Doctoral thesis, University of Calgary, Calgary, Canada). Retrieved from https://prism.ucalgary.ca.
dc.identifier.urihttps://hdl.handle.net/1880/119743
dc.language.isoen
dc.publisher.facultyGraduate Studies
dc.publisher.institutionUniversity of Calgary
dc.rightsUniversity of Calgary graduate students retain copyright ownership and moral rights for their thesis. You may use this material in any way that is permitted by the Copyright Act or through licensing that has been assigned to the document. For uses that are not allowable under copyright legislation or licensing, you are required to seek permission.
dc.subjectCorrelated data analysis
dc.subjectGaussian Copula
dc.subjectPX-EM
dc.subjectIFM
dc.subjectMixed-effects model
dc.subjectECME
dc.subject.classificationEducation--Mathematics
dc.subject.classificationEducation--Sciences
dc.titleCorrelated Data Analysis via Variants of EM Algorithm: Application to Data on Physical Activity and Maternal Health
dc.typedoctoral thesis
thesis.degree.disciplineMathematics & Statistics
thesis.degree.grantorUniversity of Calgary
thesis.degree.nameDoctor of Philosophy (PhD)
ucalgary.thesis.accesssetbystudentI do not require a thesis withhold – my thesis will have open access and can be viewed and downloaded publicly as soon as possible.
Files
Original bundle
Name: ucalgary_2024_li_jia.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.62 KB
Format: Item-specific license agreed upon to submission