Methods for a Longitudinal Quantitative Outcome With a Multivariate Gaussian Mixture Distribution Multi-dimensionally Censored by Therapeutic Intervention
In longitudinal clinical trials and in epidemiologic or genetic studies, the value of a quantitative outcome may be altered by the administration of a non-randomized, non-trial intervention during the period of observation. The resulting effect of the non-trial intervention may seriously distort the analysis and undermine the scientific aims of the study (Tobin et al. 2005). Current methods to address this issue are mainly for cross-sectional studies. For longitudinal data, the currently available methods, including multi-level models (White et al. 2001), multiple imputation (MI) (Cook 1997, 2006), and a two-step approach (McClelland et al. 2008), are either restricted to a specific longitudinal data structure or valid only under special circumstances. This dissertation proposes two classes of new methods for general longitudinal data. One uses a modified Expectation-Maximization (EM)-type algorithm and single deterministic imputation. The other uses a modified Monte Carlo EM (MCEM)-MI algorithm and multiple imputation. The former is a special case of the latter. Each class of methods can be implemented in three ways, yielding full-iteration, one-step, and two-step algorithms. These methods can be used in clinical trials and in epidemiologic and genetic studies. They combine the advantages of the current methods while reducing their restrictive assumptions, and generalize them to more realistic scenarios. Specifically, they extend Cook's (1997) MI method from a restrictive longitudinal data structure to more general longitudinal data sets, and they extend Tobin et al.'s (2005) censored normal regression model from cross-sectional data to longitudinal data. The proposed methods replace the intractable calculation of a multi-dimensionally censored MVN posterior distribution with a simplified approximation that maintains sufficient accuracy. They avoid "the curse of dimensionality" (Cadez et al. 2002) by avoiding complicated numerical integration.
The proposed methods are straightforward to implement because the M step can use existing software. They also converge quickly, especially the modified EM-type algorithm, which usually converges within 7 iterations. Simulation in this dissertation shows that, similar to what Tobin et al. (2005) showed for cross-sectional studies, when the quantitative outcome in a longitudinal study is altered by a non-trial, non-randomized intervention, analysis without appropriate correction can lead to substantial bias in the estimated treatment, exposure, or association effects, seriously distorting the analysis and undermining the scientific aims of the study. Simulation also shows that in a majority of simulated scenarios, at least one of the algorithms within the two proposed classes of methods has the least biased parameter estimates among the six methods applied. In general, when the amount of imputation is small to moderate, the data are not very noisy, and the medication effect is heterogeneous across the level of Y, a full-iteration model usually gives the best estimate. In practice, most clinical trials and epidemiologic or genetic studies have a rate of therapeutic intervention below 30% (Masca et al. 2011), and most medications have a heterogeneous effect across Y (the higher the level of Y, the greater the reduction). If the data are not very noisy, a full-iteration model would therefore be sufficient for most data sets.
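As a rough illustration of the EM-type idea with deterministic imputation, the sketch below handles only the simplest possible case and is not the dissertation's algorithm. It assumes a single univariate, independent normal outcome and assumes that each treated observation can be regarded as right-censored at its observed value (the untreated value is taken to be at least as large as the recorded one). The function names `truncated_moments` and `em_censored_normal` are hypothetical. The E step replaces each censored value with its conditional moments under a truncated normal distribution, and the M step re-estimates the mean and standard deviation from the completed data.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_sf(z):
    """Standard normal upper-tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_moments(mu, sigma, c):
    """Mean and variance of N(mu, sigma^2) conditional on Y > c."""
    a = (c - mu) / sigma
    tail = norm_sf(a)
    if tail < 1e-300:
        # Numerical guard: far in the tail, fall back to the censoring point.
        return c, 0.0
    lam = norm_pdf(a) / tail  # inverse Mills ratio
    mean = mu + sigma * lam
    var = sigma * sigma * (1.0 + a * lam - lam * lam)
    return mean, var


def em_censored_normal(values, censored, max_iter=200, tol=1e-8):
    """EM estimates of (mu, sigma) when flagged values are right-censored.

    Illustrative assumption: a flagged value is only a lower bound on the
    underlying untreated value.
    """
    n = len(values)
    # Start from the naive estimates that ignore censoring.
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    for _ in range(max_iter):
        first, second = 0.0, 0.0
        for v, is_cens in zip(values, censored):
            if is_cens:
                # E step: expected first and second moments given Y > v.
                m, s2 = truncated_moments(mu, sigma, v)
                first += m
                second += s2 + m * m
            else:
                first += v
                second += v * v
        # M step: closed-form normal updates from the completed moments.
        new_mu = first / n
        new_sigma = math.sqrt(max(second / n - new_mu ** 2, 1e-12))
        converged = abs(new_mu - mu) < tol and abs(new_sigma - sigma) < tol
        mu, sigma = new_mu, new_sigma
        if converged:
            break
    return mu, sigma


if __name__ == "__main__":
    # Made-up values for illustration only; True marks a treated measurement.
    values = [120.0, 130.0, 140.0, 150.0, 135.0, 145.0]
    treated = [False, False, False, True, False, True]
    print(em_censored_normal(values, treated))
```

In this toy setting the estimated mean is always above the naive mean of the recorded values, because every censored value is replaced by a conditional mean that exceeds it. The longitudinal, multivariate setting described above is considerably harder, since the conditional moments of a multi-dimensionally censored MVN distribution have no comparably simple closed form.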