Score Tests for Complex Survey Samples and for Testing Gene-Phenotype Associations via Integrating Analysis of Predicted Expression from Multiple Tissues Open Access
Abstract of Dissertation

Topic I: Quasi-generalized Score Tests for Complex Samples

Sample surveys often have complex sample designs with multistage cluster sampling, stratification, and differential selection probabilities. To generalize from a complex sample to the finite population, these design features usually require that the analysis incorporate sample weighting and proper methods for estimating the variance-covariance matrix. Inference based on Wald statistics can have limitations. Here the generalized score test is pursued as an alternative to the Wald test. Extensions of the generalized score test for testing hypotheses involving general transformations and constraints have not yet been systematically developed for analyzing complex sample data. In this paper, we propose quasi-generalized score test procedures for these hypothesis tests in complex samples. We estimate the unknown parameters by weighted estimating equations derived from a weighted pseudo-likelihood, incorporate sample weighting in the final test statistics, and derive a consistent variance-covariance estimator for the score vectors by the Taylor linearization method, which takes into account the design effects for complex sampling data. We illustrate the proposed tests with two applications: a test of a coefficient of variation and a goodness-of-fit test for logistic regression. Monte Carlo simulation studies are used to investigate the type I error rates of the proposed tests, with various settings to simulate the intra-class correlations of stratified clusters and differential weighting for complex sample data. For the majority of the settings, the simulation results showed that the proposed procedures controlled the size of the tests well.
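The ingredients named under Topic I (a weighted score, and a linearization variance that respects strata and clusters) can be illustrated in the simplest possible case: a test of H0: mean = mu0. The sketch below is not the dissertation's procedure. It assumes a with-replacement approximation for primary sampling units, and the function name `weighted_score_test_mean` and its arguments are hypothetical.

```python
import math
import numpy as np

def weighted_score_test_mean(y, w, stratum, cluster, mu0):
    """Illustrative weighted score test of H0: mean = mu0 (hypothetical helper).

    Assumes cluster ids are nested within strata and that clusters are
    sampled with replacement within each stratum (a common approximation).
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    stratum = np.asarray(stratum)
    cluster = np.asarray(cluster)

    # Weighted score contributions evaluated under the null value mu0.
    u = w * (y - mu0)
    score = u.sum()

    # Design-based variance of the score total: between-cluster variation
    # of the cluster-level score totals, computed within each stratum.
    var = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        clusters_h = np.unique(cluster[in_h])
        n_h = len(clusters_h)
        if n_h < 2:
            raise ValueError("each stratum needs at least two clusters")
        totals = np.array([u[in_h & (cluster == c)].sum() for c in clusters_h])
        var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)

    stat = score ** 2 / var
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return score, var, stat, p_value
```

With many strata and few clusters per stratum, a reference distribution with design-based degrees of freedom is often preferred over the chi-square used here; that refinement is left out of the sketch.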
We further illustrate our proposed methods with an application to blood pressure data from the continuous National Health and Nutrition Examination Survey (cNHANES).

Topic II: Integrative Analysis of Predicted Expression from Multiple Tissues to Test Gene-Phenotype Associations

Motivated by the fact that many genetic variants affect complex traits and diseases through gene expression, transcriptome-wide association study (TWAS) approaches have been developed to improve understanding of the biological mechanisms underlying genome-wide association studies (GWAS). The TWAS approach typically identifies phenotype-associated genes by integrating GWAS data and expression quantitative trait loci (eQTL) data, where predicted gene expression variation in eQTL data is examined in a single tissue or through an agnostic scan across tissues. Given that complex traits tend to be relevant to multiple tissues or cell types, we propose a general statistical framework that extends TWAS to two or more tissues via integrative analysis of predicted gene expression data from multi-tissue panels. Our algorithm is designed to accommodate GWAS summary data and to conduct an adaptive score test that jointly analyzes the gene-based associations from two-tissue and multi-tissue panels. This was implemented by modifying a procedure called the adaptive rank truncated product (ARTP), originally developed by Yu et al. (Genet Epidemiol, 2009, 33:700-709). Monte Carlo simulation studies are used to investigate the type I error rates of the proposed tests, and they show that the proposed algorithms control the empirical sizes well. We further illustrate our methods with applications to a height GWAS and a renal cell carcinoma (RCC) GWAS by integrating them with eQTL weights from 2 and 48 tissue panels, respectively. These procedures are included in an updated version of a previous R package, now named ARTP3.
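To make the adaptive rank truncated product idea concrete, the following is a minimal sketch of the generic ARTP combination step. It assumes the observed p-values (for example, one per tissue) and a matrix of p-values generated under the null are already available. It is not the modified procedure implemented in ARTP3, and the function name `artp_p_value` and its arguments are hypothetical.

```python
import numpy as np

def artp_p_value(p_obs, p_null, trunc_points):
    """Generic ARTP combination of p-values (illustrative sketch).

    p_obs:        observed p-values, length L
    p_null:       null p-values, shape (B, L), one row per null replicate
    trunc_points: candidate truncation points K, each with 1 <= K <= L
    """
    p_obs = np.asarray(p_obs, dtype=float)
    p_null = np.asarray(p_null, dtype=float)
    all_p = np.vstack([p_obs, p_null])            # row 0 is the observed data

    # Rank truncated product on the log scale: for each K, sum the K largest
    # -log(p), i.e. the K smallest p-values. Larger means stronger evidence.
    neg_log = np.sort(-np.log(all_p), axis=1)[:, ::-1]
    cum = np.cumsum(neg_log, axis=1)
    w = cum[:, np.asarray(trunc_points) - 1]      # shape (B + 1, J)

    # Per-truncation-point p-value of every row, using all B + 1 rows as
    # the reference set: the share of rows whose statistic is at least as large.
    n = w.shape[0]
    p_k = np.empty_like(w)
    for j in range(w.shape[1]):
        ordered = np.sort(w[:, j])
        n_smaller = np.searchsorted(ordered, w[:, j], side="left")
        p_k[:, j] = (n - n_smaller) / n

    # Adaptive step: take the best truncation point in each row, then
    # calibrate the observed minimum against the same quantity in all rows.
    min_p = p_k.min(axis=1)
    return float(np.mean(min_p <= min_p[0]))
```

Reusing one set of null replicates for both layers of p-values is what keeps the adaptive choice of truncation point from inflating the type I error. How the null replicates are generated from GWAS summary data is outside the scope of this sketch.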