Longitudinal Weight Calibration with Estimated Control Totals for Cross Sectional Survey Data: Theory and Application
The National Science Foundation (NSF) Survey of Doctorate Recipients (SDR) collects information on a sample of individuals in the United States with PhD degrees. A significant portion of the sampled individuals appear in multiple survey years and can be linked across time. Survey weights in each year are created and adjusted for oversampling and nonresponse on a cross-sectional basis. No longitudinal weight exists that would enable estimation of statistical models or comparison of finite population characteristics using data from multiple survey waves together. In the first topic of this dissertation, calibration estimation is applied to construct such a longitudinal weight with the help of auxiliary variables in each survey year. Properties of the estimator and a variance estimator are given. Suggestions are made for addressing the non-existence of values in non-sampled survey years for some respondents and for analyzing longitudinal statistical models in a finite population with survey weights. Methods are studied through simulation and analysis of NSF SDR data. The method of creating longitudinal weights should be applicable to many overlapping panel surveys in addition to the NSF SDR.
In some cases of survey weight calibration, the control totals are themselves estimated from an outside source with its own uncertainties. When an estimator uses survey weights produced by calibrating to estimated control totals, it is called a calibrated estimator with estimated controls (CEEC). Variance estimation for the CEEC is more complicated than for estimators adjusted to known totals. In the second topic of this dissertation, the effects of estimated control totals on the traditional variance estimator are evaluated. To address the additional source of uncertainty, several modified variance estimators for the CEEC are proposed and compared.
This work generalizes Dever and Valliant (2010), who compared several variance estimators for poststratification to estimated totals. The current work explores asymptotic characteristics of linearization and jackknife variance estimators in this context. Empirical properties and performance are studied for multi-stage survey designs through simulations of two-stage sample surveys. Results from simulations as well as issues of variance estimation are presented and discussed. As with the first topic, methods for the CEEC and its variance estimation should have wide applicability in many surveys.
The third topic of the dissertation combines the first two in a special situation. It examines the situation in which the estimated totals for use in calibration are from the survey itself. That is, the control totals are expressed in terms of estimates from previous time periods and even the current survey. In this case, the CEEC calibrated to the estimated cross sectional totals is applied to a longitudinal survey with only estimated control totals. The fact that calibration is performed to match estimated totals within the same survey population means that a specific covariance term, previously treated as zero, must be estimated. Different methods of variance estimation are developed and compared. Performance of the CEEC and the variance estimators in this context is studied using new simulations. Methods are applied to data from the NSF SDR. The covariance between the NSF SDR surveys and the estimated totals in the variance estimation is a focus of this study. The conclusions of the work in this dissertation apply to a wide range of surveys: longitudinal surveys with cross sectional weights, cross sectional surveys with estimated control totals, and longitudinal surveys with control totals estimated across time.
Future work could examine missing data, nonlinear models and calibration, computational issues including dimensionality and existence of calibration weight solutions, and methods to avoid extreme calibration weights.
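As a rough illustration of the calibration step referred to throughout this abstract, the sketch below implements poststratification, the simplest special case of weight calibration and the setting studied by Dever and Valliant (2010). It is not the dissertation's own procedure: the function name `poststratify`, the toy weights, the post-stratum labels, and the control totals are all hypothetical. In the CEEC setting the control totals would themselves be estimates, and the extra variance this introduces is not handled here.

```python
from collections import defaultdict


def poststratify(design_weights, strata, control_totals):
    """Scale design weights so that the weighted count in each
    post-stratum equals that post-stratum's control total."""
    # Design-weighted estimate of the size of each post-stratum.
    estimated = defaultdict(float)
    for d, g in zip(design_weights, strata):
        estimated[g] += d
    # Multiply each weight by (control total / estimated total)
    # for the post-stratum the unit belongs to.
    return [
        d * control_totals[g] / estimated[g]
        for d, g in zip(design_weights, strata)
    ]


# Hypothetical toy data: five respondents in two post-strata.
design_weights = [10.0, 10.0, 20.0, 20.0, 15.0]
strata = ["A", "A", "B", "B", "B"]
# Assumed control totals; in a CEEC these would be estimated
# from another survey (or an earlier wave), not known exactly.
control_totals = {"A": 30.0, "B": 50.0}

calibrated = poststratify(design_weights, strata, control_totals)
```

After calibration the weights in each post-stratum sum to the supplied total, which is the defining constraint of a calibration estimator. General calibration replaces the per-stratum ratio with an adjustment that satisfies several such constraints at once.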