Novel Methodologies in Categorical Data Analysis
Categorical response data are ubiquitous across virtually every applied field of statistics. In general, categorical data are modeled in one of three ways. With standard classification, the aim is simply to predict the most likely category for a response on the basis of covariate data. A related approach extends this paradigm by ranking the response categories in order of likelihood. Probabilistic classification offers the most general approach by assigning a probability estimate to each potential response category. The goal of this dissertation is twofold. First, we advance the theory and methods used in probabilistic classification when there is a single categorical covariate from which to make inference. Distributional estimators are developed and evaluated within a frequentist framework in Chapter 1 and from a Bayesian perspective in Chapter 2. In each framework, the proposed estimators outperform their empirically based counterparts by achieving a lower mean integrated squared error. The second aim of this dissertation is to model categorical response data in baseball statistics, or sabermetrics. In Chapter 3, we develop highly tailored distributional estimates of batting outcomes conditional on a large, multivariate set of pitching predictors. These estimators, along with the win-probability estimators developed in Chapter 4, are used to assign novel Wins Above Replacement scores to players in Major League Baseball.
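To make the comparison with empirically based estimators concrete, the following is a minimal Python sketch, not the dissertation's estimator. It contrasts the empirical estimate of P(Y = k | X = j), the relative frequency n_jk / n_j, with a hypothetical shrinkage estimator that pulls each row toward the pooled marginal distribution of Y, and it approximates the mean integrated squared error by Monte Carlo, taking the "integrated" error to be the squared error summed over response categories and weighted by P(X = j). The smoothing weight `alpha`, the helper names, and the simulation settings are all illustrative assumptions.

```python
import random
from collections import Counter


def conditional_estimates(pairs, J, K, alpha=0.0):
    """Estimate P(Y = k | X = j) from (x, y) pairs with x in range(J), y in range(K).

    alpha = 0 gives the empirical (relative-frequency) estimator.
    alpha > 0 shrinks each row toward the pooled marginal of Y; this
    smoothing rule is a hypothetical choice made only for illustration.
    """
    n = len(pairs)
    y_counts = Counter(y for _, y in pairs)
    marginal = [y_counts[k] / n for k in range(K)]
    cell = Counter(pairs)
    estimates = []
    for j in range(J):
        n_j = sum(cell[(j, k)] for k in range(K))
        if n_j + alpha == 0:
            # Unobserved level with no smoothing: fall back to the marginal.
            estimates.append(list(marginal))
            continue
        estimates.append(
            [(cell[(j, k)] + alpha * marginal[k]) / (n_j + alpha) for k in range(K)]
        )
    return estimates


def sample(n, px, p_y_given_x, rng):
    """Draw n (x, y) pairs from P(X) and the true conditionals P(Y | X)."""
    pairs = []
    for _ in range(n):
        j = rng.choices(range(len(px)), weights=px)[0]
        k = rng.choices(range(len(p_y_given_x[j])), weights=p_y_given_x[j])[0]
        pairs.append((j, k))
    return pairs


def integrated_squared_error(est, px, p_y_given_x):
    """Squared error summed over categories, weighted by P(X = j)."""
    return sum(
        px[j] * sum((e - p) ** 2 for e, p in zip(est[j], p_y_given_x[j]))
        for j in range(len(px))
    )


def monte_carlo_mise(n, px, p_y_given_x, alpha, reps=500, seed=0):
    """Average the integrated squared error over simulated samples of size n."""
    rng = random.Random(seed)
    J, K = len(px), len(p_y_given_x[0])
    total = 0.0
    for _ in range(reps):
        est = conditional_estimates(sample(n, px, p_y_given_x, rng), J, K, alpha)
        total += integrated_squared_error(est, px, p_y_given_x)
    return total / reps


if __name__ == "__main__":
    # Hypothetical setting: three covariate levels with similar conditionals.
    px = [0.5, 0.3, 0.2]
    p_y_given_x = [[0.6, 0.3, 0.1], [0.5, 0.35, 0.15], [0.55, 0.3, 0.15]]
    for alpha in (0.0, 5.0):
        print(alpha, monte_carlo_mise(30, px, p_y_given_x, alpha))
```

Whether smoothing lowers the error depends on the sample size and on how much the true conditional distributions differ across covariate levels; the sketch only shows how such a comparison can be set up.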