Triangle Test and Triangle Data Depth in Nonparametric Multivariate Analysis
A triangle statistic is proposed for testing the equality of two multivariate continuous distribution functions (DFs) in high-dimensional settings based on sample interpoint distances. Given two independent d-dimensional random samples, a triangle can be formed by randomly selecting one observation from one sample and two observations from the other sample. The triangle statistic estimates the probability that the distance between the two observations from the same distribution is the largest, the middle or the smallest side of the triangle formed by these three observations. We show that the test based on the triangle statistic is asymptotically distribution-free under the null hypothesis of equal and unknown continuous distribution functions. The triangle test is compared to other nonparametric tests through a simulation study. The appealing geometric nature of the triangle statistic motivates the development of a new data depth measure, called triangle data depth. The properties of the theoretical triangle data depth function and its empirical analogue are explored. The sample triangle data depth enjoys computational simplicity in high dimensions compared to some existing depth functions. We also propose a multivariate analogue of the univariate median based on the triangle data depth. We show that the sample triangle median has a high breakdown point of 0.293 and good relative efficiency compared to the multivariate sample mean as an estimator of the center of a multivariate distribution. We explore the construction of Statistically Equivalent Blocks (SEBS), a multivariate generalization of univariate sample spacings, based on the notion of data depth (DSEBS), and their application to nonparametric multivariate analysis. DSEBS are data-driven, center-outward layers of shells whose shapes reflect the underlying geometric features of the distribution.
We propose a control quantile test based on DSEBS for testing the equality of two unknown continuous DFs in the multivariate setting. The proposed test statistic is asymptotically distribution-free under the null hypothesis. We conduct a simulation study and show that the proposed test is powerful in detecting differences in location, scale and shape (skewness or kurtosis) between two multivariate distributions.
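
The triangle construction described above can be illustrated with a minimal sketch. It is not the thesis's test statistic or its null distribution, neither of which is given in this abstract; it only computes the three empirical proportions that the triangle statistic is said to estimate. The function name `triangle_proportions`, the use of Euclidean distance, and the choice to enumerate all triangles rather than sample them at random are assumptions made for illustration.

```python
import itertools
import math
import random


def triangle_proportions(xs, ys):
    """Return proportions [largest, middle, smallest] for the same-sample side.

    Hypothetical helper: each triangle uses one point x from `xs` and two
    distinct points y1, y2 from `ys`. The same-sample side is d(y1, y2);
    the two cross-sample sides are d(x, y1) and d(x, y2).
    Euclidean distance is an assumption, not taken from the source.
    """
    counts = [0, 0, 0]  # index = number of cross-sample sides longer than d(y1, y2)
    total = 0
    for x in xs:
        for y1, y2 in itertools.combinations(ys, 2):
            same = math.dist(y1, y2)
            cross = (math.dist(x, y1), math.dist(x, y2))
            # 0 longer cross sides -> same-sample side is largest,
            # 1 -> middle, 2 -> smallest. An exact tie counts the
            # same-sample side as the longer one (ties have probability
            # zero for continuous data).
            longer = sum(c > same for c in cross)
            counts[longer] += 1
            total += 1
    return [c / total for c in counts]


# Illustrative data only: two 5-dimensional Gaussian samples, the second shifted.
rng = random.Random(0)
dim = 5
sample_x = [tuple(rng.gauss(0.0, 1.0) for _ in range(dim)) for _ in range(30)]
sample_y = [tuple(rng.gauss(1.0, 1.0) for _ in range(dim)) for _ in range(30)]

print(triangle_proportions(sample_x, sample_y))
```

When the two samples come from the same continuous distribution, the three points of a triangle are exchangeable, so each of the three proportions should be close to 1/3; departures from 1/3 are what a test of this kind would look for. Enumerating every triangle costs on the order of m·n² distance evaluations for sample sizes m and n, so random sampling of triangles may be preferable for large samples.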