Hypothesis Testing – How to Test Uniformity in Multiple Dimensions?

hypothesis testing, uniform distribution

Testing for uniformity is common, but what methods exist for doing it on a multidimensional cloud of points?

Best Answer

It turns out that the question is more difficult than I thought. Still, I did my homework and, after looking around, found two methods, in addition to Ripley's functions, for testing uniformity in several dimensions.

I made an R package called unf that implements both tests. You can download it from GitHub at https://github.com/gui11aume/unf. A large part of it is written in C, so you will need to compile it on your machine with R CMD INSTALL unf. The articles on which the implementation is based are included in the package as PDFs.

The first method comes from a reference mentioned by @Procrastinator (Testing multivariate uniformity and its applications, Liang et al., 2000) and tests uniformity on the unit hypercube only. The idea is to design discrepancy statistics that are asymptotically Gaussian by the central limit theorem. From these one can compute a $\chi^2$ statistic, which is the basis of the test.

library(unf)
set.seed(123)
# Put 20 points uniformly in the 5D hypercube.
x <- matrix(runif(100), ncol=20)
liang(x) # Outputs the p-value of the test.
[1] 0.9470392
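
To make the "asymptotically Gaussian, hence $\chi^2$" construction concrete, here is a minimal sketch in base R. It is not the statistic of Liang et al. and not what liang() computes: it swaps in a much simpler moment check, using only the fact that under uniformity on the unit hypercube each coordinate has mean 1/2 and variance 1/12. The name clt_uniformity_test is made up for the illustration, and rows are assumed to be points.

```r
# Simplified moment-based check, NOT the discrepancy statistics of Liang et al.
# x: n-by-d matrix, one point per row (assumed layout), coordinates in [0, 1].
clt_uniformity_test <- function(x) {
  stopifnot(is.matrix(x), all(x >= 0 & x <= 1))
  n <- nrow(x)
  # Under uniformity each coordinate has mean 1/2 and variance 1/12, so the
  # standardised coordinate means are asymptotically independent N(0, 1).
  z <- sqrt(12 * n) * (colMeans(x) - 0.5)
  # Their sum of squares is asymptotically chi-square with d degrees of freedom.
  stat <- sum(z^2)
  list(statistic = stat,
       p.value = pchisq(stat, df = ncol(x), lower.tail = FALSE))
}

set.seed(1)
u <- matrix(runif(200 * 3), ncol = 3)        # 200 uniform points in 3D
b <- matrix(rbeta(200 * 3, 2, 5), ncol = 3)  # 200 points skewed toward 0
clt_uniformity_test(u)$p.value
clt_uniformity_test(b)$p.value               # expected to be very small
```

This check only reacts to shifts in the coordinate means, so it is far cruder than a discrepancy-based test; it is only meant to show the mechanics.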

The second approach is less conventional and uses minimum spanning trees. The initial work was performed by Friedman & Rafsky in 1979 (reference in the package) to test whether two multivariate samples come from the same distribution. The image below illustrates the principle.

[Figure: uniformity]

Points from two bivariate samples are plotted in red or blue, depending on their original sample (left panel). The minimum spanning tree of the pooled sample is computed (middle panel); this is the tree with the minimum sum of edge lengths. The tree is then decomposed into subtrees in which all points have the same label (right panel).

In the figure below, I show a case where the blue dots are aggregated, which reduces the number of subtrees at the end of the process, as you can see in the right panel. Friedman and Rafsky computed the asymptotic distribution of the number of subtrees obtained this way, which makes it possible to perform a test.

[Figure: non uniformity]
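
The counting step illustrated in the two figures can be sketched in base R as follows. This is an illustration only, not the code of the unf package: mst_edges and count_subtrees are hypothetical names, the tree is built with Prim's algorithm on Euclidean distances, and the asymptotic distribution needed for an actual p-value is left out.

```r
# Prim's algorithm on the complete Euclidean graph.
# Returns an (n-1)-by-2 matrix of edges, given as row indices of pts.
mst_edges <- function(pts) {
  n <- nrow(pts)
  d <- as.matrix(dist(pts))
  in_tree <- c(TRUE, rep(FALSE, n - 1))
  best_dist <- d[1, ]       # distance from each point to the current tree
  best_from <- rep(1L, n)   # tree point realising that distance
  edges <- matrix(0L, nrow = n - 1, ncol = 2)
  for (k in seq_len(n - 1)) {
    cand <- which(!in_tree)
    j <- cand[which.min(best_dist[cand])]   # closest point outside the tree
    edges[k, ] <- c(best_from[j], j)
    in_tree[j] <- TRUE
    closer <- (!in_tree) & (d[j, ] < best_dist)
    best_dist[closer] <- d[j, closer]
    best_from[closer] <- j
  }
  edges
}

# Cutting every edge that joins two different labels leaves
# (number of cut edges + 1) single-label subtrees.
count_subtrees <- function(pts, label) {
  e <- mst_edges(pts)
  sum(label[e[, 1]] != label[e[, 2]]) + 1L
}

set.seed(42)
red          <- matrix(runif(60), ncol = 2)            # spread over the unit square
blue_mixed   <- matrix(runif(60), ncol = 2)            # same distribution as red
blue_clumped <- matrix(runif(60, 0.4, 0.6), ncol = 2)  # aggregated in the centre
label <- rep(c("red", "blue"), each = 30)
count_subtrees(rbind(red, blue_mixed), label)
count_subtrees(rbind(red, blue_clumped), label)        # typically fewer subtrees
```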

Smith and Jain turned this idea into a general test for the uniformity of a multivariate sample in 1984, and Ben Pfaff implemented it in C (references in the package). A second sample is generated uniformly in the approximate convex hull of the first one, and the Friedman-Rafsky test is performed on the pooled two samples.
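
As a rough illustration of how such a reference sample can be drawn, the sketch below handles the two-dimensional case only and uses the exact convex hull with rejection sampling from the bounding box. Both choices are assumptions made for the example: the implementation in the package works with an approximate hull, and sample_in_hull is a hypothetical name.

```r
# Uniform sample inside the convex hull of a 2D point cloud, drawn by
# rejection sampling from the bounding box. 2D only; exact hull (assumption).
sample_in_hull <- function(pts, m) {
  hull <- pts[grDevices::chull(pts), , drop = FALSE]   # hull vertices in order
  stopifnot(nrow(hull) >= 3)                           # need a non-degenerate hull
  nxt <- hull[c(2:nrow(hull), 1), , drop = FALSE]      # next vertex along the hull
  inside <- function(p) {
    # p is inside a convex polygon if it lies on the same side of every edge.
    cross <- (nxt[, 1] - hull[, 1]) * (p[2] - hull[, 2]) -
             (nxt[, 2] - hull[, 2]) * (p[1] - hull[, 1])
    all(cross <= 0) || all(cross >= 0)
  }
  out <- matrix(NA_real_, nrow = m, ncol = 2)
  k <- 0
  while (k < m) {
    p <- c(runif(1, min(pts[, 1]), max(pts[, 1])),
           runif(1, min(pts[, 2]), max(pts[, 2])))
    if (inside(p)) {
      k <- k + 1
      out[k, ] <- p
    }
  }
  out
}

set.seed(1)
first  <- matrix(rnorm(100), ncol = 2)          # 50 observed points
second <- sample_in_hull(first, nrow(first))    # 50 reference points in their hull
```

The two samples would then be pooled and labelled before running the two-sample test.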

The advantage of this method is that it tests uniformity on any convex multivariate shape, not only on the hypercube. Its major disadvantage is that the test has a random component, because the second sample is generated at random. Of course, one can repeat the test and average the results to get a more stable answer, but this is not convenient.

Continuing the previous R session, here is how it goes.

pfaff(x) # Outputs the p-value of the test.
pfaff(x) # Most likely another p-value.
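
Repeating a randomised test and summarising the p-values can be sketched as below. So that the snippet runs without the package, it uses a hypothetical stand-in test (stand_in_test, which has nothing to do with the minimum spanning tree method); summarise_random_test is also a made-up name. The only assumption about the real function is the one stated above, namely that it returns a p-value.

```r
# Run a randomised test `reps` times and summarise the resulting p-values.
summarise_random_test <- function(test, x, reps = 50) {
  p <- replicate(reps, test(x))
  c(mean = mean(p), sd = sd(p))
}

# Hypothetical stand-in for a randomised test: compares the first coordinate
# of x with a fresh uniform sample using a two-sample Kolmogorov-Smirnov test.
stand_in_test <- function(x) ks.test(x[, 1], runif(nrow(x)))$p.value

set.seed(123)
x <- matrix(runif(100), ncol = 5)
summarise_random_test(stand_in_test, x)
# With the unf package installed, pfaff could be passed instead of stand_in_test.
```

The standard deviation gives an idea of how much a single run can vary.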

Feel free to copy or fork the code from GitHub.