Solved – quantitative way to compare the distribution shape of different samples

data visualization, distributions, nonparametric

I am conducting some research which involves visually/graphically observing the differences between the shapes of the distributions of different samples.

I would like to automate this process (at least somewhat), so that I can scale the number of samples I look at (as well as speeding things up, reducing human error etc.).

Is there a way to quantitatively describe/measure the shape of a distribution so that comparisons between shapes can be made algorithmically?

Best Answer

If the problem is univariate, why not simply run a KS (Kolmogorov-Smirnov) test on the centered and rescaled vectors?

You can't use the associated p-values (because the center and scale have been estimated from the data), but the D statistic gives a relative measure of the distance between the two vectors. In a nutshell, it is simply the Chebyshev distance (the largest absolute difference) between the two empirical CDFs.
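
To make the D statistic concrete, here is a minimal sketch that computes it by hand and checks it against `ks.test`. The helper name `ks_distance`, the seed, and the two simulated samples are assumptions made for illustration only; they are not part of the original answer.

```r
# Hypothetical helper (not a base R function): largest absolute difference
# between the empirical CDFs of two numeric vectors.
ks_distance <- function(a, b) {
  # Both ECDFs are step functions that only jump at observed values,
  # so it is enough to evaluate them on the pooled data points.
  grid <- sort(unique(c(a, b)))
  max(abs(ecdf(a)(grid) - ecdf(b)(grid)))
}

set.seed(1)        # arbitrary seed, only for reproducibility
a <- rnorm(50)     # illustrative samples of different lengths
b <- rexp(80)

d_manual <- ks_distance(a, b)
d_ks     <- unname(ks.test(a, b)$statistic)

# The two values should agree up to floating-point error.
all.equal(d_manual, d_ks)
```

Because D is bounded between 0 and 1, values from different pairs of samples are directly comparable, which is what makes it usable as a relative measure.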

In R, it would look like the following, assuming x and y are two vectors of potentially different lengths, each containing one of the samples whose distribution shapes you want to compare.

For example, if $x\sim\mathcal{P}(\lambda)$ and $y\sim\mathcal{N}(\mu,\sigma^2)$:

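# two distributions with different shapes
y <- rnorm(100, 0, 3)   # normal sample (mu = 0, sigma = 3)
x <- rpois(100, 1)      # Poisson sample (lambda = 1)
# center by the median and rescale by the MAD so that only shape differences remain
x_s <- (x - median(x)) / mad(x)
y_s <- (y - median(y)) / mad(y)
# compare the standardized samples visually
par(mfrow = c(2, 1))
hist(y_s)
hist(x_s)
# read off the D statistic; ignore the reported p-value
# (ks.test may warn about ties because the Poisson sample is discrete)
ks.test(x_s, y_s)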
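
To scale this to many samples, one possible approach is to collect the D statistics for every pair into a distance matrix. The sketch below is only an illustration: the helpers `standardise` and `shape_distances`, the seed, and the three simulated samples are hypothetical names and data, and the code assumes at least one sample and that each sample has a non-zero MAD (otherwise the rescaling divides by zero).

```r
# Hypothetical helper: robust centering and rescaling, as in the example above.
standardise <- function(v) (v - median(v)) / mad(v)

# Hypothetical helper: symmetric matrix of pairwise KS D statistics
# computed on the standardized samples.
shape_distances <- function(samples) {
  z <- lapply(samples, standardise)
  k <- length(z)
  d <- matrix(0, k, k, dimnames = list(names(z), names(z)))
  for (i in seq_len(k - 1)) {
    for (j in (i + 1):k) {
      # Only the statistic is used; warnings about ties concern the p-value.
      d[i, j] <- d[j, i] <-
        suppressWarnings(ks.test(z[[i]], z[[j]])$statistic)
    }
  }
  d
}

set.seed(42)  # arbitrary seed, only for reproducibility
samples <- list(
  norm_a = rnorm(200, 0, 3),     # same shape as norm_b, different location/scale
  norm_b = rnorm(200, 10, 0.5),
  expo   = rexp(200)             # skewed, so a different shape
)
d <- shape_distances(samples)
round(d, 2)
```

The resulting matrix can be passed to `as.dist()` and then to `hclust()` if you want to group samples with similar shapes, though the D values remain relative measures rather than test results.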

P.S. I left the original answer, because it seemed to be useful and frankly took me time to write. @Modo: let me know if it's better to remove it.
