I am conducting some research which involves visually/graphically observing the differences between the shapes of the distributions of different samples.
I would like to automate this process (at least somewhat), so that I can scale up the number of samples I look at, speed things up, and reduce human error.
Is there a way to quantitatively describe/measure the shape of a distribution so that comparisons between shapes can be made algorithmically?
Best Answer
If the problem is univariate, then why not just do a KS test on the (centered, rescaled) vectors?
You can't use the associated p-values (because the center and scale components have been determined from the data), but the `D` statistic gives a relative measure of the distance between the two vectors. In a nutshell, it is simply the Chebyshev distance between the two empirical CDFs, $D=\sup_t|\hat F_x(t)-\hat F_y(t)|$.
So, in `R`, assume `x` and `y` are two vectors of potentially different lengths, each containing one of the samples whose distribution shapes you want to compare; for example, $x\sim\mathcal{P}(\lambda)$ and $y\sim\mathcal{N}(\mu,\sigma^2)$.
P.S. I left the original answer, because it seemed to be useful and frankly took me time to write. @Modo: let me know if it's better to remove it.