Solved – How to get Bhattacharyya Distance in Excel (or Matlab, or R)

Tags: distributions, excel, histogram, r, similarities

I have several histograms which I would like to compare to one reference histogram to see which one is the most similar to the reference in terms of the shape of the distribution.

Kolmogorov-Smirnov gives unhelpful answers, because the largest difference in y-value is often not indicative of the worst fit. By eye, some of the histograms this method ranks as the worst fit are actually the most similar in shape, just with the distribution shifted up or down the x-axis.

Chi-squared: I'm not sure how I'd implement this, as my data are continuous (continuous numerical data on the x-axis, binned frequencies on the y-axis). I'm put off by the fact that the number of bins I arbitrarily choose has such a large effect, and I don't know how to handle the degrees of freedom.

I don't need a good P-value, just to know qualitatively which of a variety of histograms are most similar to an initial histogram. None of the histograms follow a normal distribution or an approximation of a normal distribution.

From my reading, the Bhattacharyya distance seems like a good way of getting what I am looking for, but I don't know how to compute it from my Excel histograms and data.

A description of how someone with Excel and very limited Matlab skills might compute the Bhattacharyya distance, or any other suggestion for qualitatively judging which of several histograms is most similar in shape to a reference histogram, would be greatly appreciated.

Best Answer

Use the fpc package: https://cran.r-project.org/web/packages/fpc/fpc.pdf

library(fpc)
# mu1, mu2: mean vectors; covarianceMatrix1, covarianceMatrix2: covariance matrices
bhattacharyya.dist(mu1, mu2, covarianceMatrix1, covarianceMatrix2)

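You can use cov() to compute the covariance matrices. Note that this function computes the distance between two Gaussian distributions described by their means and covariance matrices.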
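
Since the question says the histograms are not normal, it may help to compute the distance directly from the binned frequencies instead. The following is only a minimal sketch in base R, not part of fpc: it assumes every sample is binned on the same break points, and the function name bhattacharyya_hist and the simulated samples are hypothetical, for illustration only. It uses the discrete form of the distance: with normalised bin proportions p_i and q_i, the Bhattacharyya coefficient is BC = sum(sqrt(p_i * q_i)) and the distance is -log(BC).

```r
# Bhattacharyya distance between two samples, estimated from histograms.
# Assumption: both samples are binned on the SAME break points.
bhattacharyya_hist <- function(x, ref, breaks) {
  # Bin counts on the shared breaks, then normalise to proportions
  p <- hist(x, breaks = breaks, plot = FALSE)$counts
  q <- hist(ref, breaks = breaks, plot = FALSE)$counts
  p <- p / sum(p)
  q <- q / sum(q)
  bc <- sum(sqrt(p * q))  # Bhattacharyya coefficient, between 0 and 1
  -log(bc)                # 0 for identical histograms, Inf for no overlap
}

# Hypothetical data: a reference sample and two candidates
set.seed(1)
ref    <- rexp(500)
cand_a <- rexp(500)       # drawn from the same distribution as ref
cand_b <- rexp(500) + 2   # same shape, shifted along the x-axis

# Shared breaks that span every sample (20 equal-width bins)
all_values <- c(ref, cand_a, cand_b)
breaks <- seq(floor(min(all_values)), ceiling(max(all_values)), length.out = 21)

# Smaller distance = more similar to the reference
distances <- sapply(list(a = cand_a, b = cand_b),
                    bhattacharyya_hist, ref = ref, breaks = breaks)
print(sort(distances))
```

The candidate with the smallest distance is the closest match. Note that this compares histograms bin by bin, so a shifted copy of the reference scores as dissimilar; if location should be ignored, one option is to subtract each sample's median before binning. The same calculation can be done in Excel by normalising each column of bin counts to sum to 1, taking the square root of the product of each pair of bins, summing those, and applying -LN() to the total.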