I'm looking for some good terminology to describe what I'm trying to do, to make it easier to look for resources.
So, say I have two clusters of points, A and B, each associated with two values, X and Y. I want to measure the "distance" between A and B – i.e. how likely it is that they were sampled from the same distribution (I can assume that the distributions are normal). For example, if X and Y are correlated in A but not in B, the distributions are different.
Intuitively, I would get the covariance matrix of A, and then look at how likely each point in B is to fit in there, and vice versa (probably using something like Mahalanobis distance).
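To make the question concrete, here is a minimal sketch of that intuition in Python (the data, the 200-point sample size, and the specific covariances are made up for illustration): fit a mean and covariance to A, then average the Mahalanobis distances of B's points under that fit, comparing against A's own points as a baseline.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
# Two synthetic clusters: X and Y correlated in A, uncorrelated in B
A = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)
B = rng.multivariate_normal([0, 0], [[1.0, 0.0], [0.0, 1.0]], size=200)

# Fit A's distribution
mu_A = A.mean(axis=0)
VI = np.linalg.inv(np.cov(A, rowvar=False))  # inverse covariance of A

# Mean Mahalanobis distance to A's distribution, for A's own points
# and for B's points; B should score higher if the shapes differ
d_A = np.mean([mahalanobis(a, mu_A, VI) for a in A])
d_B = np.mean([mahalanobis(b, mu_A, VI) for b in B])
```

Even with identical means, `d_B` comes out larger than `d_A` here because B's points spill into directions that A's correlated covariance considers unlikely.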
But that feels a bit "ad hoc", and there is probably a more rigorous way of describing this (of course, in practice I have more than two datasets with more than two variables – I'm trying to identify which of my datasets are outliers).
Thanks!
Best Answer
There is also the Kullback-Leibler divergence, which is related to the Hellinger Distance you mention above.
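Since the question allows assuming normality, the KL divergence has a closed form between two multivariate Gaussians, so it can be computed directly from the fitted means and covariances rather than estimated from samples. A sketch (the `kl_gaussians` helper name and the example covariances are my own, for illustration):

```python
import numpy as np

def kl_gaussians(mu0, S0, mu1, S1):
    """Closed-form KL(N(mu0, S0) || N(mu1, S1)) for multivariate Gaussians:
    0.5 * [tr(S1^-1 S0) + (mu1-mu0)^T S1^-1 (mu1-mu0) - k + ln(det S1 / det S0)]
    """
    mu0, mu1 = np.asarray(mu0, float), np.asarray(mu1, float)
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0)
                  + diff @ S1_inv @ diff
                  - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Example: same mean, correlated vs. uncorrelated covariance
S_corr = np.array([[1.0, 0.8], [0.8, 1.0]])
S_id = np.eye(2)
kl = kl_gaussians([0, 0], S_id, [0, 0], S_corr)
```

Note that KL is not symmetric; for a symmetric "distance" between datasets one common choice is to average the two directions, `kl_gaussians(m0, S0, m1, S1) + kl_gaussians(m1, S1, m0, S0)`.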