Solved – Understanding distance correlation computations

correlation, distance-covariance, independence, intuition

As far as I understand, distance correlation is a robust and universal way to check whether there is a relation between two numeric variables. For example, if we have a set of pairs of numbers:

(x1, y1)
(x2, y2)
...
(xn, yn)

we can use distance correlation to check if there is any (not necessarily linear) relation between the two variables (x and y). Moreover, x and y can be vectors of different dimensions.

It is relatively easy to calculate distance correlation. First we use the $x_i$ to calculate a distance matrix. Then we calculate a distance matrix using the $y_i$. The two distance matrices have the same dimensions because the number of $x_i$ and $y_i$ is the same (they come in pairs).

Now we have many distances that can be paired. For example, element $(2,3)$ of the first distance matrix is paired with element $(2,3)$ of the second distance matrix. So we have a set of pairs of distances and we can use it to calculate a correlation (a correlation between distances).
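
For concreteness, here is how I picture this step, as a minimal NumPy sketch (the sample data and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 2))   # n = 10 observations of x (may be multivariate)
y = rng.normal(size=(10, 3))   # the paired y's, possibly of another dimension

# Distance matrices: element (i, j) is the euclidean distance between
# observations i and j within the same variable.
a = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
b = np.linalg.norm(y[:, None, :] - y[None, :, :], axis=-1)

print(a.shape == b.shape)  # True: the elements pair up one-to-one
print(a[1, 2], b[1, 2])    # the paired distances for observations 2 and 3
```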

If the two kinds of distances are correlated, then it means that close Xs usually go with close Ys. For example, if $x_7$ is close to $x_{13}$, then $y_7$ is likely to be close to $y_{13}$. So we can conclude that the Xs and Ys are dependent.

Sounds reasonable, however there are two aspects that I do not understand.

First, to calculate distance correlation we do not use the two distance matrices directly. We apply a double-centering procedure to them (so that the sum of all elements in any row, or column, is equal to zero). I do not understand why we need to do this. What is the logic (or intuition) behind this step?

Second, the original distance matrices have zeros on the diagonal. So, if we calculate the correlation between the distances, we will get a statistically significant correlation just because the many zeros of the first matrix are paired with the corresponding zeros of the second matrix. How is this problem resolved?

Best Answer

Distance covariance/correlation (= Brownian covariance/correlation) is computed in the following steps:

  1. Compute the matrix of euclidean distances between the $N$ cases on variable $X$, and likewise a matrix for variable $Y$. Either of the two quantitative features, $X$ or $Y$, may be multivariate, not just univariate.
  2. Perform double centering of each matrix. See how double centering is usually done. In our case, however, do not square the distances initially and do not divide by $-2$ at the end. The row means, column means, and overall mean of the elements become zero.
  3. Multiply the two resultant matrices elementwise and sum the products; or, equivalently, unwrap each matrix into a column vector and compute the summed cross-product of the two vectors.
  4. Average, dividing by the number of elements, $N^2$.
  5. Take the square root. The result is the distance covariance between $X$ and $Y$.
  6. The distance variances are the distance covariances of $X$ and of $Y$ with themselves; you compute them likewise, points 3-4-5.
  7. Distance correlation is obtained from the three numbers analogously to how Pearson correlation is obtained from the usual covariance and the pair of variances: divide the covariance by the square root of the product of the two variances (a code sketch of the whole recipe follows this list).
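
A minimal NumPy sketch of steps 1-7 (the function names are mine; this is an illustration, not a reference implementation):

```python
import numpy as np

def _centered(Z):
    """Steps 1-2: euclidean distance matrix, double-centered without
    squaring the distances initially and without dividing by -2."""
    Z = np.asarray(Z, dtype=float).reshape(len(Z), -1)
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    return D - D.mean(axis=0) - D.mean(axis=1, keepdims=True) + D.mean()

def dcov(X, Y):
    """Steps 3-5: elementwise product, average over N^2, square root."""
    return np.sqrt(np.mean(_centered(X) * _centered(Y)))

def dcor(X, Y):
    """Steps 6-7: dCov divided by the root of the product of the dVars."""
    denom = np.sqrt(dcov(X, X) * dcov(Y, Y))
    return dcov(X, Y) / denom if denom > 0 else 0.0

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 300)
print(dcor(x, x**2))               # clearly nonzero: nonlinear dependence
print(np.corrcoef(x, x**2)[0, 1])  # Pearson r is near zero for this relation
```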

Distance covariance (and correlation) is not the covariance (or correlation) between the distances themselves. It is the covariance (correlation) between the special scalar products (dot products) of which the "double-centered" matrices are comprised.

In euclidean space, a scalar product is the similarity measure unequivocally tied to the corresponding distance. If you have two points (vectors), you may express their closeness as a scalar product instead of their distance without losing information.

However, to compute a scalar product you have to refer to the origin of the space (vectors emanate from the origin). In general one could place the origin anywhere, but it is often convenient to place it at the geometric middle of the cloud of points, the mean. Because the mean belongs to the same space as the one spanned by the cloud, the dimensionality does not swell.

Now, the usual double centering of a distance matrix (between the points of a cloud) is the operation of converting the distances into scalar products while placing the origin at that geometric middle. In doing so, the "network" of distances is equivalently replaced by a "burst" of vectors of specific lengths and pairwise angles emanating from the origin:

[Figure: the point cloud redrawn as a "burst" of vectors emanating from a common origin placed at the centroid]

[The constellation in my example picture is planar, which gives away that the "variable", say it was $X$, that generated it was two-dimensional. When $X$ is a single-column variable, all the points lie on one line, of course.]

Just a bit formally about the double-centering operation. Let $\mathbf X$ be an $n \times p$ data matrix of $n$ points in $p$ dimensions (in the univariate case, $p=1$). Let $\mathbf D$ be the $n \times n$ matrix of euclidean distances between the $n$ points, and let $\mathbf C$ be $\mathbf X$ with its columns centered. Then $\mathbf S = \text{double-centered } \mathbf D^2$ (the classic operation, including the final multiplication by $-\frac{1}{2}$) is equal to $\mathbf{CC}'$, the matrix of scalar products between the rows after the cloud of points has been centered. The principal property of double centering is that $\frac{1}{2n} \sum \mathbf D^2 = \operatorname{trace}(\mathbf S) = \operatorname{trace}(\mathbf{C'C})$, and this sum equals the negated sum of the off-diagonal elements of $\mathbf S$.
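
A quick numeric check of these identities (illustrative data; the variable names are mine):

```python
import numpy as np

n, p = 6, 2
rng = np.random.default_rng(2)
X = rng.normal(size=(n, p))
C = X - X.mean(axis=0)                                    # column-centered data
D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)  # squared distances

J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
S = -0.5 * J @ D2 @ J                 # classic double centering of D^2

print(np.allclose(S, C @ C.T))                            # S = CC'
print(np.isclose(D2.sum() / (2 * n), np.trace(S)))        # (1/2n) sum D^2 = trace
print(np.isclose(np.trace(S), -(S.sum() - np.trace(S))))  # trace = -sum(offdiag)
```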

Return to distance correlation. What are we doing when we compute distance covariance? We have converted both networks of distances into their corresponding bunches of vectors. Then we compute the covariation (and subsequently the correlation) between the corresponding values of the two bunches: each scalar-product value (a former distance value) of one configuration is multiplied by its corresponding value in the other configuration. That can be seen (as was said in point 3) as computing the usual covariance between two variables after vectorizing the two matrices into those "variables".

Thus, we are covariating the two sets of similarities (the scalar products, which are the converted distances). Any sort of covariance is a cross-product of moments: you first have to compute those moments, the deviations from the mean, and the double centering was that computation. This is the answer to your question: a covariance needs to be based on moments, but distances are not moments.

The additional taking of the square root afterwards (point 5) seems logical because in our case the moment was itself already a sort of covariance (structurally, a scalar product and a covariance are compeers), so it came out that you had, in a way, multiplied covariances twice. Therefore, in order to descend back to the level of the original data values (and to be able to compute a correlation value), one has to take the root afterwards.

One important note should go last. If we did the double centering the classic way, that is, after squaring the euclidean distances, we would end up with a distance covariance that is not the true distance covariance and is not useful. It would appear degenerate, a quantity exactly related to the usual covariance (and the distance correlation would be a function of the linear Pearson correlation). What makes distance covariance/correlation unique and capable of measuring not linear association but a generic form of dependency, so that $dCov=0$ if and only if the variables are independent, is the absence of squaring of the distances when performing the double centering (see point 2). Actually, any power of the distances in the range $(0,2)$ would do; however, the standard form is to use power $1$. Why this power, and not power $2$, enables the coefficient to measure nonlinear interdependency is quite a tricky (for me) mathematical issue bearing on characteristic functions of distributions, and I would like to hear somebody more educated explain the mechanics of distance covariance/correlation here in possibly simple words (I once attempted, unsuccessfully).
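
The degeneracy itself is easy to witness numerically. A sketch (names are mine) that redoes the double centering with the distances raised to a chosen power, for the univariate case:

```python
import numpy as np

def dcor_pow(x, y, power):
    """Distance correlation computed with |distance|**power in the centering."""
    def centered(z):
        D = np.abs(z[:, None] - z[None, :]) ** power
        return D - D.mean(axis=0) - D.mean(axis=1, keepdims=True) + D.mean()
    A, B = centered(x), centered(y)
    dcov2 = max((A * B).mean(), 0.0)   # guard against tiny negative round-off
    return np.sqrt(dcov2 / np.sqrt((A * A).mean() * (B * B).mean()))

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 500)
y = x ** 2                             # a purely nonlinear dependence
print(abs(np.corrcoef(x, y)[0, 1]))    # near 0
print(dcor_pow(x, y, power=1))         # clearly > 0: the dependence is detected
print(dcor_pow(x, y, power=2))         # near 0: the degenerate, "classic" case
```

With power $2$ the double-centered matrices reduce to multiples of the scalar-product matrices $\mathbf{CC}'$, and in the univariate case the coefficient collapses to the absolute Pearson correlation, which is exactly the degeneracy described above.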