Solved – How to compute the correlation between two distance matrices

correlation, distance, matrix, statistical significance

I have $N$ samples, each described by a vector of $M$ features. I can compute the symmetric distance matrix $D_1(i, j)$ representing the pairwise distances between the samples, using a distance measure. Now let's say that I change the conditions under which the features are extracted, producing new feature vectors and thus a different matrix $D_2(i, j)$. I am interested in understanding whether the differences between samples remained more or less the same (i.e. samples that were similar before are still similar to each other).

  • How can I compare $D_1$ and $D_2$? Would canonical correlation be a suitable measure?
  • Is there a statistical test to assess if the difference between $D_1$ and $D_2$ is significant?

Best Answer

Since your data matrices are symmetric, I don't think canonical correlation analysis (CCA) is the right approach. CCA would look for linear combinations of distances that maximize the correlation between the two sets. I would drop the correlation option.

The Procrustes distance may be a better option, since it measures the difference in shape of multidimensional ensembles. You could consider a resampling technique (such as the bootstrap) to test for significance, since I am not aware of any theoretical null distributions for the Procrustes distance.
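As a rough illustration of this idea, here is a minimal Python sketch. It rests on several assumptions that are not part of the answer above: each distance matrix is first embedded into `k` dimensions with classical multidimensional scaling (Procrustes analysis compares point configurations, not distance matrices directly), the Procrustes disparity is computed with `scipy.spatial.procrustes`, and significance is assessed with a permutation test (a resampling scheme that shuffles sample labels) rather than the bootstrap. The helper names `classical_mds` and `procrustes_permutation_test`, the choice of `k`, and the simulated data are all hypothetical.

```python
import numpy as np
from scipy.spatial import procrustes
from scipy.spatial.distance import pdist, squareform


def classical_mds(D, k=2):
    """Embed an (n x n) distance matrix into k dimensions via classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (D ** 2) @ J              # double-centered Gram matrix
    eigval, eigvec = np.linalg.eigh(B)       # eigenvalues in ascending order
    idx = np.argsort(eigval)[::-1][:k]       # indices of the k largest
    lam = np.clip(eigval[idx], 0.0, None)    # guard against tiny negatives
    return eigvec[:, idx] * np.sqrt(lam)


def procrustes_permutation_test(D1, D2, k=2, n_perm=999, seed=0):
    """Return the Procrustes disparity and a permutation p-value.

    Assumption: under the null hypothesis the two configurations are
    unrelated, so shuffling the sample order of one of them should give
    disparities comparable to the observed one.
    """
    rng = np.random.default_rng(seed)
    X = classical_mds(D1, k)
    Y = classical_mds(D2, k)
    _, _, observed = procrustes(X, Y)
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        _, _, d = procrustes(X, Y[perm])
        if d <= observed:                    # as good a fit as observed
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


# Simulated example: the second feature set is a noisy copy of the first.
rng = np.random.default_rng(42)
features1 = rng.normal(size=(30, 3))
features2 = features1 + 0.1 * rng.normal(size=(30, 3))
D1 = squareform(pdist(features1))
D2 = squareform(pdist(features2))
disparity, p_value = procrustes_permutation_test(D1, D2, k=3, n_perm=199)
print(disparity, p_value)
```

A small disparity together with a small p-value would suggest that the similarity structure among the samples is largely preserved between the two conditions. Note that this tests for concordance between the two configurations; it is not a test that $D_1$ and $D_2$ differ.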
