Covariance and correlation matrix comparison

correlation, covariance, distance-functions, r

I am aware that this question may be too broad and that answers are scattered across various posts, but I need a concise and organized answer.

My dataset consists of linear measurements of cranial dimensions on 600 individual roe deer (50 distinct measurements taken with a dial caliper). I divide this dataset into groups of unequal size (corresponding to population membership) and calculate a correlation or covariance matrix for each group, so that every population is represented by a 50×50 matrix.
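Computing one covariance (or correlation) matrix per population can be sketched in R with `split()` and `lapply()`. This is a minimal sketch: the data frame `cranial`, its grouping column `population`, and the simulated values are hypothetical stand-ins for the real measurements, with 5 variables instead of 50 to keep it small.

```r
set.seed(1)
# Hypothetical stand-in data: 5 measurements instead of 50,
# three populations of unequal size (30, 40 and 50 individuals)
cranial <- data.frame(matrix(rnorm(120 * 5, mean = 100, sd = 5), ncol = 5))
cranial$population <- factor(rep(c("A", "B", "C"), times = c(30, 40, 50)))

measure_cols <- setdiff(names(cranial), "population")

# One covariance matrix per population (replace cov with cor for correlation matrices)
cov_by_pop <- lapply(split(cranial[measure_cols], cranial$population), cov)

sapply(cov_by_pop, dim)   # each list element is a 5 x 5 matrix
```

With the real data, `measure_cols` would simply name the 50 measurement columns.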

My question is: what is the best way to compare these matrices, both for equality and for pattern (other than the Mantel test)? A problematic aspect may be that the matrices are rarely of full rank, since many of the measured characters are highly correlated. Ideally, the comparison should also include some kind of confidence interval.

Edit:

In the meantime I have found one possible solution; I just need to implement it in R. This paper suggests several distance-based comparisons, which are exactly what I need.

The simplest of them is the Euclidean distance ($S_i$ is a sample covariance matrix):

$d_e(S_1,S_2) = \sqrt {tr((S_1-S_2)^t(S_1-S_2))}$

which I implemented in R like this (though I am not sure it is correct):

D <- cov(malesMab) - cov(malesMbm)   # difference of the two sample covariance matrices
sqrt(sum(diag(t(D) %*% D)))          # %*% is the matrix product; same as norm(D, type = "F")
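The trace formula can be checked on a small example. This is a minimal sketch, assuming a hypothetical helper `dist_euclidean()` and two made-up 2 × 2 matrices; it also shows that the trace form equals R's built-in Frobenius norm, `norm(..., type = "F")`, and that the matrix product `%*%` (not the element-wise `*`) is what the formula needs.

```r
# Euclidean (Frobenius) distance between two covariance matrices:
# sqrt(tr((S1 - S2)^t (S1 - S2)))
dist_euclidean <- function(S1, S2) {
  D <- S1 - S2
  sqrt(sum(diag(t(D) %*% D)))   # %*% is matrix multiplication; * would be element-wise
}

S1 <- matrix(c(4, 1, 1, 3), nrow = 2)
S2 <- diag(c(1, 1))

dist_euclidean(S1, S2)      # sqrt(15): S1 - S2 has entries 3, 1, 1, 2
norm(S1 - S2, type = "F")   # same value from the built-in Frobenius norm
```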

However, this distance is not well suited to the comparison. Of all the distances suggested in the paper, the one based on the Cholesky decomposition performs best, but I don't know how to program it in R. It has this form:

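$d_C(S_1,S_2) = \lVert \operatorname{chol}(S_1)-\operatorname{chol}(S_2) \rVert = \sqrt{tr\left((\operatorname{chol}(S_1)-\operatorname{chol}(S_2))^t(\operatorname{chol}(S_1)-\operatorname{chol}(S_2))\right)}$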

which I tried by substituting the Cholesky factors into the Euclidean formula above:

D <- chol(cov(malesMab)) - chol(cov(malesMbm))   # chol() gives the upper-triangular R, with S = t(R) %*% R
sqrt(sum(diag(t(D) %*% D)))                      # Frobenius norm of the difference, i.e. norm(D, type = "F")

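This works for full-rank matrices, but it runs into rank-deficiency problems, which I had hoped to avoid by using the Cholesky decomposition. (R's chol() requires a positive-definite matrix, so it cannot reliably decompose a singular covariance matrix.)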
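Reading the Cholesky distance as the Frobenius norm of the difference between the two Cholesky factors, it can be sketched in R as below, together with a reproduction of the rank problem. This is a minimal sketch: `dist_cholesky()` is a hypothetical helper and all matrices are made up. R's `chol()` returns the upper-triangular factor $R$ with $S = R^t R$ and needs a positive-definite matrix, while a sample covariance matrix of $p$ variables computed from $n$ individuals has rank at most $n - 1$, so it is singular whenever $n \le p$.

```r
# Cholesky distance: Frobenius norm of the difference between the Cholesky factors.
# chol() returns the upper-triangular R with S = t(R) %*% R; using the lower factors
# t(R) gives the same value, since the Frobenius norm is unchanged by transposition.
dist_cholesky <- function(S1, S2) {
  norm(chol(S1) - chol(S2), type = "F")
}

# Full-rank example: chol(diag(c(4, 1))) = diag(c(2, 1)) and chol(diag(c(1, 1))) is the
# identity, so the distance is norm(diag(c(1, 0)), "F") = 1
dist_cholesky(diag(c(4, 1)), diag(c(1, 1)))

# Rank-deficient example: fewer individuals (n) than measurements (p)
set.seed(1)
n <- 10   # hypothetical population size
p <- 20   # hypothetical number of measurements
S_small <- cov(matrix(rnorm(n * p), nrow = n, ncol = p))
qr(S_small)$rank   # numerical rank, at most n - 1

# chol() normally stops with an error here (or, through rounding, returns a
# meaningless factor); tryCatch() reports what happened instead of aborting.
# diag(p) is the p x p identity, used only as a full-rank comparison matrix.
tryCatch(dist_cholesky(S_small, diag(p)),
         error = function(e) conditionMessage(e))
```

With the question's objects this would be `dist_cholesky(cov(malesMab), cov(malesMbm))`, which only works when both covariance matrices are positive definite.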

Any suggestions?

Best Answer

Have you tried using the morphometric approaches of Strauss & Bookstein (1982)? It seems like this may give you a relatively straightforward way to compare your populations. Here's a really brief summary, but there's much more in the paper and other morphometric publications.

  1. If necessary, log-transform the 50 measurements ("dimensions")
  2. Run a PCA on these dimensions (variables)
    • (note) PC 1 will likely explain almost all of the variance in the dimension data, and it mostly reflects overall size, so...
  3. Regress each dimension on PC 1
  4. Use the residuals of these regressions to build a discriminant analysis (DA) model based on the pre-assigned groups (populations)
  5. Use resubstitution error rates to assess morphometric differences between populations
  6. Run MANOVA/ANOVA on the regression residuals, both to further assess population differences and to identify the specific dimensions that differ
    • (note) even if the MANOVA indicates real differences, interpret the individual ANOVAs with care because of the sheer number of tests (one per dimension)
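The steps above could be sketched in R roughly as follows. This is a minimal illustration on simulated data, not a reproduction of the procedure in the paper: the population labels, sample sizes, five "measurements", and effect sizes are all hypothetical, `lda()` comes from the MASS package, and Holm's correction via `p.adjust()` is one assumed way of handling the many ANOVAs.

```r
library(MASS)  # lda()

set.seed(42)
# Hypothetical data: three populations of unequal size, five positive "measurements"
pop <- factor(rep(c("A", "B", "C"), times = c(30, 40, 50)))
n <- length(pop)
p <- 5
size  <- rnorm(n, sd = 0.2)                                    # shared overall-size factor
shape <- outer(c(A = 0, B = 0.05, C = -0.05)[as.character(pop)],
               c(1, -1, 0, 0, 0))                              # small population-specific shape difference
noise <- matrix(rnorm(n * p, sd = 0.05), n, p)
X <- exp(3 + size + shape + noise)
colnames(X) <- paste0("m", seq_len(p))

# 1. log-transform
logX <- log(X)

# 2. PCA; scores on PC 1 serve as the size variable
pc1 <- prcomp(logX)$x[, 1]

# 3. regress each dimension on PC 1 (one multivariate lm) and keep the residuals
res <- residuals(lm(logX ~ pc1))

# The residuals are collinear (their combination along the PC 1 loadings is zero),
# so one column is dropped before LDA/MANOVA; it is a linear combination of the
# others here, so no information is lost.
res_full <- res[, -p]

# 4.-5. discriminant analysis and resubstitution error rate
fit_lda <- lda(res_full, grouping = pop)
resub_error <- mean(predict(fit_lda, newdata = res_full)$class != pop)

# 6. MANOVA on the residuals, then one ANOVA per dimension with a multiplicity correction
manova_p <- summary(manova(res_full ~ pop), test = "Pillai")$stats["pop", "Pr(>F)"]
anova_p <- apply(res, 2, function(y) anova(lm(y ~ pop))[["Pr(>F)"]][1])
anova_p_holm <- p.adjust(anova_p, method = "holm")
```

Without dropping a residual column, `summary()` on the MANOVA fit would stop because the residual matrix is not of full rank, which is the same rank issue raised in the question.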

Strauss, R. E., and F. L. Bookstein. 1982. The truss: body form reconstructions in morphometrics. Systematic Zoology 31:113–135.