Solved – Correlation matrix is not positive definite… But why

As part of an analysis I am conducting (Structural Equation Modeling) the estimated correlation matrix among some variables ended up looking like this:

            space lstnng actvts prntst persnl intrct prgrmm
space       1.000
listening   0.599  1.000
activities  0.706  0.646  1.000
parentstaff 0.702  0.459  0.653  1.000
personal    0.591  0.582  0.844  0.776  1.000
interaction 0.627  0.964  0.501  0.325  0.639  1.000
programme   0.493  0.602  0.981  0.687  0.944  0.642  1.000

The thing is that if you eyeball it, there doesn't seem to be anything apparently wrong with it (no correlation is greater than 1 or less than -1). But if you request its eigenvalues:

[1]  5.01377877  1.00744933  0.62602056  0.30393170  0.16671742  0.01317704 -0.13107483

There's a pretty big negative one. In order to pursue some further analysis, I need to know which variable (or which set of variables) is making it not positive definite.
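For readers who want to repeat the eigenvalue check, here is a minimal, dependency-free Python sketch. It is not the code used in the question: the function name `jacobi_eigenvalues` is made up for illustration, and the method (cyclic Jacobi rotations) is just one simple way to get the eigenvalues of a symmetric matrix. In practice a library routine such as `numpy.linalg.eigvalsh` would normally do this job. The matrix is typed in from the rounded values printed above, so the results may differ slightly from the eigenvalues quoted.

```python
import math

def jacobi_eigenvalues(A, max_sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    n = len(A)
    a = [row[:] for row in A]  # work on a copy
    for _ in range(max_sweeps):
        # Sum of squared off-diagonal entries; zero means "diagonal".
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) entry.
                diff = a[q][q] - a[p][p]
                if diff == 0.0:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

# Lower triangle as printed in the question (rounded to 3 decimals).
lower = [
    [1.000],
    [0.599, 1.000],
    [0.706, 0.646, 1.000],
    [0.702, 0.459, 0.653, 1.000],
    [0.591, 0.582, 0.844, 0.776, 1.000],
    [0.627, 0.964, 0.501, 0.325, 0.639, 1.000],
    [0.493, 0.602, 0.981, 0.687, 0.944, 0.642, 1.000],
]
n = len(lower)
R = [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]

eigs = jacobi_eigenvalues(R)
print(eigs)
print("positive definite:", eigs[-1] > 0)
```

A matrix is positive definite only if its smallest eigenvalue is strictly positive, so the last line is the whole test.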

My first approach was to see whether the inequality:

$\left |cov(x,y) \right |\leq sd(x)sd(y)$

was violated for any element of the matrix but, as I mentioned previously, no correlation is larger than 1 in absolute value.

Then I thought about using the following known limits for the elements of a $3 \times 3$ correlation matrix: if $r_{12}$ and $r_{13}$ are known, then $r_{23}$ must satisfy:

$r_{12}r_{13}-\sqrt{(1-r_{12}^{2})(1-r_{13}^2)}\leq r_{23} \leq r_{12}r_{13}+\sqrt{(1-r_{12}^{2})(1-r_{13}^2)}$

But when I took all possible groups of 3 correlations (making sure the indices matched the above formula, of course) to see if any of them were outside those bounds, I noticed that they all fell within those theoretical bounds 🙁
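The triple-by-triple check described above can be sketched in Python as follows. The names `r23_bounds` and `violating_triples` are hypothetical, and the small matrix is a made-up example rather than the one from the question. The sketch assumes every correlation already lies in $[-1, 1]$; otherwise the square root is undefined. Since the bounds are equivalent to the determinant of the $3 \times 3$ submatrix being non-negative, it does not matter which of the three variables plays the role of index 1, so one test per triple is enough.

```python
import math
from itertools import combinations

def r23_bounds(r12, r13):
    """Return the (lower, upper) limits on r23 implied by r12 and r13."""
    centre = r12 * r13
    half_width = math.sqrt((1 - r12 ** 2) * (1 - r13 ** 2))
    return centre - half_width, centre + half_width

def violating_triples(R, tol=1e-12):
    """List index triples (i, j, k) whose correlations break the bounds."""
    bad = []
    for i, j, k in combinations(range(len(R)), 3):
        lo, hi = r23_bounds(R[i][j], R[i][k])
        if not (lo - tol <= R[j][k] <= hi + tol):
            bad.append((i, j, k))
    return bad

# Made-up example: variable 0 correlates 0.9 with both 1 and 2,
# yet variables 1 and 2 correlate only 0.2 with each other.
R_bad = [
    [1.0, 0.9, 0.9],
    [0.9, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]
print(r23_bounds(0.9, 0.9))      # limits on r23 given the two 0.9 values
print(violating_triples(R_bad))  # triples outside the limits
```

Passing this check for every triple is necessary but not sufficient for positive definiteness once the matrix is larger than $3 \times 3$.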

I am now out of ideas as to what to do. Does anyone have any insights here? Or is it impossible to test which variable (or set of variables) is making the correlation matrix not positive definite?

Best Answer

After doing some more experimentation and reading ttnphns's link and Zachary's comment, I believe we can consider this question solved. The fact of the matter is that (beyond simple cases where the correlation matrix is small and thus easy to probe) non-positive definiteness can arise because:

  • A pair of variables is suspect (so a correlation greater than 1 in absolute value kind of situation).

  • Sets of variables are suspect (so some variables are not respecting the bounds placed on them by the other ones).

  • ALL variables are suspect.
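To make the three cases above concrete, here is a rough Python sketch that brute-forces every subset of variables and reports the smallest ones whose sub-correlation matrix is not positive definite. Everything in it is an illustrative assumption rather than the method from the linked discussion: the function names are made up, positive definiteness is tested by attempting a Cholesky factorization (which also rejects singular matrices), and the example is a hypothetical 4-variable matrix with every correlation equal to -0.4. The search is exponential in the number of variables, so it only suits small matrices.

```python
import math
from itertools import combinations

def is_positive_definite(A):
    """Attempt a Cholesky factorization; it succeeds only if A is positive definite."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # non-positive pivot: factorization fails
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

def minimal_bad_subsets(R):
    """Smallest variable subsets whose principal submatrix is not positive definite."""
    n = len(R)
    bad = []
    for size in range(2, n + 1):
        for idx in combinations(range(n), size):
            # Skip supersets of a subset that was already flagged.
            if any(set(b) <= set(idx) for b in bad):
                continue
            sub = [[R[i][j] for j in idx] for i in idx]
            if not is_positive_definite(sub):
                bad.append(idx)
    return bad

# Hypothetical example: four variables, every pairwise correlation is -0.4.
rho = -0.4
R_all = [[1.0 if i == j else rho for j in range(4)] for i in range(4)]
print(minimal_bad_subsets(R_all))
```

In this example every pair and every triple is fine on its own and only the full set of four is flagged, which is the "ALL variables are suspect" situation: no single variable or small group can be blamed.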

Much to my chagrin, I'll just have to accept that this effort was doomed from the beginning.