Solved – Correlation matrix is not positive definite… But why


As part of an analysis I am conducting (Structural Equation Modeling) the estimated correlation matrix among some variables ended up looking like this:

            space lstnng actvts prntst persnl intrct prgrmm
space       1.000
listening   0.599 1.000
activities  0.706 0.646  1.000 
parentstaff 0.702 0.459  0.653  1.000
personal    0.591 0.582  0.844  0.776  1.000
interaction 0.627 0.964  0.501  0.325  0.639  1.000   
programme   0.493 0.602  0.981  0.687  0.944  0.642  1.000

The thing is that if you eyeball it, there doesn't seem to be anything obviously wrong with it (no correlations greater than 1 or less than -1, for instance). But if you request its eigenvalues:

[1]  5.01377877  1.00744933  0.62602056  0.30393170  0.16671742  0.01317704 -0.13107483

There's a pretty big negative one. In order to pursue some further analysis I need to know which variable (or which set of variables) is making it not positive definite.
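For anyone who wants to reproduce the check, here is a minimal base R sketch. It re-enters the three-decimal values printed above, so the eigenvalues it returns may differ slightly from those of the unrounded estimates. The object names (`vars`, `low`, `R`, `ev`) are my own choices for illustration, not part of the original analysis.

```r
# Correlation matrix from the question, typed in row by row (lower triangle only).
vars <- c("space", "listening", "activities", "parentstaff",
          "personal", "interaction", "programme")
low <- c(0.599,
         0.706, 0.646,
         0.702, 0.459, 0.653,
         0.591, 0.582, 0.844, 0.776,
         0.627, 0.964, 0.501, 0.325, 0.639,
         0.493, 0.602, 0.981, 0.687, 0.944, 0.642)

R <- diag(7)
dimnames(R) <- list(vars, vars)
# Filling the upper triangle column by column matches the row-by-row
# order of the lower triangle typed above.
R[upper.tri(R)] <- low
R[lower.tri(R)] <- t(R)[lower.tri(R)]   # mirror to make R symmetric

ev <- eigen(R, symmetric = TRUE)$values
ev                                      # all > 0 would mean positive definite
any(ev <= 0)                            # TRUE flags a non-positive-definite matrix
```

The eigenvalues of a correlation matrix always sum to its dimension (here 7), which is a quick sanity check that the values were typed in correctly.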

My first approach was to see whether the inequality

$\left |cov(x,y) \right |\leq sd(x)sd(y)$

was violated for any element of the matrix but, as I mentioned previously, no correlation is larger than 1 in absolute value.

Then I thought about using the following known limits for the elements of a $3 \times 3$ correlation matrix: if $r_{12}$ and $r_{13}$ are known, then $r_{23}$ must satisfy:

$r_{12}r_{13}-\sqrt{(1-r_{12}^{2})(1-r_{13}^2)}\leq r_{23} \leq r_{12}r_{13}+\sqrt{(1-r_{12}^{2})(1-r_{13}^2)}$

But when I took all possible groups of 3 correlations (making sure the indices matched the above formula, of course) to see if any of them were outside those bounds, I found that they all fall within those theoretical bounds 🙁
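The triple check described above can be sketched in base R as follows. This is an illustration on the rounded values printed in the question, not the code originally used, and all object names are assumptions of mine. One check per triple is enough, because the bound on $r_{23}$ is algebraically the same as requiring the determinant of the $3 \times 3$ submatrix to be non-negative, which does not depend on which variable is labelled 1.

```r
vars <- c("space", "listening", "activities", "parentstaff",
          "personal", "interaction", "programme")
low <- c(0.599,
         0.706, 0.646,
         0.702, 0.459, 0.653,
         0.591, 0.582, 0.844, 0.776,
         0.627, 0.964, 0.501, 0.325, 0.639,
         0.493, 0.602, 0.981, 0.687, 0.944, 0.642)
R <- diag(7)
dimnames(R) <- list(vars, vars)
R[upper.tri(R)] <- low
R[lower.tri(R)] <- t(R)[lower.tri(R)]

triples <- t(combn(7, 3))               # every set of three variables, one per row
outside <- apply(triples, 1, function(idx) {
  r12 <- R[idx[1], idx[2]]
  r13 <- R[idx[1], idx[3]]
  r23 <- R[idx[2], idx[3]]
  half <- sqrt((1 - r12^2) * (1 - r13^2))
  r23 < r12 * r13 - half || r23 > r12 * r13 + half
})

# Names of the variables in each triple that violates its bounds
bad <- matrix(vars[triples[outside, , drop = FALSE]], ncol = 3)
bad
```

On the three-decimal values as printed, at least the activities/personal/programme triple appears to fall just outside its limits (the corresponding $3 \times 3$ determinant is about -0.003), so it may be worth re-running the check on the unrounded estimates.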

I am now out of ideas as far as what to do. Does anyone have any insights here? Or is it impossible to test which variable (or sets of variables) are making the correlation matrix not positive definite?

Best Answer

After doing some more experimentation, reading ttnphns's link and Zachary's comment I believe we can consider this question solved. The fact of the matter is that (beyond simple cases where the correlation matrix is small and thus easy to probe), non-positive definiteness can arise because:

1. A pair of variables is suspect (a correlation > 1 kind of situation).
2. Sets of variables are suspect (some variables do not respect the bounds placed on them by the others).
3. ALL variables are suspect.

Much to my chagrin, I'll just have to accept that this effort was doomed from the beginning.
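For a matrix this small, the "sets of variables" case can at least be probed by brute force. The sketch below is one possible approach and not something taken from the linked material. It assumes that a subset counts as suspect when its principal submatrix has a non-positive eigenvalue, and it stops at the smallest subset size where that happens. The helper `is_pd` and the other names are hypothetical. With 7 variables there are only 120 subsets of size two or more, but the search grows exponentially and says nothing useful when every variable is involved.

```r
vars <- c("space", "listening", "activities", "parentstaff",
          "personal", "interaction", "programme")
low <- c(0.599,
         0.706, 0.646,
         0.702, 0.459, 0.653,
         0.591, 0.582, 0.844, 0.776,
         0.627, 0.964, 0.501, 0.325, 0.639,
         0.493, 0.602, 0.981, 0.687, 0.944, 0.642)
R <- diag(7)
dimnames(R) <- list(vars, vars)
R[upper.tri(R)] <- low
R[lower.tri(R)] <- t(R)[lower.tri(R)]

# TRUE if the principal submatrix for the variables in idx is positive definite
is_pd <- function(idx) {
  min(eigen(R[idx, idx], symmetric = TRUE, only.values = TRUE)$values) > 0
}

# Increase the subset size until some subset fails the test
for (k in 2:7) {
  sets <- combn(7, k, simplify = FALSE)
  bad  <- Filter(function(s) !is_pd(s), sets)
  if (length(bad) > 0) break
}

k                                       # size of the smallest offending subsets
lapply(bad, function(s) vars[s])        # their variable names
```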

Related Solutions

Solved – Create positive-definite 3×3 covariance matrix given specified correlation values

To follow up on @cardinal's comment: your $x$, $y$, and $z$ define a $(3 \times 3)$ correlation matrix $\boldsymbol{R}$. Since a correlation matrix is also a possible covariance matrix (of standardized variables), it has to be positive semidefinite, and it is positive definite unless the variables are exactly linearly dependent. Positive definiteness holds if all eigenvalues are $> 0$. If $\boldsymbol{R}$ is indeed positive definite, then any vector $\boldsymbol{s}$ of variances (i.e., numbers $> 0$) will turn $\boldsymbol{R}$ into a positive definite covariance matrix $\boldsymbol{\Sigma} = \boldsymbol{D}_{s}^{1/2} \boldsymbol{R} \boldsymbol{D}_{s}^{1/2}$, where $\boldsymbol{D}_{s}^{1/2}$ is the square root of the diagonal matrix made from $\boldsymbol{s}$.

So just construct $R$ from $x, y, z$, and check if the eigenvalues are all $> 0$. If so, you're good, and you can transform any set of data to have a corresponding covariance matrix with arbitrary variances:

x <- 0.5
y <- 0.3                            # changing this to -0.6 makes it not pos.def.
z <- 0.4
R <- matrix(numeric(3*3), nrow=3)   # will be the correlation matrix
diag(R) <- 1                        # set diagonal to 1
R[upper.tri(R)] <- c(x, y, z)       # fill in x, y, z to upper right
R[lower.tri(R)] <- c(x, y, z)       # fill in x, y, z to lower left
eigen(R)$values                     # get eigenvalues to check if pos.def.

gives

[1] 1.8055810 0.7124457 0.4819732

So our $\boldsymbol{R}$ here is positive definite. Now construct the corresponding covariance matrix from arbitrary variances.

vars  <- c(4, 16, 9)                # the variances
Sigma <- diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))

Generate some data matrix $\boldsymbol{X}$ that we will transform to later have exactly that covariance matrix.

library(mvtnorm)                    # for rmvnorm()
N  <- 100                           # number of simulated observations
mu <- c(1, 2, 3)                    # some arbitrary centroid
X  <- round(rmvnorm(n=N, mean=mu, sigma=Sigma))

To do that, we first orthonormalize matrix $\boldsymbol{X}$, giving matrix $\boldsymbol{Y}$ with covariance matrix $\boldsymbol{I}$ (identity).

orthGS <- function(X) {             # implement Gram-Schmidt algorithm
    Id <- diag(nrow(X))
    for(i in 2:ncol(X)) {
        A <- X[ , 1:(i-1), drop=FALSE]
        Q <- qr.Q(qr(A))
        P <- tcrossprod(Q)
        X[ , i] <- (Id-P) %*% X[ , i]
    }
    scale(X, center=FALSE, scale=sqrt(colSums(X^2)))
}

Xctr <- scale(X, center=TRUE, scale=FALSE)  # centered version of X
Y    <- orthGS(Xctr)                        # Y is orthonormal

Transform matrix $\boldsymbol{Y}$ to have covariance matrix $\boldsymbol{\Sigma}$ and centroid $\boldsymbol{\mu}$.

Edit: what's going on here: Do a spectral decomposition $\boldsymbol{\Sigma} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t}$, where $\boldsymbol{G}$ is the matrix of normalized eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{D}$ is the corresponding matrix of eigenvalues. Now matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} \boldsymbol{Y}$ has covariance matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} Cov(\boldsymbol{Y}) \boldsymbol{D}^{1/2} \boldsymbol{G}^{t} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t} = \boldsymbol{\Sigma}$, as $Cov(\boldsymbol{Y}) = \boldsymbol{I}$.

eig    <- eigen(Sigma)
A      <- eig$vectors %*% sqrt(diag(eig$values))
# Y has orthonormal columns, so cov(Y) = I / (N-1); rescale by sqrt(N-1)
XX1ctr <- t(A %*% t(Y)) * sqrt(nrow(Y) - 1)
XX1    <- sweep(XX1ctr, 2, mu, "+")         # move centroid to mu

Check that the correlation matrix is really $\boldsymbol{R}$.

> all.equal(cor(XX1), R)
[1] TRUE
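For comparison, here is a compact, self-contained variant of the same idea. It is a sketch under a few substitutions of my own: `qr()` replaces the hand-written Gram-Schmidt step, `chol()` replaces the spectral decomposition, and plain `rnorm()` start data replace `rmvnorm()`, so that no extra package is needed. The seed is arbitrary. Because the orthonormal columns are rescaled by $\sqrt{N-1}$, the covariance matrix itself (not just the correlation matrix) is reproduced.

```r
set.seed(1)                                   # arbitrary seed, for reproducibility only
R     <- matrix(c(1.0, 0.5, 0.3,
                  0.5, 1.0, 0.4,
                  0.3, 0.4, 1.0), nrow = 3)   # same x, y, z as above
vars  <- c(4, 16, 9)                          # the variances
Sigma <- diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))
mu    <- c(1, 2, 3)
N     <- 100

X    <- matrix(rnorm(N * 3), nrow = N)        # any full-rank start data will do
Xctr <- scale(X, center = TRUE, scale = FALSE)
Y    <- qr.Q(qr(Xctr))                        # centered, orthonormal columns
U    <- chol(Sigma)                           # upper triangular, t(U) %*% U == Sigma
Z    <- sqrt(N - 1) * Y %*% U                 # now cov(Z) equals Sigma
Z    <- sweep(Z, 2, mu, "+")                  # move centroid to mu

all.equal(cov(Z), Sigma)
all.equal(cor(Z), R)
```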

For other purposes, the question might now be: How do I find a positive definite matrix that is "very similar" to a pre-specified one that is not positive definite. That I don't know.
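One simple heuristic for that last question, offered as a sketch and not as the definitive method, is to clip the offending eigenvalues at a small positive floor, rebuild the matrix, and rescale it to unit diagonal. The floor of `1e-6` and the name `Rfix` are arbitrary choices of mine. The result is positive definite and close to the input, but it is not guaranteed to be the nearest correlation matrix in any formal sense; the `nearPD()` function in the Matrix package (with `corr = TRUE`) is a more principled alternative.

```r
# The non-positive-definite example from above (y changed to -0.6)
R <- matrix(c( 1.0, 0.5, -0.6,
               0.5, 1.0,  0.4,
              -0.6, 0.4,  1.0), nrow = 3)

e    <- eigen(R, symmetric = TRUE)
vals <- pmax(e$values, 1e-6)                  # raise non-positive eigenvalues to the floor
Rfix <- e$vectors %*% diag(vals) %*% t(e$vectors)
Rfix <- (Rfix + t(Rfix)) / 2                  # remove tiny asymmetries from rounding
Rfix <- cov2cor(Rfix)                         # restore a unit diagonal

eigen(Rfix, symmetric = TRUE)$values          # now all > 0
max(abs(Rfix - R))                            # largest change in any correlation
```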

Edit: corrected some square roots

Solved – “matrix is not positive definite” – even when highly correlated variables are removed

In my view, the best tool for resolving (multi-)collinearity is the Cholesky decomposition of the correlation/covariance matrix. The following example covers even the case where none of the bivariate correlations is "extreme", because the rank reduction occurs only over sets of more than two variables.

If the correlation matrix, say R, is positive definite, then all entries on the diagonal of the Cholesky factor, say L, are non-zero (i.e., above machine epsilon). Btw, to use this tool for collinearity detection, the implementation must tolerate zero eigenvalues; I don't know whether, for instance, SPSS can be used for this.
The number of non-zero entries on the diagonal indicates the actual rank of the correlation matrix. Because of the triangular structure of the L matrix, the variables down to the first occurring diagonal zero form a partial set of variables which is of reduced rank. However, there may be some variables in that block which do not belong to that set. So to find the crucial subset which contains only the multicollinearity, you recompute the Cholesky decomposition several times, reordering the variables until you find the smallest possible subset that shows rank reduction; this is an iterative procedure. (If needed, I'll show an example where I use my MatMate program for the script, later.)
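For readers without MatMate, the rank check can be approximated in base R. The sketch below uses simulated data of my own (the seed, sample size and tolerance are arbitrary assumptions), built with the same kind of dependency as in the example that follows. R's `chol()` with `pivot = TRUE` tolerates rank deficiency and reports the numerical rank and the pivot order as attributes. Note that it picks its own variable order, so it detects the rank reduction but does not replace the manual reordering described above.

```r
set.seed(42)                                  # arbitrary seed
N <- 200
X <- matrix(rnorm(N * 4), nrow = N,
            dimnames = list(NULL, paste0("x", 1:4)))
X <- cbind(X, x5 = 2 * X[, "x2"] + sqrt(2) * X[, "x4"])   # exact linear dependency
R <- cor(X)

# Pivoted Cholesky; the warning about rank deficiency is expected here
U <- suppressWarnings(chol(R, pivot = TRUE, tol = 1e-8))

attr(U, "rank")                               # numerical rank of R
colnames(R)[attr(U, "pivot")]                 # order in which variables were processed
```

The variable that ends up last in the pivot order is the one left with no independent variance; here it has to be one of `x2`, `x4` or `x5`.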

Here is an example using random-data on 5 variables, say $x_1$ to $x_5$ which I configured, such that the correlation matrix is positive semidefinite (up to machine precision) because I made $x_5 = 2 \cdot x_2 + \sqrt 2 \cdot x_4 $ (and after that normed to unit-variance) and thus that subset of three variables make a collinear subspace (more exactly: we should call it "co-planar" since they are linearly dependent only in a plane). Here is the correlation-matrix R

;MatMate listing from: 06.03.2013 17:43:23
;============================================

R =        x1        x2        x3        x4        x5
------------------------------------------------------
x1        1.0000   -0.7506    0.2298   -0.8666    0.0952
x2       -0.7506    1.0000   -0.2696    0.4569    0.5355
x3        0.2298   -0.2696    1.0000    0.1890   -0.4407
x4       -0.8666    0.4569    0.1890    1.0000   -0.5066
x5        0.0952    0.5355   -0.4407   -0.5066    1.0000
------------------------------------------------------

and here the cholesky-factor / loadingsmatrix:

[20]     L = cholesky(R)

L=        f1        f2        f3        f4        f5
------------------------------------------------------
x1        1.0000     .         .         .         .    
x2       -0.7506    0.6607     .         .         .    
x3        0.2298   -0.1469    0.9621     .         .    
x4       -0.8666   -0.2930    0.3587    0.1856     .    
x5        0.0952    0.9186   -0.3406   -0.1762     .    
------------------------------------------------------

Since only 4 of the 5 diagonal elements are non-zero (above machine epsilon), we know that the correlation matrix has rank 4 instead of 5, so we have collinearity. But we do not yet know whether 4 variables are linearly dependent or whether there is a rank-reduced subspace of even smaller dimension. So we iteratively redo the rotation to triangularity, systematically altering the order of the variables $x_1$ to $x_5$ to identify the smallest possible subset.

For instance, we make the last item "the first"

[22] l1=rot(L,"drei",5´1´2´3´4)

L1=       f1        f2        f3        f4        f5
------------------------------------------------------
x1        0.0952    0.9955     .         .         .    
x2        0.5355   -0.8053    0.2545     .         .    
x3       -0.4407    0.2730    0.7320    0.4421     .    
x4       -0.5066   -0.8221    0.2598     .         .    
x5        1.0000     .         .         .         .    
------------------------------------------------------

and we see that rank reduction already occurs if we ignore variable 3, because the variables $x_1,x_2,x_4,x_5$ already span only a 3-dimensional subspace (instead of a 4-dimensional one).

Now we proceed by altering the order for the Cholesky decomposition (actually I do this by a column rotation with a "triangularity criterion"):

[24] L1 = rot(L,"drei",5´4´1´2´3)

L1=       f1        f2        f3        f4        f5
------------------------------------------------------
x1        0.0952   -0.9492    0.3000     .         .    
x2        0.5355    0.8445     .         .         .    
x3       -0.4407   -0.0397    0.7803     .        0.4421
x4       -0.5066    0.8622     .         .         .    
x5        1.0000     .         .         .         .    
------------------------------------------------------

Now we're nearly done: the subset $x_2,x_4,x_5$ forms a reduced subspace, and to see more, we put these variables at "the top" of the Cholesky process:

[26] L1 = rot(L,"drei",5´4´2´1´3)

L1=       f1        f2        f3        f4        f5
------------------------------------------------------
x1        0.0952   -0.9492     .        0.3000     .    
x2        0.5355    0.8445     .         .         .    
x3       -0.4407   -0.0397     .        0.7803    0.4421
x4       -0.5066    0.8622     .         .         .    
x5        1.0000     .         .         .         .    
------------------------------------------------------

We see that $x_1$ has a component outside of that reduced space, and $x_3$ has a further component outside of the rank-3 space; both are thus partly independent of the 2-dimensional subspace spanned by $x_2,x_4,x_5$ (which is why the term "co-planarity" applies). We can now decide which of the three variables $x_2$, $x_4$ or $x_5$ to remove to overcome the multicollinearity problem.

If our software does not allow this flexible reordering "inside" the rotation parameters/procedure, we would instead reorder the variables in the correlation matrix and redo the Cholesky decomposition, arriving at something like:

[26] L1 = cholesky(...) // something in your favorite software...

L1=       f1        f2        f3        f4        f5
------------------------------------------------------
x5        1.0000     .         .         .         .      co-planar subset
x2        0.5355    0.8445     .         .         .    
x4       -0.5066    0.8622     .         .         .    
------------------------------------------------------
x1        0.0952   -0.9492     .        0.3000     .      further linearly
x3       -0.4407   -0.0397     .        0.7803    0.4421  independent variables
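As one possible stand-in for "your favorite software", here is a small R sketch of a Cholesky routine that tolerates zero pivots, applied to the reordered correlation matrix. The function `chol_psd` is a hypothetical helper written for this illustration, not MatMate's algorithm and not a base R function. The tolerance of `1e-3` is an assumption chosen because the matrix is entered with only four decimals, so an exactly zero pivot shows up as a value of roughly that order.

```r
vars <- paste0("x", 1:5)
R <- matrix(c( 1.0000, -0.7506,  0.2298, -0.8666,  0.0952,
              -0.7506,  1.0000, -0.2696,  0.4569,  0.5355,
               0.2298, -0.2696,  1.0000,  0.1890, -0.4407,
              -0.8666,  0.4569,  0.1890,  1.0000, -0.5066,
               0.0952,  0.5355, -0.4407, -0.5066,  1.0000),
            nrow = 5, byrow = TRUE, dimnames = list(vars, vars))

# Lower-triangular Cholesky factor that leaves a zero column wherever the
# pivot is (numerically) zero, i.e. where a variable adds no new dimension.
chol_psd <- function(R, tol = 1e-3) {
  p <- nrow(R)
  L <- matrix(0, p, p, dimnames = list(rownames(R), paste0("f", seq_len(p))))
  for (j in seq_len(p)) {
    prev <- seq_len(j - 1)
    d <- R[j, j] - sum(L[j, prev]^2)          # squared pivot for variable j
    if (d > tol) {
      L[j, j] <- sqrt(d)
      for (i in seq_len(p)[-seq_len(j)]) {    # rows below the diagonal
        L[i, j] <- (R[i, j] - sum(L[i, prev] * L[j, prev])) / L[j, j]
      }
    }
  }
  L
}

ord <- c("x5", "x2", "x4", "x1", "x3")        # suspected co-planar subset first
L1  <- chol_psd(R[ord, ord])
round(L1, 4)
```

The row for `x4` should have no entry in `f3`, which is the sign that `x5`, `x2` and `x4` together span only two dimensions. Small differences from the listing in the last decimals are to be expected, since the input here is the rounded matrix.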

[update]: Note that the candidates for removal would not necessarily have been recognized by inspecting the correlations in the correlation matrix. The highest correlation there (in absolute value) is 0.8666, between $x_1$ and $x_4$, but $x_1$ does not contribute to the rank deficiency! Furthermore, the correlations among $x_2,x_4,x_5$ are all in an "acceptable" range when one wants to apply some jackknife estimate for the removal of high correlations assuming multicollinearity: judging from the set of bivariate correlations alone, one would not regard them as the most natural candidates.
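A different technique from the Cholesky reordering, mentioned here only as a cross-check, is to look at the eigenvector that belongs to the (near-)zero eigenvalue: variables that are not part of the linear dependency get loadings of essentially zero on it. The sketch below uses simulated data of my own with the same kind of dependency; the seed and the cut-off of `1e-6` are arbitrary assumptions. This shortcut is only clean when there is a single dependency; with several near-zero eigenvalues the eigenvectors mix and the Cholesky approach above is easier to interpret.

```r
set.seed(42)                                  # arbitrary seed
N <- 200
X <- matrix(rnorm(N * 4), nrow = N,
            dimnames = list(NULL, paste0("x", 1:4)))
X <- cbind(X, x5 = 2 * X[, "x2"] + sqrt(2) * X[, "x4"])   # exact linear dependency
R <- cor(X)

e <- eigen(R, symmetric = TRUE)
e$values                                      # the last eigenvalue is numerically zero

v <- e$vectors[, ncol(R)]                     # eigenvector of the smallest eigenvalue
names(v) <- colnames(R)
round(v, 4)

suspects <- names(v)[abs(v) > 1e-6]           # variables involved in the dependency
suspects
```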
