Covariance Matrix – Is Every Covariance Matrix Positive Definite?

covariancecovariance-matrixlinear algebramatrix

I guess the answer should be yes, but I still feel something is not right. There should be some general results in the literature, could anyone help me?

Best Answer

No.

Consider three variables, $X$, $Y$ and $Z = X+Y$. Their covariance matrix, $M$, is not positive definite, since there's a vector $z$ ($= (1, 1, -1)'$) for which $z'Mz$ is not positive.

Population covariance matrices are positive semi-definite.

(See property 2 here.)

The same should generally apply to covariance matrices of complete samples (no missing values), since they can also be seen as a form of discrete population covariance.

However due to inexactness of floating point numerical computations, even algebraically positive definite cases might occasionally be computed to not be even positive semi-definite; good choice of algorithms can help with this.

More generally, sample covariance matrices - depending on how they deal with missing values in some variables - may or may not be positive semi-definite, even in theory. If pairwise deletion is used, for example, then there's no guarantee of positive semi-definiteness. Further, accumulated numerical error can cause sample covariance matrices that should be notionally positive semi-definite to fail to be.

Like so:

 x <- rnorm(30)
 y <- rnorm(30) - x/10 # it doesn't matter for this if x and y are correlated or not
 z <- x+y
 M <- cov(data.frame(x=x,y=y,z=z))
 z <- rbind(1,1,-1)
 t(z)%*%M%*%z
              [,1]
[1,] -1.110223e-16

This happened on the first example I tried (I probably should supply a seed but it's not so rare that you should have to try a lot of examples before you get one).

The result came out negative, even though it should be algebraically zero. A different set of numbers might yield a positive number or an "exact" zero.

--

Example of moderate missingness leading to loss of positive semidefiniteness via pairwise deletion:

z <- x + y + rnorm(30)/50  # same x and y as before.
xyz1 <- data.frame(x=x,y=y,z=z) # high correlation but definitely of full rank 

xyz1$x[sample(1:30,5)] <- NA   # make 5 x's missing  

xyz1$y[sample(1:30,5)] <- NA   # make 5 y's missing  

xyz1$z[sample(1:30,5)] <- NA   # make 5 z's missing  

cov(xyz1,use="pairwise")     # the individual pairwise covars are fine ...

           x          y        z
x  1.2107760 -0.2552947 1.255868
y -0.2552947  1.2728156 1.037446
z  1.2558683  1.0374456 2.367978

 chol(cov(xyz1,use="pairwise"))  # ... but leave the matrix not positive semi-definite

Error in chol.default(cov(xyz1, use = "pairwise")) : 
  the leading minor of order 3 is not positive definite

 chol(cov(xyz1,use="complete")) # but deleting even more rows leaves it PSD

          x          y          z
x 0.8760209 -0.2253484 0.64303448
y 0.0000000  1.1088741 1.11270078
z 0.0000000  0.0000000 0.01345364
Related Question