Solved – What does a non positive definite covariance matrix tell me about the data

covariancemultivariate analysisnormal distribution

I have a number of multivariate observations and would like to evaluate the probability density across all variables. It is assumed that the data is normally distributed. At low numbers of variables everything works as I would expect, but moving to greater numbers results in the covariance matrix becoming non positive definite.

I have reduced the problem in Matlab to:

load raw_data.mat; % matrix number-of-values x number of variables
Sigma = cov(data);
[R,err] = cholcov(Sigma, 0); % Test for pos-def done in mvnpdf.

If err>0 then Sigma is not positive definite.

Is there anything that I can do in order to evaluate my experimental data at higher dimensions? Does it tell me anything useful about my data?

I'm somewhat of a beginner in this area so apologies if I've missed out something obvious.

Best Answer

The covariance matrix is not positive definite because it is singular. That means that at least one of your variables can be expressed as a linear combination of the others. You do not need all the variables as the value of at least one can be determined from a subset of the others. I would suggest adding variables sequentially and checking the covariance matrix at each step. If a new variable creates a singularity drop it and go on the the next one. Eventually you should have a subset of variables with a postive definite covariance matrix.

Related Solutions

Solved – Create positive-definite 3×3 covariance matrix given specified correlation values

To follow up on @cardinal's comment: your $x$, $y$, and $z$ define a $(3 \times 3)$ correlation matrix $R$. Since a correlation matrix also is a possible covariance matrix (of standardized variables), it has to be positive definite. This is the case if all eigenvalues are $> 0$. If $R$ is indeed positive definite, then all vectors $\boldsymbol{s}$ of variances (i.e., numbers $> 0$) will turn $\boldsymbol{R}$ into a positive definite covariance matrix $\boldsymbol{\Sigma} = \boldsymbol{D}_{s}^{1/2} \boldsymbol{R} \boldsymbol{D}_{s}^{1/2}$, where $\boldsymbol{D}_{s}^{1/2}$ is the square root of the diagonal matrix made from $\boldsymbol{s}$.

So just construct $R$ from $x, y, z$, and check if the eigenvalues are all $> 0$. If so, you're good, and you can transform any set of data to have a corresponding covariance matrix with arbitrary variances:

x <- 0.5
y <- 0.3                            # changing this to -0.6 makes it not pos.def.
z <- 0.4
R <- matrix(numeric(3*3), nrow=3)   # will be the correlation matrix
diag(R) <- 1                        # set diagonal to 1
R[upper.tri(R)] <- c(x, y, z)       # fill in x, y, z to upper right
R[lower.tri(R)] <- c(x, y, z)       # fill in x, y, z to lower left
eigen(R)$values                     # get eigenvalues to check if pos.def.

gives

[1] 1.8055810 0.7124457 0.4819732

So our $\boldsymbol{R}$ here is positive definite. Now construct the corresponding covariance matrix from arbitrary variances.

vars  <- c(4, 16, 9)                # the variances
Sigma <- diag(sqrt(vars)) %*% R %*% diag(sqrt(vars))

Generate some data matrix $\boldsymbol{X}$ that we will transform to later have exactly that covariance matrix.

library(mvtnorm)                    # for rmvnorm()
N  <- 100                           # number of simulated observations
mu <- c(1, 2, 3)                    # some arbitrary centroid
X  <- round(rmvnorm(n=N, mean=mu, sigma=Sigma))

To do that, we first orthonormalize matrix $\boldsymbol{X}$, giving matrix $\boldsymbol{Y}$ with covariance matrix $\boldsymbol{I}$ (identity).

orthGS <- function(X) {             # implement Gram-Schmidt algorithm
    Id <- diag(nrow(X))
    for(i in 2:ncol(X)) {
        A <- X[ , 1:(i-1), drop=FALSE]
        Q <- qr.Q(qr(A))
        P <- tcrossprod(Q)
        X[ , i] <- (Id-P) %*% X[ , i]
    }
    scale(X, center=FALSE, scale=sqrt(colSums(X^2)))
}

Xctr <- scale(X, center=TRUE, scale=FALSE)  # centered version of X
Y    <- orthGS(Xctr)                        # Y is orthonormal

Transform matrix $\boldsymbol{Y}$ to have covariance matrix $\boldsymbol{\Sigma}$ and centroid $\boldsymbol{\mu}$.

Edit: what's going on here: Do a spectral decomposition $\boldsymbol{\Sigma} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t}$, where $\boldsymbol{G}$ is the matrix of normalized eigenvectors of $\boldsymbol{\Sigma}$, and $\boldsymbol{D}$ is the corresponding matrix of eigenvalues. Now matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} \boldsymbol{Y}$ has covariance matrix $\boldsymbol{G} \boldsymbol{D}^{1/2} Cov(\boldsymbol{Y}) \boldsymbol{D}^{1/2} \boldsymbol{G}^{t} = \boldsymbol{G} \boldsymbol{D} \boldsymbol{G}^{t} = \boldsymbol{\Sigma}$, as $Cov(\boldsymbol{Y}) = \boldsymbol{I}$.

eig    <- eigen(Sigma)
A      <- eig$vectors %*% sqrt(diag(eig$values))
XX1ctr <- t(A %*% t(Y)) * sqrt(nrow(Y))
XX1    <- sweep(XX1ctr, 2, mu, "+")         # move centroid to mu

Check that the correlation matrix is really $\boldsymbol{R}$.

> all.equal(cor(XX1), R)
[1] TRUE

For other purposes, the question might now be: How do I find a positive definite matrix that is "very similar" to a pre-specified one that is not positive definite. That I don't know.

Edit: corrected some square roots

Solved – Is a sample covariance matrix always symmetric and positive definite

For a sample of vectors $x_i=(x_{i1},\dots,x_{ik})^\top$, with $i=1,\dots,n$, the sample mean vector is $$ \bar{x}=\frac{1}{n} \sum_{i=1}^n x_i \, , $$ and the sample covariance matrix is $$ Q = \frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top \, . $$ For a nonzero vector $y\in\mathbb{R}^k$, we have $$ y^\top Qy = y^\top\left(\frac{1}{n} \sum_{i=1}^n (x_i-\bar{x})(x_i-\bar{x})^\top\right) y $$ $$ = \frac{1}{n} \sum_{i=1}^n y^\top (x_i-\bar{x})(x_i-\bar{x})^\top y $$ $$ = \frac{1}{n} \sum_{i=1}^n \left( (x_i-\bar{x})^\top y \right)^2 \geq 0 \, . \quad (*) $$ Therefore, $Q$ is always positive semi-definite.

The additional condition for $Q$ to be positive definite was given in whuber's comment bellow. It goes as follows.

Define $z_i=(x_i-\bar{x})$, for $i=1,\dots,n$. For any nonzero $y\in\mathbb{R}^k$, $(*)$ is zero if and only if $z_i^\top y=0$, for each $i=1,\dots,n$. Suppose the set $\{z_1,\dots,z_n\}$ spans $\mathbb{R}^k$. Then, there are real numbers $\alpha_1,\dots,\alpha_n$ such that $y=\alpha_1 z_1 +\dots+\alpha_n z_n$. But then we have $y^\top y=\alpha_1 z_1^\top y + \dots +\alpha_n z_n^\top y=0$, yielding that $y=0$, a contradiction. Hence, if the $z_i$'s span $\mathbb{R}^k$, then $Q$ is positive definite. This condition is equivalent to $\mathrm{rank} [z_1 \dots z_n] = k$.

Best Answer

Related Solutions

Solved – Create positive-definite 3×3 covariance matrix given specified correlation values

Solved – Is a sample covariance matrix always symmetric and positive definite

Related Question