Non-Positive Definite Covariance Matrix – What It Reveals About Data

covariance, multivariate analysis, normal distribution

I have a number of multivariate observations and would like to evaluate the probability density across all variables. The data are assumed to be normally distributed. With a small number of variables everything works as I would expect, but as I include more variables the covariance matrix stops being positive definite.

I have reduced the problem in Matlab to:

load raw_data.mat;            % loads data: (number of observations) x (number of variables)
Sigma = cov(data);            % sample covariance matrix of the variables
[R,err] = cholcov(Sigma, 0);  % same positive-definiteness test that mvnpdf performs

If err > 0, then Sigma is not positive definite.

Is there anything that I can do in order to evaluate my experimental data at higher dimensions? Does it tell me anything useful about my data?

I'm somewhat of a beginner in this area so apologies if I've missed out something obvious.

Best Answer

The covariance matrix is not positive definite because it is singular. That means that, in your sample, at least one of your variables can be expressed as a linear combination of the others (up to an additive constant). You do not need all of the variables, since the value of at least one can be determined from a subset of the others. I would suggest adding variables sequentially and checking the covariance matrix at each step. If a new variable creates a singularity, drop it and go on to the next one. Eventually you should have a subset of variables with a positive definite covariance matrix.
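
The sequential procedure described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the synthetic `data` matrix stands in for the real `raw_data.mat`, the names `keep`, `trial` and `tol` are made up for this example, and the relative tolerance of 1e-10 is an arbitrary choice. It checks the smallest eigenvalue rather than relying on a Cholesky failure, because rounding error can leave a singular matrix with a tiny positive pivot.

```matlab
% Hypothetical stand-in for the real data: 200 observations, 4 variables,
% where column 3 is an exact linear combination of columns 1 and 2.
X = randn(200, 3);
data = [X(:,1), X(:,2), 2*X(:,1) - X(:,2), X(:,3)];

keep = [];        % indices of the variables accepted so far
tol  = 1e-10;     % assumed relative tolerance for "numerically singular"

for j = 1:size(data, 2)
    trial = [keep, j];              % tentatively add variable j
    S  = cov(data(:, trial));       % covariance of the candidate subset
    ev = eig((S + S') / 2);         % symmetrize so the eigenvalues are real
    % Keep variable j only if the smallest eigenvalue is clearly positive
    % relative to the largest one; otherwise drop it and move on.
    if min(ev) > tol * max(ev)
        keep = trial;
    end
end

Sigma = cov(data(:, keep));         % covariance of the retained variables
[~, p] = chol(Sigma);               % p == 0 means Sigma is positive definite
```

With this synthetic data the loop keeps columns 1, 2 and 4 and drops column 3. Which variable gets dropped depends on the order in which they are tried, so it may be worth putting the variables you care about most first. Also note that a sample covariance matrix computed from n observations has rank at most n - 1, so if you have no more observations than variables it will be singular regardless of which variables you choose.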
