Solved – Covariance matrix of image data is not positive definite

covariance-matrix, eigenvalues, expectation-maximization, machine learning, mathematical-statistics

I've really hit the wall here and need help with direction :).

I am trying to use mvnpdf as part of a basic EM algorithm, but the covariance matrix of the data does not seem to be positive definite. There are many discussions on this topic and I think I have followed all the major ones without success. I also want to understand how this problem should be dealt with for real-world data.


Data Description

I have face-image data (2800 images) where each image is a 60x60x3 = 10800-dimensional vector. The pixel data is normalised to [0, 1].


The following is the code I've used to calculate the covariance:

% x is the 2800x10800 data matrix
[I, dimensionality] = size(x);    % I = 2800, dimensionality = 10800

dataset_mean = sum(x,1) ./ I;     % 1x10800 mean image
sig = zeros(dimensionality, dimensionality);

for i = 1 : I
    mat = x(i,:) - dataset_mean;  % centre the i-th image
    mat = mat' * mat;             % 10800x10800 outer product
    sig = sig + mat;
end
sig = sig ./ I;                   % maximum-likelihood covariance estimate
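
As an aside, the same matrix can be computed without the loop; a minimal vectorized sketch, assuming x, dataset_mean and I are as defined above (implicit expansion needs R2016b+, otherwise use bsxfun):

% Vectorized equivalent of the loop above (same 1/I normalisation)
xc  = x - dataset_mean;       % centre every row at once
sig = (xc' * xc) ./ I;        % 10800x10800 sample covariance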

Here are the points I have considered:

  1. [~,p] = chol(sig) gives me p ≈ 2800; because p is positive, the matrix is definitely not positive definite. I have no idea why p is always close to the number of data points/rows.

  2. I tried [R,err] = cholcov(sig, 0), because I've read that this is the method used inside mvnpdf, and I get a positive err, indicating that it is not positive definite.

  3. The eigenvalues include a lot of negative values, but all of them are very close to 0 (on the order of 1e-12). My understanding is that this is numerical round-off error of some sort, which is why I am using my own code (above) to calculate the covariance rather than a built-in function.

  4. The determinant of the covariance matrix (sig) is also 0; as I understand it, this means the variables in the data matrix are too correlated. Am I right here? (See also the quick check after this list.)

  5. From points 3 and 4, I thought this was a case of multivariate dependency, where several variables together perfectly predict another variable.
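
For completeness, here is a small snippet that gathers these checks in one place; it is only a sketch, assuming sig is the 10800x10800 matrix computed above (eig and det on a matrix this size are slow, but they run):

% Quick diagnostics on the sample covariance
[~, p] = chol(sig);              % p > 0 means chol failed: not positive definite
r      = rank(sig);              % number of numerically non-zero eigenvalues
emin   = min(eig(sig));          % tiny negative values here are round-off error
d      = det(sig);               % 0 (up to round-off) when sig is rank deficient
fprintf('chol p = %d, rank = %d, min eig = %g, det = %g\n', p, r, emin, d);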


The following are the things I have tried:

  • To fix the eigenvalue problem:

1. Added a small amount to the diagonal of the covariance matrix.

2. Set all near-zero negative eigenvalues to 0 and reconstructed the covariance matrix.
Ref : http://comisef.wikidot.com/tutorial:repairingcorrelation

% compute eigenvectors/-values
[V,D]   = eig(C);
% replace negative eigenvalues by zero
D       = max(D, 0);
% reconstruct correlation matrix
BB      = V * D * V';
% rescale correlation matrix
T       = 1 ./ sqrt(diag(BB));
TT      = T * T';
C       = BB .* TT;

  3. Used SVD to reconstruct the covariance matrix (see the sketch after this list).
  4. nearestSPD by John D’Errico.
    Ref : https://in.mathworks.com/matlabcentral/fileexchange/42885-nearestspd
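
For reference, one common SVD-based reconstruction (a sketch in the spirit of the nearest-PSD projection that nearestSPD also builds on), assuming sig is the sample covariance from above:

% Project sig onto the nearest positive semidefinite matrix (Frobenius norm)
B = (sig + sig') / 2;               % symmetrize against round-off asymmetry
[~, S, V] = svd(B);                 % SVD of the symmetric part
H = V * S * V';                     % symmetric polar factor, which is PSD
sig_psd = (B + H) / 2;              % nearest PSD matrix to sig
sig_psd = (sig_psd + sig_psd') / 2; % re-symmetrize the result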

All three of the above methods gave a positive definite matrix, which I confirmed using [R,err] = cholcov(sig, 0); but the determinant was still 0.

Additionally, mvnpdf gave me an Inf value, and I think that is because the determinant of the covariance matrix was 0, which broke the inverse calculation of the covariance inside mvnpdf.

So, my question is: what do I need to do next? Am I missing something? I am sure this problem exists with real-world data, so how is it usually tackled?

Best Answer

Your covariance matrix isn't positive definite because you have more dimensions (i.e. pixels per image) than data points. Given $n$ points and $d$ dimensions, the rank of the sample covariance matrix (i.e. the number of positive eigenvalues) is at most $\min \{n, d\}$. Since you have $n < d$, the rank is at most $n$. All remaining eigenvalues will be zero (i.e. the covariance matrix will be positive semidefinite). But, an eigensolver might report minuscule positive/negative values instead of true zeros due to numerical issues. This also means that the determinant (i.e. the product of the eigenvalues) will be zero.
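
A tiny synthetic example (not your data) that illustrates the rank bound:

% With n < d, the d-by-d sample covariance is rank deficient
n = 5;  d = 20;
X = rand(n, d);
C = cov(X);
fprintf('rank(C) = %d (at most min(n,d) = %d)\n', rank(C), min(n, d));
fprintf('det(C)  = %g\n', det(C));        % ~0: the product of eigenvalues, most of which are 0
fprintf('min eig = %g\n', min(eig(C)));   % tiny +/- values here are numerical round-off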

The rank of the covariance matrix indicates how many dimensions the data span. Geometrically, you have data that lie on a hyperplane of at most $n$ dimensions, embedded in a larger $d$ dimensional space. The eigenvalues of the covariance matrix indicate the variance of the data along the direction of the eigenvectors. Since you have many zero eigenvalues, this means the data is completely flat along these directions.

Because of this flatness, a Gaussian PDF doesn't exist when the covariance matrix isn't full rank: the distribution would have to be infinitesimally thin along the flat directions. This would be like trying to define a one-dimensional normal distribution with zero variance.

To work around this issue, you'd have to impose a prior on the covariance matrix that fattens out the flat directions. A common way to do this is to regularize the covariance matrix by adding some small value $\lambda$ to the diagonal: $C \leftarrow C + \lambda I$ (where $I$ is the identity matrix). This adds $\lambda$ to every eigenvalue, which has the effect of declaring that the variance is $\lambda$ along the direction of eigenvectors that formerly had zero variance.
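
A minimal sketch of that regularization, assuming sig, x and dataset_mean are as in the question (the value of lambda is a made-up placeholder you would tune):

lambda  = 1e-3;                      % assumed value; choose by cross-validation
d       = size(sig, 1);
sig_reg = sig + lambda * eye(d);     % adds lambda to every eigenvalue, so sig_reg is positive definite
p = mvnpdf(x, dataset_mean, sig_reg);
% Note: in 10800 dimensions the density values will under/overflow double precision,
% so inside EM you would normally work with log-densities rather than raw mvnpdf output.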

A better option would be to apply dimensionality reduction (e.g. PCA), then simply work in the lower dimensional space (do you really want to deal with 10800 x 10800 covariance matrices?). Since you're working with face images, it's quite possible that your data has much lower intrinsic dimensionality than $n$ (and if it doesn't, there may not be any hope of proceeding without gathering much more data).
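
As a rough sketch of that route (k is a hypothetical number of components; pca comes from the Statistics and Machine Learning Toolbox):

k = 100;                              % assumed number of components to keep
[coeff, score, latent] = pca(x);      % x is the 2800x10800 data matrix
z = score(:, 1:k);                    % 2800-by-k low-dimensional representation

mu_z  = mean(z, 1);
sig_z = cov(z);                       % k-by-k and generally full rank when k is well below n
p     = mvnpdf(z, mu_z, sig_z);       % the Gaussian density is now well-defined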