Solved – Numerical instability when calculating the inverse covariance matrix

clustering, covariance, distance-functions, MATLAB, matrix-inverse

I have 65 samples of 21-dimensional data (pasted here) and I am constructing the covariance matrix from them. When computed in C++ I get the covariance matrix pasted here, and when computed in MATLAB from the data (as shown below) I get the covariance matrix pasted here.

Matlab code for computing cov from data:

data = csvread('path/to/data');
matlab_cov = cov(data);

As you can see, the differences between the covariance matrices are minute (~1e-07), which is probably due to numerical issues with floating-point arithmetic in the compiled code.

However, when I compute the pseudo-inverse covariance matrix from the covariance matrix produced by MATLAB and from the one produced by my C++ code, I get widely different results. I am computing them in the same way, i.e.:

data = csvread('path/to/data');
matlab_cov = cov(data);
my_cov = csvread('path/to/cov_file');
matlab_inv = pinv(matlab_cov);
my_inv = pinv(my_cov);

The difference is so huge that when I compute the Mahalanobis distance from a sample (pasted here) to the distribution of the 65 samples by:

$\frac{65}{64^2} \times (\text{sample}-\text{mean}) \times \Sigma^{-1} \times (\text{sample}-\text{mean})'$

using the different inverse covariance matrices ($\Sigma^{-1}$), I get widely different results, i.e.:

 (65/(64^2))*((sample-sample_mean)*my_inv*(sample-sample_mean)')
ans =

   1.0167e+05

(65/(64^2))*((sample-sample_mean)*matlab_inv*(sample-sample_mean)')
ans =

  109.9612

Is it normal for such small (~1e-7) differences in the covariance matrix to have such a large effect on the computation of the pseudo-inverse? And if so, what can I do to mitigate this effect?

Failing this, are there any other distance metrics I can use that do not involve the inverse covariance matrix? I use the Mahalanobis distance because, for n samples, it is known to follow a beta distribution, which I use for hypothesis testing.

Many thanks in advance

EDIT: Adding the C++ code for calculating the covariance matrix below.
The vector<vector<double> > represents the collection of rows from the data file pasted above.

Mat covariance_matrix = Mat(21, 21, CV_32FC1, cv::Scalar(0)); // 21 x 21, single precision (CV_32FC1)
    for (int j = 0; j < 21; j++) {
        for (int k = 0; k < 21; k++) {
            // accumulate (x_j - mean_j) * (x_k - mean_k) over the 65 samples
            for (std::vector<vector<double> >::iterator it = data.begin(); it != data.end(); it++) {
                covariance_matrix.at<float>(j, k) += (it->at(j) - mean.at(j)) * (it->at(k) - mean.at(k));
            }
            covariance_matrix.at<float>(j, k) /= 64; // divide by n - 1 = 64
        }
    }

Best Answer

The matrices you are looking to invert are not "valid" covariance matrices because they are not positive definite; numerically they even have some eigenvalues that are negative (but close to zero). This is most probably due to machine zeros; for example, the last eigenvalue of your "matlab_covariance" matrix is -0.000000016313723. To correct this to positive definite you can do one of two things:

  1. Just add some very small constant along the diagonal; a form of Tikhonov regularization, really ($K+\lambda I$ with $\lambda \rightarrow 0$).
  2. (Based on what whuber proposed) Use the SVD, set the "problematic" eigenvalues to a fixed small value (not zero), reconstruct your covariance matrix, and then invert that. Clearly, if you set some of those eigenvalues to exactly zero you will end up with a positive semi-definite matrix, which will still not be invertible. (A sketch of both options follows this list.)
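For illustration, here is a minimal sketch of both repairs using Eigen; the names K, lambda and eps are mine (not from the question), and for a symmetric covariance matrix the self-adjoint eigendecomposition below plays the role of the SVD:

    #include <Eigen/Dense>

    // Option 1: Tikhonov-style correction, K + lambda * I with a tiny lambda.
    Eigen::MatrixXd addRidge(const Eigen::MatrixXd& K, double lambda) {
        return K + lambda * Eigen::MatrixXd::Identity(K.rows(), K.cols());
    }

    // Option 2: eigendecompose the symmetric matrix K, floor the "problematic"
    // eigenvalues at a small positive value eps, and reconstruct.
    Eigen::MatrixXd floorEigenvalues(const Eigen::MatrixXd& K, double eps) {
        Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(K);
        Eigen::VectorXd d = es.eigenvalues().cwiseMax(eps);   // clamp negative / near-zero eigenvalues
        return es.eigenvectors() * d.asDiagonal() * es.eigenvectors().transpose();
    }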

A merely positive semi-definite (singular) matrix does not have an inverse, but it does have a pseudo-inverse (all matrices with real or complex entries have a pseudo-inverse). Nevertheless, the Moore–Penrose pseudo-inverse is more computationally expensive than a true inverse, and if the inverse exists it is equal to the pseudo-inverse. So just go for the inverse :)

Both methods essentially try to handle the eigenvalues that evaluate to zero (or below zero). The first method is a bit hand-wavy but probably much faster to implement. For something slightly more stable you might want to compute the SVD and then set $\lambda$ equal to the absolute value of the smallest eigenvalue (so you get non-negative) plus something very small (so you get positive). Just be careful not to enforce positivity on a matrix that is obviously negative (or already positive). Both methods will alter the condition number of your matrix.
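A sketch of that choice of $\lambda$, again in Eigen and again with my own naming:

    #include <cmath>
    #include <Eigen/Dense>

    // Shift the diagonal by |smallest eigenvalue| + something very small,
    // but leave matrices that are already positive definite alone.
    Eigen::MatrixXd shiftSpectrum(const Eigen::MatrixXd& K, double tiny = 1e-12) {
        Eigen::SelfAdjointEigenSolver<Eigen::MatrixXd> es(K);
        double minEig = es.eigenvalues().minCoeff();
        if (minEig > 0.0) return K;                  // already positive definite
        double lambda = std::abs(minEig) + tiny;
        return K + lambda * Eigen::MatrixXd::Identity(K.rows(), K.cols());
    }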

In statistical terms, by adding that $\lambda$ across the diagonal of your covariance matrix you add noise to your measurements. (The diagonal of the covariance matrix holds the variance of each variable, so by adding something to those values you are just saying "the variance of the readings I have is actually a bit bigger than I originally thought".)

A fast test for the positive definiteness of a matrix is whether its Cholesky decomposition exists.
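In Eigen, for example, that test is just a matter of checking whether the LLT factorization succeeds (sketch, my naming):

    #include <Eigen/Dense>

    // LLT (Cholesky) only succeeds for (numerically) positive definite
    // matrices, so its status doubles as a cheap PD test.
    bool isPositiveDefinite(const Eigen::MatrixXd& K) {
        Eigen::LLT<Eigen::MatrixXd> llt(K);
        return llt.info() == Eigen::Success;
    }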

Also, two quick computational notes:

  1. Don't use floats, use doubles to store your covariance matrix.
  2. Use numerical linear algebra libraries in C++ (like Eigen or Armadillo) to get inverses of matrices, matrix products, etc. It's faster, safer, and more concise. (A sketch of both points follows this list.)
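To illustrate both points, a minimal sketch of the same covariance computation in double precision with Eigen, assuming your 65 rows are already loaded into a 65 x 21 Eigen::MatrixXd X (one sample per row):

    #include <Eigen/Dense>

    // Sample covariance in double precision: center the data,
    // then divide the cross-product by n - 1 (= 64 here).
    Eigen::MatrixXd sampleCovariance(const Eigen::MatrixXd& X) {
        Eigen::RowVectorXd mean = X.colwise().mean();
        Eigen::MatrixXd centered = X.rowwise() - mean;
        return (centered.transpose() * centered) / double(X.rows() - 1);
    }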

EDIT: Given that you have a Cholesky decomposition of your matrix $K$ such that $K = LL^T$ (you have to compute it anyway to check that you have a positive definite matrix), you should be able to solve the system $Kx=b$ directly. You just solve $Ly = b$ for $y$ by forward substitution, and then $L^T x = y$ for $x$ by back substitution. (In Eigen, just use the .solve(x) method of your Cholesky object.) Thanks to bnaul and Zen for pointing out that I focused so much on getting $K$ to be Pos.Def. that I forgot why we cared about that in the first place :)
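For instance, a sketch of the Mahalanobis quadratic form computed that way, assuming K is the repaired covariance and diff is the column vector sample - sample_mean:

    #include <Eigen/Dense>

    // Compute diff' * K^{-1} * diff without ever forming K^{-1}:
    // factor K = L * L^T once, then solve K x = diff.
    double mahalanobisQuadraticForm(const Eigen::MatrixXd& K, const Eigen::VectorXd& diff) {
        Eigen::LLT<Eigen::MatrixXd> llt(K);     // fails if K is not positive definite
        Eigen::VectorXd x = llt.solve(diff);    // forward then back substitution
        return diff.dot(x);                     // diff' * K^{-1} * diff
    }

Scaling the result by $65/64^2$ then gives the statistic from the question.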