Condition number of covariance matrix

Tags: computational-statistics, correlation, covariance-matrix, eigenvalues, matrix

I am interested in generating a covariance matrix of dimension, say, 100. I have managed to generate a correlation matrix with a finite condition number.

To construct a covariance matrix I also need standard deviations. For my case, I think the most suitable approach is to draw the standard deviations from a gamma distribution.

This gives me small standard deviations as well as large ones. As a result, the covariance matrix has a very high condition number.

I want to know whether the condition number is affected by the scale of the variables. If I want to incorporate different scales in the covariance matrix, how can I get a covariance matrix with a reasonable condition number?

Any help or insight regarding this is highly appreciated.

Best Answer

Yes, the scales of your variables affect the condition number. This is a real phenomenon with practical consequences. For example, I am using linear least-squares to solve a fitting problem, and if I just drop in the appropriate columns, the condition number is of order 10^18 (the true value is presumably worse, as this is the limit of my numerical precision). If, on the other hand, I rescale my variables so that each column of the fit matrix has the same sum-of-squares amplitude, the condition number of the fit matrix drops to less than a hundred. If I use the ill-conditioned matrix to compute fit values, they and the residuals are terrible; if I use the rescaled matrix and then undo the scaling on the variables, I get good, stable fits.
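To make the rescaling idea concrete, here is a minimal sketch in Python with NumPy. It is not my actual fitting problem: the data are synthetic, and the column scales, noise level, and names such as `A`, `col_norms` and `beta_true` are assumptions chosen only for illustration. Each column of the fit matrix is divided by its Euclidean norm (so all columns have the same sum-of-squares amplitude), the fit is done on the rescaled matrix, and the coefficients are then mapped back to the original units.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic fit matrix: independent columns with wildly different scales
# (illustrative values, not taken from any real problem).
n = 200
scales = np.array([1e-6, 1e-3, 1.0, 1e3, 1e6])
A = rng.standard_normal((n, scales.size)) * scales

# True coefficients chosen so that every column contributes comparably to y.
beta_true = 1.0 / scales
y = A @ beta_true + 0.01 * rng.standard_normal(n)

# Rescale each column to unit Euclidean norm (equal sum of squares).
col_norms = np.linalg.norm(A, axis=0)
A_scaled = A / col_norms

cond_raw = np.linalg.cond(A)            # dominated by the spread of the scales
cond_scaled = np.linalg.cond(A_scaled)  # reflects only how collinear the columns are

# Fit with the well-conditioned matrix, then undo the column scaling.
gamma, *_ = np.linalg.lstsq(A_scaled, y, rcond=None)
beta = gamma / col_norms

print(f"condition number, raw columns:      {cond_raw:.3g}")
print(f"condition number, rescaled columns: {cond_scaled:.3g}")
print("recovered beta * scales (should be close to 1):", beta * scales)
```

Because `A_scaled @ gamma` equals `A @ (gamma / col_norms)`, the fitted values are the same in either parameterization; only the numerical conditioning of the solve changes.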

What this means in terms of correlation and covariance matrices is that if you want to work with differently-scaled variables, you should keep the individual variable scales separate from the correlation matrix. If you do this, then a bad condition number of the correlation matrix corresponds to real, strong correlations between your variables. If you construct a covariance matrix by multiplying the scales in, then indeed, you can get a bad condition number just because your variables have different scales.
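As a rough illustration of keeping the scales separate, the sketch below builds a covariance matrix as D R D, where R is a correlation matrix and D is the diagonal matrix of standard deviations, and compares the two condition numbers. Everything specific here is an assumption for the example: the equicorrelation structure with `rho = 0.3` stands in for whatever correlation matrix you generated, and the gamma shape and scale are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 100

# Stand-in correlation matrix (assumed): equicorrelation with rho = 0.3.
# Its eigenvalues are 1 - rho (multiplicity p - 1) and 1 + (p - 1) * rho.
rho = 0.3
R = (1 - rho) * np.eye(p) + rho * np.ones((p, p))

# Standard deviations drawn from a gamma distribution (parameters are illustrative).
sd = rng.gamma(shape=0.5, scale=2.0, size=p)

# Covariance matrix Sigma = D R D, with D = diag(sd).
Sigma = np.outer(sd, sd) * R

cond_R = np.linalg.cond(R)
cond_Sigma = np.linalg.cond(Sigma)

print(f"condition number of correlation matrix: {cond_R:.3g}")
print(f"condition number of covariance matrix:  {cond_Sigma:.3g}")
print(f"(max sd / min sd)^2:                    {(sd.max() / sd.min()) ** 2:.3g}")

# The scales and the correlation matrix can always be separated again.
d = np.sqrt(np.diag(Sigma))
R_back = Sigma / np.outer(d, d)
```

For a symmetric positive definite matrix, the largest eigenvalue is at least the largest diagonal entry and the smallest eigenvalue is at most the smallest diagonal entry, so the condition number of the covariance matrix is at least (max sd / min sd)^2 no matter how well conditioned R is. Storing `sd` and `R` separately, and working with `R` wherever the algorithm allows it, avoids paying that penalty.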

You don't say exactly what you want to do with your generated covariance matrices. If you're trying to evaluate the performance of an algorithm, then you have revealed a shortcoming in that algorithm: it works better if you rescale all your variables first. If you're doing something else, well, the fact is that if your variables have different scales, the covariance matrices really will have horrible condition numbers.
