Singular Matrix – What Correlation Causes Singularity and Its Implications

Tags: correlation, matrix, multicollinearity, regression, singular

I am doing some calculations on different matrices (mainly in logistic regression) and I commonly get the error "Matrix is singular", after which I have to go back and remove the correlated variables. My question here is: what would you consider a "highly" correlated matrix? Is there a threshold value of correlation that defines this word? For example, if one variable had a correlation of 0.97 with another, is that "high" enough to make a matrix singular?

Apologies if the question is very basic; I wasn't able to find any references discussing this issue (a pointer towards any reference would be a big plus!).

Best Answer

What is a singular matrix?

A square matrix is singular, that is, its determinant is zero, if it contains rows or columns which are proportionally interrelated; in other words, one or more of its rows (columns) is exactly expressible as a linear combination of all or some of its other rows (columns), the combination being without a constant term.

Imagine, for example, a $3 \times 3$ matrix $A$ - symmetric, like a correlation matrix, or asymmetric. If in terms of its entries it appears that, for example, $\text {col}_3 = 2.15 \cdot \text {col}_1$, then the matrix $A$ is singular. If, as another example, $\text{row}_2 = 1.6 \cdot \text{row}_1 - 4 \cdot \text{row}_3$, then $A$ is again singular. As a particular case, if any row contains only zeros, the matrix is also singular, because any column is then a linear combination of the other columns. In general, if any row (column) of a square matrix is a weighted sum of the other rows (columns), then any of the latter is also a weighted sum of the other rows (columns).
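To make this concrete, here is a minimal NumPy sketch (the specific numbers are only an illustration of the $\text {col}_3 = 2.15 \cdot \text {col}_1$ case above, nothing canonical):

```python
import numpy as np

# Third column is exactly 2.15 times the first, so the matrix is singular
A = np.array([[1.0,  4.0, 2.15],
              [2.0, -3.0, 4.30],
              [0.0,  7.0, 0.00]])

print(np.linalg.det(A))    # ~0 (up to floating-point rounding)
print(np.linalg.cond(A))   # enormous condition number: the matrix is ill-conditioned
```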

A singular or near-singular matrix is often referred to as an "ill-conditioned" matrix because it causes problems in many statistical data analyses.

What data produce a singular correlation matrix of variables?

What must multivariate data look like in order for its correlation or covariance matrix to be a singular matrix as described above? It is when there are linear interdependencies among the variables. If some variable is an exact linear combination of the other variables, with a constant term allowed, the correlation and covariance matrices of the variables will be singular. The dependency observed between the columns of such a matrix is the same dependency as that between the variables in the data, observed after the variables have been centered (their means brought to 0) or standardized (if we mean the correlation rather than the covariance matrix).

Some frequent particular situations when the correlation/covariance matrix of variables is singular: (1) the number of variables is equal to or greater than the number of cases; (2) two or more variables sum to a constant; (3) two variables are identical or differ only in mean (level) or variance (scale).
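A quick NumPy sketch of the point above, with made-up data in which one variable is an exact linear combination (plus a constant) of the others:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = 3.0 * x1 - 2.0 * x2 + 7.0         # exact linear combination, constant term allowed

data = np.column_stack([x1, x2, x3])
R = np.corrcoef(data, rowvar=False)    # 3 x 3 correlation matrix of the variables

print(np.linalg.det(R))                # ~0: the correlation matrix is singular
```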

Also, duplicating observations in a dataset leads the matrix towards singularity: the more times you clone a case, the closer singularity is. So, when doing some sort of imputation of missing values, it is always beneficial (from both the statistical and the mathematical point of view) to add some noise to the imputed data.

Singularity as geometric collinearity

From the geometrical viewpoint, singularity is (multi)collinearity (or "complanarity"): variables displayed as vectors (arrows) in space lie in a space of dimensionality less than the number of variables - in a reduced space. (That dimensionality is known as the rank of the matrix; it is equal to the number of non-zero eigenvalues of the matrix.)
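A small NumPy illustration of the rank/eigenvalue remark, using a hypothetical correlation matrix in which the third variable is an exact copy of the first:

```python
import numpy as np

# x3 is identical to x1, so the three variable-vectors lie in a 2-dimensional subspace
R = np.array([[1.0, 0.5, 1.0],
              [0.5, 1.0, 0.5],
              [1.0, 0.5, 1.0]])

print(np.linalg.matrix_rank(R))   # 2 - the dimensionality of the reduced space
print(np.linalg.eigvalsh(R))      # exactly one eigenvalue is (numerically) zero
```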

In a more distant or "transcendental" geometrical view, singularity or zero-definiteness (the presence of a zero eigenvalue) is the borderline between positive definiteness and non-positive definiteness of a matrix. When some of the vectors-variables (which the correlation/covariance matrix represents) "go beyond" lying even in the reduced Euclidean space - so that they cannot "converge in" or "perfectly span" a Euclidean space anymore - non-positive definiteness appears, i.e. some eigenvalues of the correlation matrix become negative. (See about non-positive definite, aka non-gramian, matrices here.) A non-positive definite matrix is also "ill-conditioned" for some kinds of statistical analysis.
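As a sketch of that boundary, here is a made-up symmetric matrix that looks like a correlation matrix but cannot be a true one (such matrices can arise, for example, from pairwise deletion of missing data); one of its eigenvalues is negative, so it is non-gramian:

```python
import numpy as np

# A "pseudo-correlation" matrix: this pattern of pairwise values is geometrically impossible
Q = np.array([[ 1.0,  0.9, -0.9],
              [ 0.9,  1.0,  0.9],
              [-0.9,  0.9,  1.0]])

print(np.linalg.eigvalsh(Q))   # one eigenvalue is negative -> non-positive definite (non-gramian)
```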

Collinearity in regression: a geometric explanation and implications

The first picture below shows a normal regression situation with two predictors (we'll speak of linear regression). The picture is copied from here, where it is explained in more detail. In short, moderately correlated (= forming an acute angle) predictors $X_1$ and $X_2$ span the 2-dimensional space "plane X". The dependent variable $Y$ is projected onto it orthogonally, leaving the predicted variable $Y'$ and the residuals, whose standard deviation equals the length of $e$. The R-square of the regression is determined by the angle between $Y$ and $Y'$ (the cosine of that angle is the multiple correlation R), and the two regression coefficients are directly related to the skew coordinates $b_1$ and $b_2$, respectively.

[Figure: regression with two moderately correlated predictors spanning plane X]
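As a numeric check of this geometry, here is a short NumPy sketch with simulated data (all names and coefficients are arbitrary); it verifies that the correlation between $Y$ and $Y'$ - the cosine of the angle between the centered vectors - reproduces the regression's R-square:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)             # moderately correlated predictors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # coefficients ~ skew coordinates b1, b2
y_hat = X @ beta
e = y - y_hat                                  # residual vector, orthogonal to "plane X"

R = np.corrcoef(y, y_hat)[0, 1]                # multiple correlation = cos(angle between Y and Y')
r2_from_ss = 1.0 - (e @ e) / ((y - y.mean()) @ (y - y.mean()))
print(R**2, r2_from_ss)                        # the two R-square values agree
```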

The picture below shows the regression situation with completely collinear predictors. $X_1$ and $X_2$ correlate perfectly, and therefore these two vectors coincide and form a line, a 1-dimensional space. This is a reduced space. Mathematically, though, plane X must exist in order to solve a regression with two predictors - but that plane is not defined anymore, alas. Fortunately, if we drop either of the two collinear predictors from the analysis, the regression is then simply solved, because a one-predictor regression needs only a one-dimensional predictor space. The prediction $Y'$ and the error $e$ of that (one-predictor) regression are drawn on the picture. There exist other approaches as well, besides dropping variables, to get rid of collinearity.

[Figure: regression with perfectly collinear predictors; $X_1$ and $X_2$ coincide on a single line]
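A brief NumPy sketch of the perfectly collinear case (the data are made up): the design matrix loses a dimension, the normal equations have no unique solution, and dropping one predictor makes the regression well defined again:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = 2.0 * x1                                   # perfectly collinear with x1
y = 1.0 + x1 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
print(np.linalg.matrix_rank(X))                 # 2, not 3: "plane X" collapses to a line
print(np.linalg.matrix_rank(X.T @ X))           # X'X is singular -> no unique coefficients

X1 = np.column_stack([np.ones(n), x1])          # drop one of the collinear predictors
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta)                                     # the one-predictor regression is solved simply
```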

The final picture below displays a situation with nearly collinear predictors. This situation is different - a bit more complex and nasty. $X_1$ and $X_2$ (both shown again in blue) are tightly correlated and hence almost coincide. But there is still a tiny angle between them, and because of that non-zero angle, plane X is defined (this plane in the picture looks like the plane in the first picture). So, mathematically there is no problem in solving the regression. The problem which arises here is a statistical one.

[Figure: regression with nearly collinear predictors; plane X is defined but unstable across samples]

Usually we do regression in order to infer about the R-square and the coefficients in the population. From sample to sample, the data vary a bit. So, if we took another sample, the juxtaposition of the two predictor vectors would change slightly, which is normal. What is not "normal" is that under near-collinearity this leads to devastating consequences. Imagine that $X_1$ deviated just a little downwards, beyond plane X - as shown by the grey vector. Because the angle between the two predictors was so small, the plane X passing through $X_2$ and that drifted $X_1$ will diverge drastically from the old plane X. Thus, because $X_1$ and $X_2$ are so highly correlated, we expect a very different plane X in different samples from the same population. As plane X differs, the predictions, R-square, residuals, coefficients - everything becomes different, too. This is well seen in the picture, where plane X has swung by some 40 degrees. In a situation like that, estimates (coefficients, R-square etc.) are very unreliable, which is expressed by their huge standard errors. In contrast, with predictors far from collinear, estimates are reliable, because the space spanned by the predictors is robust to those sampling fluctuations of the data.
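This sampling instability can be simulated directly. The following NumPy sketch (the function name and all numeric settings are my own, purely illustrative) draws repeated samples from populations where the two predictors correlate either 0.20 or 0.99, fits the regression each time, and reports how much the estimated coefficient of $X_1$ jumps around:

```python
import numpy as np

def slope_spread(rho, n=50, reps=2000, seed=0):
    """Std. dev. of the estimated coefficient of x1 across repeated samples,
    when corr(x1, x2) = rho in the population and the true coefficients are 1 and 1."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    b1 = []
    for _ in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        y = X[:, 0] + X[:, 1] + rng.normal(size=n)
        Xd = np.column_stack([np.ones(n), X])        # add intercept
        beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        b1.append(beta[1])
    return np.std(b1)

print(slope_spread(rho=0.20))   # modest sampling spread of the coefficient
print(slope_spread(rho=0.99))   # the spread (i.e. the standard error) blows up
```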

Collinearity as a function of the whole matrix

Even a high correlation between two variables, if it is below 1, doesn't necessarily make the whole correlation matrix singular; it depends on the remaining correlations as well. For example, this correlation matrix:

1.000     .990     .200
 .990    1.000     .100
 .200     .100    1.000

has determinant .00950, which is still different enough from 0 to be considered workable in many statistical analyses. But this matrix:

1.000     .990     .239
 .990    1.000     .100
 .239     .100    1.000

has determinant .00010, very much closer to 0.
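Both determinants are easy to verify with NumPy (the matrices are exactly those shown above):

```python
import numpy as np

R1 = np.array([[1.000, 0.990, 0.200],
               [0.990, 1.000, 0.100],
               [0.200, 0.100, 1.000]])

R2 = np.array([[1.000, 0.990, 0.239],
               [0.990, 1.000, 0.100],
               [0.239, 0.100, 1.000]])

print(np.linalg.det(R1))   # ~0.0095 - still workable in many analyses
print(np.linalg.det(R2))   # ~0.0001 - very close to singular
```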

Collinearity diagnostics: further reading

Statistical data analyses, such as regression, incorporate special indices and tools to detect collinearity strong enough to consider dropping some of the variables or cases from the analysis, or to undertake other remedies. Please search (including this site) for "collinearity diagnostics", "multicollinearity", "singularity/collinearity tolerance", "condition indices", "variance decomposition proportions", "variance inflation factors (VIF)".
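As one example of such a diagnostic, here is a plain NumPy sketch of the variance inflation factor (the `vif` helper and the toy data are my own illustration; $\text{VIF}_j = 1/(1 - R_j^2)$, where $R_j^2$ comes from regressing predictor $j$ on the remaining predictors):

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X (predictors only, no intercept)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(len(y)), others])   # regress column j on the others
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)           # nearly collinear with x1
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))      # large VIFs (~100) flag the collinear pair
```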
