Solved – How to do factor analysis when the covariance matrix is not positive definite

covariance, covariance-matrix, factor analysis, MATLAB

I have a data set that consists of 717 observations (rows) which are described by 33 variables (columns). The data are standardized by z-scoring all the variables. No two variables are linearly dependent ($r=1$). I've also removed all the variables with very low variance (less than $0.1$). The figure below shows the corresponding correlation matrix (in absolute values).

When I try to run factor analysis using factoran in MATLAB as follows:

[Loadings1,specVar1,T,stats] = factoran(Z2,1);

I receive the following error:

The data X must have a covariance matrix that is positive definite.

Could you please tell me where the problem is? Is it due to low mutual dependency among the variables used? And what can I do about it?


My correlation matrix:

[Figure: correlation matrix of the 33 variables, in absolute values]

Best Answer

Let's denote the correlation matrix by $C$. Since it is positive semi-definite, but not positive definite, its spectral decomposition looks something like $$C = Q D Q^{-1},$$ where the columns of $Q$ are orthonormal eigenvectors of $C$ (so $Q^{-1} = Q^T$) and $$D = \begin{pmatrix}\lambda_1 & 0 & \cdots & \cdots &\cdots & \cdots& 0\\ 0 & \lambda_2 & \ddots & && &\vdots \\ \vdots & \ddots &\ddots & \ddots && &\vdots \\ \vdots & &\ddots &\lambda_n &\ddots &&\vdots \\ \vdots & & & \ddots &0 & \ddots& \vdots \\ \vdots & & & &\ddots & \ddots& 0\\ 0 & \cdots &\cdots & \cdots &\cdots & 0& 0\end{pmatrix}$$ is a diagonal matrix containing the eigenvalues corresponding to the eigenvectors in $Q$. The first $n$ eigenvalues are positive and the remaining ones are $0$, where $n$ is the rank of $C$.
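To see how many eigenvalues are numerically zero, one can inspect the spectrum directly. The following is a minimal sketch on a toy $3 \times 3$ correlation matrix (illustrative data, not the matrix from the question); the tolerance `tol` is an assumed value and may need adjusting.

```matlab
% Toy singular correlation matrix (illustrative): the third variable is the
% sum of two uncorrelated unit-variance variables, so the exact eigenvalues are 0, 1, 2.
s = 1 / sqrt(2);
C = [1 0 s; 0 1 s; s s 1];

% Eigenvalues of the symmetric matrix C.
lambda = eig(C);

% Count eigenvalues that are zero up to round-off (tol is an assumed threshold).
tol = 1e-10;
nZero = sum(abs(lambda) < tol);

% Numerical rank n = number of eigenvalues that are not (numerically) zero.
n = numel(lambda) - nZero;
```

If `nZero` is greater than zero, the matrix is only positive semi-definite, which is the situation described above.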

A simple way to restore positive definiteness is to set the $0$-eigenvalues to some value that is numerically non-zero, e.g. $$\lambda_{n+1} = \lambda_{n+2} = \dots = 10^{-15}.$$ Hence, set $$\tilde{C} = Q \tilde{D} Q^{-1},$$ where $$\tilde{D} = \begin{pmatrix}\lambda_1 & 0 & \cdots & \cdots &\cdots & \cdots& 0\\ 0 & \lambda_2 & \ddots & && &\vdots \\ \vdots & \ddots &\ddots & \ddots && &\vdots \\ \vdots & &\ddots &\lambda_n &\ddots &&\vdots \\ \vdots & & & \ddots &10^{-15} & \ddots& \vdots \\ \vdots & & & &\ddots & \ddots& 0\\ 0 & \cdots &\cdots & \cdots &\cdots & 0& 10^{-15}\end{pmatrix}$$ Then, perform the factor analysis for $\tilde{C}$.

In MATLAB, $Q$ and $D$ can be obtained with the command:

[Q,D] = eig(C)

Constructing $\tilde{C}$ is then just simple matrix manipulation.
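The construction could be sketched as follows. The toy matrix `C` and the variable names `floorVal`, `lambda` and `Ctilde` are illustrative assumptions. Note that $10^{-15}$ is close to double-precision round-off (`eps` is about $2.2 \times 10^{-16}$), so a larger floor may be needed in practice.

```matlab
% Toy singular correlation matrix (illustrative, not the data from the question).
s = 1 / sqrt(2);
C = [1 0 s; 0 1 s; s s 1];

% Symmetrise to guard against round-off asymmetry, then decompose.
C = (C + C') / 2;
[Q, D] = eig(C);
lambda = diag(D);

% Raise every eigenvalue below the floor up to the floor.
floorVal = 1e-15;   % value from the text; assumed, may be too small in practice
lambda(lambda < floorVal) = floorVal;

% For symmetric C, Q is orthogonal, so Q' takes the place of inv(Q).
Ctilde = Q * diag(lambda) * Q';

% Re-symmetrise, since the product can lose exact symmetry to round-off.
Ctilde = (Ctilde + Ctilde') / 2;
```

Using `Q'` instead of `inv(Q)` avoids an explicit matrix inversion and relies only on the orthonormality of the eigenvectors.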

Remark: It is hard to tell how this influences the factor analysis, so one should be careful with this method. Moreover, even though $C$ is a correlation matrix, $\tilde{C}$ may well not be one, since its diagonal entries need not equal $1$. Hence, another normalisation of the entries might be necessary.
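One possible normalisation, shown here as a sketch under assumptions, rescales $\tilde{C}$ so that its diagonal is exactly $1$. The matrix `Ctilde` below is a made-up example of a repaired matrix whose diagonal has drifted; `Cnorm` is an illustrative name.

```matlab
% Hypothetical repaired matrix whose diagonal is no longer exactly 1.
Ctilde = [1.02 0.30 0.50; 0.30 0.98 0.40; 0.50 0.40 1.01];

% Scale factors 1 / sqrt(diagonal entry), one per variable.
s = 1 ./ sqrt(diag(Ctilde));

% Cnorm(i,j) = Ctilde(i,j) * s(i) * s(j), which gives a unit diagonal.
Cnorm = Ctilde .* (s * s');

% Re-symmetrise to remove any round-off asymmetry.
Cnorm = (Cnorm + Cnorm') / 2;
```

This rescaling is a congruence transformation with a positive diagonal matrix, so it preserves positive definiteness. The result could then be passed to factoran as a covariance matrix, e.g. factoran(Cnorm, 1, 'Xtype', 'covariance', 'Nobs', 717), where 717 is the number of observations from the question.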