SPSS – Handling Factor Analysis Problems with Singular Covariance Matrix

Tags: factor analysis, python, spss

I'm having trouble performing factor analysis on my dataset.

When I perform the factor analysis in SPSS (default settings), it works fine. Problem is, I need to do it programmatically (in Python). When I try using Python (MDP library) to do factor analysis on the same dataset, I get this error:

"The covariance matrix of the data is singular. Redundant dimensions need to be removed"

The MDP documentation says it "…returns the Maximum A Posteriori estimate of the latent variables." Being a factor analysis newbie, I wasn't too clear on what this meant, but I tried changing the default extraction method in SPSS from "principal components" to "maximum likelihood". Then, in SPSS, I get this error:

"This matrix is not positive definite."

Are these two errors the same thing? Regardless, what can I do to fix my dataset so that the covariance matrix is not singular?

Thanks!

edit: OK, so I was trying to keep things simplified, but perhaps it's better to just explain everything from the start.

I have a series of documents. Yes, I'm only using 9 documents as a simple test case, but my final objective will be to use it on a much larger corpus.

I've built a term-document matrix, applied tf-idf weighting, and performed SVD, mostly with the help of blog.josephwilk.net/…/latent-semantic-analysis-in-python.html
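To make the pipeline concrete, here is a rough sketch of those steps using scikit-learn (illustration only; my actual code follows the blog post above, and the documents here are placeholders):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

# placeholder corpus; in my case there are 9 documents
documents = ["first document text ...", "second document text ...", "third document text ..."]

# term-document weighting: rows = documents, columns = terms
tfidf = TfidfVectorizer()
X = tfidf.fit_transform(documents)

# keep only the k largest singular values (latent semantic analysis)
svd = TruncatedSVD(n_components=2)
X_reduced = svd.fit_transform(X)       # each document as a point in the k-dimensional latent space

# reconstructing the original space gives a matrix of rank k,
# i.e. one with linearly dependent columns
X_reconstructed = svd.inverse_transform(X_reduced)
```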

Now I have a reconstructed matrix, and I want to sort the documents into categories. So, I tried using factor analysis. In fact, it seems to work: when I put it in SPSS, the factor loadings indicate that the documents are grouped the way I thought they should be, and the loadings are higher than if I hadn't performed SVD. (Although I think that, technically, SPSS is doing PCA even though it's under the 'Factor Analysis' heading.)

I tried using MDP's PCANode, but that doesn't seem to give me anything close to what I want. Strangely, if I transpose my matrix, the factor analysis does work (it will group the terms, instead of the documents).

Hopefully this all makes a little more sense now…

Best Answer

Yes, the two errors amount to the same thing. They're telling you (roughly) that two or more of your manifest variables are linearly dependent (e.g. $y_1 = a y_2 + b$ for some scalars $a, b$). These two variables (dimensions) would be "redundant", meaning that the sample covariance matrix is not invertible (i.e. singular) and therefore not positive definite either.
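A quick numpy check (with made-up data) shows how an exact linear dependence produces exactly this situation:

```python
import numpy as np

rng = np.random.default_rng(0)
y2 = rng.normal(size=100)
y3 = rng.normal(size=100)
y1 = 2.0 * y2 + 5.0                      # y1 = a*y2 + b, so y1 is redundant

data = np.column_stack([y1, y2, y3])     # observations in rows, variables in columns
cov = np.cov(data, rowvar=False)

print(np.linalg.matrix_rank(cov))        # 2 rather than 3: the covariance matrix is singular
print(np.linalg.eigvalsh(cov).min())     # an eigenvalue that is (numerically) zero,
                                         # so the matrix cannot be positive definite
```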

As for what you ought to do about it, that depends. First I would try to find out which variables are giving you the trouble; a scatterplot matrix might be enough to tell you that. Then you can decide what to do from there, most likely by dropping some of the redundant variables.
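If the data are sitting in a numpy array, one rough way to do that programmatically is to look at pairwise correlations and then keep only a linearly independent subset of columns (a sketch, not a complete recipe; the 0.999 threshold is arbitrary):

```python
import numpy as np

def nearly_redundant_pairs(data, threshold=0.999):
    """Pairs of columns whose absolute correlation exceeds the threshold."""
    corr = np.corrcoef(data, rowvar=False)
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) > threshold]

def drop_to_full_rank(data):
    """Greedily keep columns that add a new dimension; drop the rest."""
    keep = []
    for j in range(data.shape[1]):
        candidate = data[:, keep + [j]]
        if np.linalg.matrix_rank(candidate) == len(keep) + 1:
            keep.append(j)               # column j is not a linear combination of the kept ones
    return data[:, keep], keep
```

The correlation check only catches pairwise dependence of the $y_1 = a y_2 + b$ kind; the rank-based check also catches dependence that involves several variables at once.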
