Solved – How to extract meaningful factors from a sparse matrix

factor analysis, matrix decomposition, pca, r

I am looking for practical (and reasonably well-accepted) techniques for finding the underlying factors of a sparse matrix.

Specifically, I have a very large sparse matrix whose cell values appear to follow an approximately geometric distribution. In its natural form the matrix is square. The cells hold item x item co-occurrence counts: counts under case 1 above the diagonal, and counts under case 2 below the diagonal. If necessary, I can subset the matrix to particularly interesting items, which would make it rectangular. I believe there are meaningful factors underlying this structure. However, my understanding is that factor analysis is not an appropriate approach because the matrix is sparse. What approach would make it most likely that I can find interpretable patterns in the data?

I saw that there was another question asking for references on sparse variants of PCA, but I think I'm looking for something more akin to an obliquely rotated factor solution. I'm willing to dig into suggested readings somewhat, but my prior experience with factor analysis (and related techniques) is limited, and I prefer a relatively straightforward answer (one with R code is even better).

Best Answer

I would suggest non-negative matrix factorization (NMF). The iterative algorithm of Lee and Seung is easy to implement and should be amenable to sparse matrices (although it involves Hadamard, i.e. element-wise, products, which some sparse matrix packages may not support).
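
To make the suggestion concrete, here is a minimal sketch of the Lee and Seung multiplicative updates for the squared-error (Frobenius) objective, written in Python with NumPy rather than R. The function name `nmf_multiplicative`, its parameters (`rank`, `n_iter`, `eps`, `seed`), and the toy data are assumptions made for illustration and do not come from the original answer. In this form of the updates the data matrix `V` only appears in ordinary matrix products, while the element-wise multiplications and divisions act on the small dense factors `W` and `H`.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Sketch of Lee-Seung multiplicative updates: V ~ W @ H with W, H >= 0.

    V only needs to support `.shape`, `.T`, and `@` with dense arrays.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    # Strictly positive random start; the updates preserve non-negativity.
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); W^T V is computed as (V^T W)^T
        WtV = np.asarray((V.T @ W).T)
        H *= WtV / (W.T @ W @ H + eps)
        # W <- W * (V H^T) / (W H H^T)
        VHt = np.asarray(V @ H.T)
        W *= VHt / (W @ (H @ H.T) + eps)
    return W, H

# Hypothetical toy data: mostly-zero matrix of geometric-like counts.
rng = np.random.default_rng(1)
V_demo = rng.geometric(0.5, size=(30, 20)) - 1    # counts starting at 0
V_demo = V_demo * (rng.random((30, 20)) < 0.2)    # keep roughly 20% of cells
V_demo = V_demo.astype(float)

W_demo, H_demo = nmf_multiplicative(V_demo, rank=3)
print(np.linalg.norm(V_demo - W_demo @ H_demo))   # reconstruction error
```

The rows of `H_demo` can be read as non-negative "factors" over the column items and `W_demo` as the loadings of the row items on them. Because the function only uses `.T` and `@` on `V`, passing a `scipy.sparse` CSR matrix in place of the dense array is expected to work as well, though that is untested here. The choice of `rank` and the number of iterations are tuning decisions, and different random seeds can give different local solutions.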