Factor Analysis – Difference Between Non-Negative Matrix Factorization (NMF) and Factor Analysis (FA)

factor-analysis, non-negative-matrix-factorization

I am performing an Exploratory Factor Analysis (EFA) on a multivariate dataset in which all the variables are measurements of the same physical quantity, taken at different locations in space.
My purpose is to extract a few latent variables (i.e., factors) that can possibly be interpreted as common sources causing the observations, and then to use these factors for future analysis (after assigning each factor to a "source").

EFA works pretty well, but I can also get negative factor scores, which I am not sure are physical solutions. I came across PMF (Positive Matrix Factorization) and NMF/NNMF (Non-Negative Matrix Factorization) and was wondering if it makes sense to use them for my purpose as well.

What would be the difference between the two algorithms?

For instance, the decomposed matrix I get with NMF (W) is quite similar to the factor scores I get with EFA (except that it shows some stochastic behaviour).
Yet, I can't understand a few things:

  1. In EFA, the factor scores are normalized signals. What units does the decomposed W matrix have? Is it in the same units as my measurements?

  2. If running NMF with different seeds produces different solutions, which solution should I take? Should I run it many times?

  3. If (1) is true, can I use the W matrix in the same way I intended to with my factor scores? (i.e., each column in W matrix would be the factor scores of a "common factor"?)

Would be great to get some help…

Best Answer

NMF/PMF are typically used to make low-rank decompositions. They can be used like a truncated SVD, just for dimension reduction. They can also be used like factor analysis, to attempt to identify latent variables that theory says underlie the data.

A truncated rank-$k$ SVD asks for the best decomposition of the data matrix $X$ into $UDV^T$ where $U$ and $V$ have $k$ orthonormal columns and are chosen to minimise the sum of squared errors in reconstructing the elements of $X$. An approximate NMF decomposes $X$ as $GH^T$ where $G$ and $H$ have $k$ columns and all the entries are non-negative. There are also sparse NMF algorithms that (surprise!) additionally make the factors sparse.
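To make the contrast concrete, here is a minimal NumPy sketch of both decompositions on synthetic data. The helper names (`truncated_svd`, `nmf`) and the data are made up for illustration, and the NMF uses the standard multiplicative updates for squared error, which is only one of several possible algorithms and not necessarily what any particular package does.

```python
import numpy as np

def truncated_svd(X, k):
    """Rank-k SVD: orthonormal columns, least-squares optimal, signs unconstrained."""
    U, d, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], d[:k], Vt[:k].T

def nmf(X, k, n_iter=500, seed=0, eps=1e-12):
    """Approximate X ~ G @ H.T with G, H >= 0 (multiplicative updates, squared error)."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[0], k)) + eps
    H = rng.random((X.shape[1], k)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so signs never flip.
        G *= (X @ H) / (G @ (H.T @ H) + eps)
        H *= (X.T @ G) / (H @ (G.T @ G) + eps)
    return G, H

# Synthetic non-negative data of rank 3 (illustration only).
rng = np.random.default_rng(1)
X = rng.random((8, 3)) @ rng.random((3, 12))

k = 2
U, d, V = truncated_svd(X, k)
G, H = nmf(X, k)

err_svd = np.sum((X - (U * d) @ V.T) ** 2)   # smallest possible rank-k squared error
err_nmf = np.sum((X - G @ H.T) ** 2)         # cannot be smaller than err_svd
print("SVD error:", err_svd, "NMF error:", err_nmf)
print("most negative SVD entry:", min(U.min(), V.min()))
print("most negative NMF entry:", min(G.min(), H.min()))
```

The SVD always fits at least as well in squared error, but its factors contain negative entries. NMF gives up a little fit in exchange for non-negative factors, and its result depends on the random starting point.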

One classic application of NMF/PMF is in analytical chemistry. For example, in particulate air pollution research, $X$ may be a matrix whose $(s,t)$ entry is the mass concentration of chemical species $s$ at measurement time $t$. The decomposition of rank $k$ corresponds to a model with $k$ sources of particles, with $G_{sj}$ being the percentage concentration of species $s$ in source $j$ and $H_{tj}$ the mass concentration of particles from source $j$ at time $t$. Clearly these will be non-negative. Ideally $G$ will be somewhat sparse -- you would like to measure species that are, if not unique to a source, at least specific to a group of sources.

[Update: even in this application the interpretation of $G$ and $H$ does depend on how they are scaled. It's always true that $G$ is species-source information and $H$ is source-time information, but getting $H$ to be mass concentrations requires scaling the rows of $H$ to sum to total particle mass concentration.]
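The scaling point can be checked numerically. The sketch below uses made-up factors and assumes, purely for illustration, that the species in $X$ account for all of the particle mass. It rescales each source so that the columns of $G$ become composition fractions, which pushes the measurement units onto $H$ without changing the product $GH^T$.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.random((6, 3))    # hypothetical species x source factor
H = rng.random((10, 3))   # hypothetical time x source factor
X_hat = G @ H.T           # fitted values, in the units of the measurements

# One positive scale per source: any such rescaling leaves G @ H.T unchanged.
scale = G.sum(axis=0)
G_profile = G / scale     # each column now sums to 1 (unitless fractions)
H_contrib = H * scale     # now carries the units of the measurements

same_fit = np.allclose(G_profile @ H_contrib.T, X_hat)

# With fractions in G, each row of H sums to the total over species at that time.
row_totals_match = np.allclose(H_contrib.sum(axis=1), X_hat.sum(axis=0))
print(same_fit, row_totals_match)
```

So the units of either factor are a convention rather than something the algorithm fixes: they are only meaningful once a normalisation like this one has been chosen.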

PMF (at least, the software of that name) does a non-negative decomposition but optimises a user-specified weighted sum of squared errors in reconstruction, where the weights are based on assay error that is either (preferably) known in advance or (typically) estimated from replicates. This is a harder problem computationally. The software also allows constraints on the estimated decomposition -- e.g., that species 7 is found only in source 3, or that the concentration of species 2 in source 4 is greater than 5%.
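As a rough illustration of the weighted objective, the sketch below minimises $\sum_{s,t} w_{st}\,(X_{st} - (GH^T)_{st})^2$ with weighted multiplicative updates. This is not the algorithm used by the PMF software. The function name `weighted_nmf`, the choice $w_{st} = 1/\sigma_{st}^2$, and the synthetic error levels `sigma` are all assumptions made for the example.

```python
import numpy as np

def weighted_nmf(X, W, k, n_iter=300, seed=0, eps=1e-12):
    """Non-negative G, H reducing sum(W * (X - G @ H.T)**2); W holds per-entry weights."""
    rng = np.random.default_rng(seed)
    G = rng.random((X.shape[0], k)) + eps
    H = rng.random((X.shape[1], k)) + eps
    WX = W * X
    losses = []
    for _ in range(n_iter):
        # Same form as the unweighted updates, with W applied elementwise.
        G *= (WX @ H) / ((W * (G @ H.T)) @ H + eps)
        H *= (WX.T @ G) / ((W * (G @ H.T)).T @ G + eps)
        losses.append(float(np.sum(W * (X - G @ H.T) ** 2)))
    return G, H, losses

# Synthetic data: a rank-2 non-negative signal plus noise of known size.
rng = np.random.default_rng(0)
truth = rng.random((6, 2)) @ rng.random((2, 20))
sigma = 0.05 + 0.1 * rng.random(truth.shape)   # hypothetical assay errors
X = np.clip(truth + sigma * rng.standard_normal(truth.shape), 0.0, None)

W = 1.0 / sigma ** 2                           # precise entries count for more
G, H, losses = weighted_nmf(X, W, k=2)
print("weighted loss: first iteration", losses[0], "last iteration", losses[-1])
```

Entries with small assumed error dominate the fit, which is the intent of weighting by assay error. Hard constraints of the kind described above would need a different solver.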

In air pollution analysis PMF (especially) is often seen as estimating the true sources, the way factor analysis estimates latent variables. In some ways it does better than factor analysis, since the non-negativity constraints reduce the non-identifiability (rotational freedom) of factor analysis.
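The rotational-freedom point can be seen in a small example. With made-up factors `G` and `H`, rotating both by the same orthogonal matrix `R` reproduces the data equally well, which is the ambiguity that unconstrained factor models have. The rotated factors contain negative entries, though, so a non-negativity constraint rules this particular alternative out.

```python
import numpy as np

# Hypothetical non-negative factors with some exact zeros.
G = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
H = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 1.0],
              [4.0, 2.0]])

theta = np.deg2rad(30)
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal: R @ R.T = I

G_rot = G @ R
H_rot = H @ R

# (G R)(H R)^T = G R R^T H^T = G H^T, so the fit is identical.
same_fit = np.allclose(G @ H.T, G_rot @ H_rot.T)
still_nonneg = bool((G_rot >= 0).all() and (H_rot >= 0).all())
print(same_fit, still_nonneg)
```

This only shows a reduction in ambiguity, not uniqueness: rescaling and reordering of sources always remain possible, and when the factors have no zeros some small rotations can still keep everything non-negative.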

But you can run PMF/NMF on data without having any theoretical commitment to any specific model for latent variables, which would be undesirable for factor analysis. For example, NMF has been used in text mining for clustering documents without specifying cluster:word relationships in advance, and in the Netflix prize competition for clustering movies.