Non-Negative Matrix Factorization – Choosing Optimal Number of Latent Factors

Tags: cross-validation, latent-variable, matrix-decomposition, non-negative-matrix-factorization, unsupervised-learning

Given a matrix $\mathbf V^{m \times n}$, Non-negative Matrix Factorization (NMF) finds two non-negative matrices $\mathbf W^{m \times k}$ and $\mathbf H^{k \times n}$ (i.e. with all elements $\ge 0$) such that

$$\mathbf V \approx \mathbf W\mathbf H,$$

for example by requiring that non-negative $\mathbf W$ and $\mathbf H$ minimize the reconstruction error $$\|\mathbf V-\mathbf W\mathbf H\|^2.$$
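For concreteness, here is a minimal sketch of computing such a factorization with scikit-learn (the matrix shape and the value of $k$ are arbitrary toy choices, not part of the question):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((100, 40))        # non-negative data matrix, m x n

k = 5                            # number of latent factors (the quantity to choose)
model = NMF(n_components=k, init="random", random_state=0, max_iter=500)
W = model.fit_transform(V)       # m x k
H = model.components_            # k x n

err = np.linalg.norm(V - W @ H) ** 2   # squared Frobenius reconstruction error
```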

Are there common practices to estimate the number $k$ in NMF? How could, for example, cross-validation be used for that purpose?

Best Answer

To choose an optimal number of latent factors in non-negative matrix factorization, use cross-validation.

As you wrote, the aim of NMF is to find low-dimensional $\mathbf W$ and $\mathbf H$ with all non-negative elements minimizing reconstruction error $\|\mathbf V-\mathbf W\mathbf H\|^2$. Imagine that we leave out one element of $\mathbf V$, e.g. $V_{ab}$, and perform NMF of the resulting matrix with one missing cell. This means finding $\mathbf W$ and $\mathbf H$ minimizing reconstruction error over all non-missing cells: $$\sum_{(i,j)\ne (a,b)} (V_{ij}-[\mathbf W\mathbf H]_{ij})^2.$$
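A minimal sketch of such a masked fit, using the standard Lee–Seung multiplicative updates with every term weighted by a binary mask $M$ (1 = observed, 0 = held out); the helper name, iteration count, and initialization are my own arbitrary choices:

```python
import numpy as np

def masked_nmf(V, M, k, n_iter=500, eps=1e-9, seed=0):
    """NMF that only fits the cells where the binary mask M equals 1.

    Minimizes the sum over observed (i, j) of (V_ij - [WH]_ij)^2 via
    mask-weighted multiplicative updates, so that missing cells
    contribute nothing to the objective.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        # weighted Lee-Seung updates: the mask zeroes out missing cells
        W *= ((M * V) @ H.T) / ((M * (W @ H)) @ H.T + eps)
        H *= (W.T @ (M * V)) / (W.T @ (M * (W @ H)) + eps)
    return W, H
```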

Once this is done, we can predict the left-out element $V_{ab}$ by computing $[\mathbf W\mathbf H]_{ab}$ and calculate the prediction error $$e_{ab}=(V_{ab}-[\mathbf W\mathbf H]_{ab})^2.$$ One can repeat this procedure leaving out each element $V_{ab}$ one at a time, and sum the prediction errors over all $a$ and $b$. This results in an overall PRESS value (predicted residual sum of squares) $E(k)=\sum_{a,b}e_{ab}$ that depends on $k$. Hopefully, the function $E(k)$ will have a minimum that can be used as an 'optimal' $k$.
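A sketch of the full leave-one-out PRESS curve, reusing the `masked_nmf` helper from the sketch above (the toy matrix size and the range of $k$ values are arbitrary). Note that this runs one NMF fit per cell and is therefore only feasible for small matrices:

```python
import numpy as np

def press(V, k):
    """Leave-one-out PRESS E(k): one masked NMF fit per held-out cell."""
    E = 0.0
    m, n = V.shape
    for a in range(m):
        for b in range(n):
            M = np.ones_like(V)
            M[a, b] = 0.0                 # leave out the single cell V_ab
            W, H = masked_nmf(V, M, k)
            E += (V[a, b] - (W @ H)[a, b]) ** 2
    return E

rng = np.random.default_rng(0)
V = rng.random((20, 10))                  # small toy matrix; LOO needs m*n fits per k
ks = list(range(1, 8))
E = [press(V, k) for k in ks]
k_opt = ks[int(np.argmin(E))]             # hope E(k) has a clear minimum
```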

Note that this can be computationally costly, because the NMF has to be repeated for each left-out value, and it might also be tricky to program (depending on how easy it is to perform NMF with missing values). In PCA one can get around this by leaving out full rows of $\mathbf V$ (which accelerates the computations a lot); see my reply in "How to perform cross-validation for PCA to determine the number of principal components?", but this is not possible here.

Of course, all the usual principles of cross-validation apply here, so one can leave out many cells at a time (instead of only a single one), and/or repeat the procedure for only some random cells instead of looping over all of them. Both approaches help accelerate the process, as sketched below.
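A sketch of this cheaper variant: hold out a random fraction of cells in one go and average the held-out error over a few random masks (the 10% fraction and the repeat count are arbitrary choices; `masked_nmf` is the helper defined earlier):

```python
import numpy as np

def press_random(V, k, frac=0.1, n_repeats=5, seed=0):
    """Approximate E(k) by masking a random fraction of cells at once."""
    rng = np.random.default_rng(seed)
    E = 0.0
    for r in range(n_repeats):
        M = (rng.random(V.shape) >= frac).astype(float)  # 1 = observed cell
        W, H = masked_nmf(V, M, k, seed=r)
        held = M == 0                                    # the held-out cells
        E += np.sum((V[held] - (W @ H)[held]) ** 2)
    return E / n_repeats
```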

Edit (Mar 2019): See this very nice illustrated write-up by @AlexWilliams: http://alexhwilliams.info/itsneuronalblog/2018/02/26/crossval. Alex uses https://github.com/kimjingu/nonnegfac-python for NMF with missing values.
