Solved – Evaluate performance of non-negative matrix factorization (NMF)

matrix decomposition, non-negative-matrix-factorization, unsupervised learning

I have a complex pipeline for predictive modeling of text, in which non-negative matrix factorization (NMF) is one step. I would like to evaluate the performance of the NMF independently of the neural network model that it feeds into. That is, I would like to evaluate the NMF in an unsupervised manner, without any labels. In particular, I want to find a suitable value for the L1/L2 regularization term, alpha, in http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html.

It is important for me to optimize this regularization parameter as I want to use the NMF to remove noise in my dataset before feeding it into a classifier. What method/measure can I use to find the best performing value for alpha?

Best Answer

When dimensionality reduction techniques are part of a larger pipeline, what really matters is how the reduction helps the end goal. If at all possible, I would try to see how different values of alpha affect the resulting predictions.

Since that may not always be feasible, a common way to evaluate dimensionality reduction is through reconstruction accuracy. As a simple example, one often selects the number of principal components by looking at the variance explained, which is equivalent to using the squared error of the reconstruction. There is usually no clear-cut way to select the number of principal components, however, because reconstruction accuracy never decreases as more components are added.
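To make the equivalence between variance explained and squared reconstruction error concrete, here is a minimal NumPy sketch. The toy data and the helper names (`explained_variance_ratio`, `relative_reconstruction_error`) are my own assumptions for illustration, not part of the question.

```python
import numpy as np

# Toy data (an assumption for illustration): 200 samples, 10 correlated features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
Xc = X - X.mean(axis=0)  # PCA operates on centered data

# SVD of the centered data; the rows of Vt are the principal directions.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
total = np.sum(Xc ** 2)  # total sum of squares, proportional to total variance

def explained_variance_ratio(k):
    """Fraction of total variance captured by the first k components."""
    return np.sum(s[:k] ** 2) / total

def relative_reconstruction_error(k):
    """Squared error of the rank-k reconstruction, relative to the total."""
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k]
    return np.sum((Xc - X_hat) ** 2) / total

for k in range(1, 11):
    evr = explained_variance_ratio(k)
    err = relative_reconstruction_error(k)
    print(f"k={k:2d}  explained={evr:.4f}  error={err:.4f}  sum={evr + err:.4f}")
```

The two columns sum to one for every k, and the error shrinks monotonically as k grows, which is why in-sample reconstruction error by itself does not pick out a "best" number of components.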

Similarly, decreasing alpha will generally improve reconstruction accuracy on the data used to fit the decomposition, so in-sample reconstruction error alone cannot select alpha. A related option is to hold out a portion of the data (set it to missing) and see how well the decomposition predicts the held-out values. It is then straightforward to select the alpha with the highest held-out accuracy. For example, if the elements of your non-negative matrix are counts of how many times each word is used in each document, you may randomly set a portion of the counts to missing, fit the decomposition for several values of alpha, and choose the alpha that predicts the missing counts best.
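A rough sketch of the hold-out idea follows. As far as I know, scikit-learn's NMF does not accept missing entries, so this sketch swaps in a small masked NMF written in NumPy (multiplicative updates on the observed entries only, with an L1 penalty `alpha` on both factors). The functions `masked_nmf` and `heldout_mse`, the toy count matrix, the rank, and the alpha grid are all assumptions for illustration, and this `alpha` is not scaled the same way as scikit-learn's parameter.

```python
import numpy as np

def masked_nmf(X, mask, rank, alpha, n_iter=300, seed=0, eps=1e-9):
    """Fit X ~ W @ H using only entries where mask is True.

    Minimizes 0.5 * ||mask * (X - W @ H)||^2 + alpha * (sum(W) + sum(H))
    with multiplicative updates, which keep W and H non-negative.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    MX = mask * X  # held-out entries contribute nothing to the fit
    for _ in range(n_iter):
        WH = mask * (W @ H)
        W *= (MX @ H.T) / (WH @ H.T + alpha + eps)
        WH = mask * (W @ H)
        H *= (W.T @ MX) / (W.T @ WH + alpha + eps)
    return W, H

def heldout_mse(X, alpha, rank=5, holdout_frac=0.2, seed=0):
    """Hide a random fraction of entries, fit on the rest, score the hidden ones."""
    rng = np.random.default_rng(seed)
    # The same seed gives the same held-out set for every alpha.
    train_mask = rng.random(X.shape) >= holdout_frac
    W, H = masked_nmf(X, train_mask, rank, alpha, seed=seed)
    resid = (X - W @ H)[~train_mask]
    return float(np.mean(resid ** 2))

# Toy document-term count matrix (an assumption): 60 documents, 40 words.
rng = np.random.default_rng(42)
rates = rng.random((60, 4)) @ rng.random((4, 40)) * 5
X = rng.poisson(rates).astype(float)

alphas = [0.0, 0.01, 0.1, 1.0, 10.0]
scores = {a: heldout_mse(X, a) for a in alphas}
best_alpha = min(scores, key=scores.get)

for a in alphas:
    print(f"alpha={a:<5}  held-out MSE={scores[a]:.4f}")
print("best alpha:", best_alpha)
```

In practice it is probably worth repeating this over several random hold-out masks and averaging the scores, since a single mask can give a noisy ranking of nearby alpha values. For count data, a Poisson-style loss on the held-out entries may be a better fit than squared error.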
