Solved – Why is non-negativity important for collaborative filtering/recommender systems

matrix decompositionnon-negative-matrix-factorizationrecommender-systemsvd

In all modern recommender systems that I have seen that rely on matrix factorization, a non-negative matrix factorization is performed on the user-movie matrix. I can understand why non-negativity is important for interpretability and/or if you want sparse factors. But if you only care about prediction performance, as in the netflix prize competition for example, why impose the non-negativity restriction? It would seem to be strictly worse than allowing negative values too in your factorization.

This paper is one highly cited example of the use of non-negative matrix factorization in collaborative filtering.

Best Answer

I am not a specialist in recommender systems, but as far I understand, the premise of this question is wrong.

Non-negativity is not that important for collaborative filtering.

The Netflix prize was won in 2009 by BellKor team. Here is the paper describing their algorithm: The BellKor 2008 Solution to the Netflix Prize. As is easy to see, they use an SVD-based approach:

The foundations of our progress during 2008 are laid out in the KDD 2008 paper [4]. [...] In the paper [4] we give a detailed description of three factor models. The first one is a simple SVD [...] The second model [...] we will refer to this model as “Asymmetric-SVD”. Finally, the more accurate factor model, to be named “SVD++” [...]

See also this more popular write-up by the same team Matrix factorization techniques for recommender systems. They talk a lot about SVD but do not mention NNMF at all.

See also this popular blog post Netflix Update: Try This at Home from 2006, also explaining SVD ideas.

Of course you are right and there is some work on using NNMF for collaborative filtering as well. So what works better, SVD or NNMF? I have no idea, but here is the conclusion of A Comparative Study of Collaborative Filtering Algorithms from 2012:

Matrix-Factorization-based methods generally have the highest accuracy. Specifically, regularized SVD, PMF and its variations perform best as far as MAE and RMSE, except in very sparse situations, where NMF performs the best.

Related Question