Solved – SVD of a matrix with missing values

recommender-system, svd

Suppose I have a Netflix-style recommendation matrix, and I want to build a model that predicts potential future movie ratings for a given user. Using Simon Funk's approach, one uses stochastic gradient descent to minimize the Frobenius norm of the difference between the ratings matrix and the product of a user-by-factor matrix and a factor-by-item matrix, combined with an L2 regularization term.

In practice, what do people do with the missing values in the recommendation matrix, given that filling them in is the whole point of the calculation? My guess from reading Simon's blog post is that he ONLY uses the non-missing entries (which make up, say, ~1% of the recommendation matrix) to fit the model (with some judicious choice of hyper-parameters and regularization), and then uses that model to predict the other 99% of the matrix?

In practice, do you really skip all those values? Or do you infer as much as possible BEFORE doing stochastic gradient descent? What are some of the standard best practices for dealing with the missing values?

Best Answer

Yes, in practice those values are skipped. In your description in terms of a Frobenius norm, this corresponds to minimising only the components of the norm that can be measured, i.e. those corresponding to known ratings. The regularisation term can be seen as a Bayesian prior on the components of the feature vectors, with the SVD computing the maximum a posteriori (MAP) estimate given this prior and the known values.
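To make this concrete, here is a minimal sketch of Funk-style SGD factorization, assuming ratings arrive as (user, item, rating) triples. All names, dimensions, and hyper-parameter values are illustrative; the key point is that the inner loop touches only the observed entries:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hyper-parameters (not from the original post).
n_users, n_items, k = 4, 5, 2        # k = number of latent factors
lr, reg, n_epochs = 0.02, 0.02, 1000  # step size, L2 strength, passes over data

# Only the observed entries -- a few % of the full matrix in a real system.
ratings = [(0, 0, 5.0), (0, 2, 3.0), (1, 0, 4.0),
           (1, 3, 1.0), (2, 1, 2.0), (3, 4, 5.0)]

U = 0.1 * rng.standard_normal((n_users, k))  # user feature vectors
V = 0.1 * rng.standard_normal((n_items, k))  # item feature vectors

for _ in range(n_epochs):
    for u, i, r in ratings:
        err = r - U[u] @ V[i]                # residual on a KNOWN rating only
        u_old = U[u].copy()                  # use pre-update values for both steps
        U[u] += lr * (err * V[i] - reg * u_old)   # gradient step with L2 penalty
        V[i] += lr * (err * u_old - reg * V[i])

# Any missing cell is then "filled in" by the fitted model:
print(U[2] @ V[4])  # predicted rating for user 2 on an unrated item
```

The L2 term (`reg`) is exactly the regularization/prior discussed above: it shrinks the feature vectors toward zero, which keeps the 1%-observed problem from overfitting.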

It's probably best to think of the SVD itself as the method for inferring the missing values. If you already had a better way of filling them in, why would you need the SVD? If you don't, then the SVD will happily fill in the gaps for you.