Solved – Evaluating matrix factorization algorithms for Netflix

algorithmsdata miningmatrix decompositionrecommender-system

I've been trying to implement Simon Funk's movie recommendation algorithm explained here. I understand how the user and item factors are computed. However the evaluation method is not clearly explained. I checked this post also but there was no evaluation method there.

My Question:
Let's assume we divide the the user-rating matrix R into training(95%) and testing data-sets(5%): R_train and R_test and we use Simon Funk's algorithm to find the user factor matrix (U) and movie factor matrix (M) using the training data matrix: R_train=U*M. How do we evaluate algorithm using R_test?

Best Answer

Tl;dr Frobenius norm, limited to known entries.

Let $P$ be the matrix of predictions (so $P_{u,m}$ is the predicted rating for user $u$ and movie $m$).

Let $R^{train}$ and $R^{test}$ be the training and test sets of data. For convenience, I'll use notation $R^{test}_{u,m}$ to mean the rating of user $u$ of movie $m$ in the test set, but note that $R^{test}$ is not a matrix, because it's sparse: 99% of the entries are missing. Most users have not rated most movies. Only a tiny number of $(u,m)$ pairs are actually present in $R^{test}$.

The error of prediction matrix $P$ is

$\displaystyle\sum_{(u,m) \in R^{test}} (R^{test}_{u,m} - P_{u,m})^2$

That is, the sum over the squared error of known entries in the test set.

# Pseudocode for calculating error p = ... # prediction matrix, indexed by [u][m] r_test = ... # list of (user, movie, rating) tuples error = 0.0 for (u, m, r) in r_test: error += (r - predictions[u][m])^2

Important note on splitting test/train set

When splitting up the data into training and test sets, you should randomly select (user, movie) pairs, not select random users or movies. The whole idea of "collaborative filtering" (for Netflix) is to predict ratings for movies you haven't watched based on the ratings you provided for ones you have. If a user is present only in the testing set, the model cannot possibly be basing predictions based on their other ratings.

I need more help on the Netflix problem and SVD

See my other answer for a description of the netflix problem and matrix-factoring approaches to it.