Solved – Collaborative Filtering for Implicit Feedback Datasets

machine-learning, recommender-system

I'm building a recommendation engine using ALS, as described in the paper named in the title.

I'm confused about a few points:

  1. How should one interpret X and Y, the matrices of "factor vectors in the latent space" for users and items, respectively? More specifically, once I factor Q (m×n) into X, which is m×r, how does one interpret the columns of X? Similarly, Y is an r×n matrix; what does each of its rows correspond to? Here "r" is the rank of the reduced subspace. I'm looking for some sort of heuristic interpretation of both X and Y.

  2. How does one efficiently update this representation with new data? It seems that with each new piece of information, one must recompute both X and Y from scratch. Am I missing a trick?

Thanks

Best Answer

  1. These low-rank approximations are quite hard to interpret. Moreover, once you've got your $X$ and $Y$ such that $X Y^T \approx Q$, you can apply a unitary (in the real case, orthogonal) transformation $U$ to both of them to obtain $X_* = X U$ and $Y_* = Y U$, which leads to $X_* Y_*^T = X U U^T Y^T = X Y^T$.

    This means that there are many equally good solutions: the pair $(X U, Y U)$ for any unitary $U$ fits just as well as $(X, Y)$ itself. In my opinion this also harms interpretability: suppose there were some "true" or "interpretable" factorization whose low-rank representations of users and items had components meaning something like "how much comedy there is in this film" or "how much this user loves comedies", with those components distinct and uncorrelated. Given the result above, what you're most likely to get instead is a vector whose components are a mix of those desirable, interpretable components. Unfortunately, you can't "unmix" them, because you don't know the mixing matrix $U$. (A small numeric check of this invariance is given at the end of this item.)

    If you seek interpretability, you can try topic modeling techniques (collaborative filtering is essentially topic modeling in which the documents are users and the words are the items being recommended), in particular LDA. I'm not sure, though, whether it will work well with implicit feedback.
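
    Here is the promised numeric check of that invariance, as a minimal NumPy sketch (the sizes, the seed, and the way $U$ is generated are arbitrary illustrative choices):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, r = 6, 8, 3             # illustrative sizes: m users, n items, rank r

    X = rng.normal(size=(m, r))   # user factors
    Y = rng.normal(size=(n, r))   # item factors, so Q is approximated by X @ Y.T

    # Random orthogonal (real unitary) matrix U from a QR decomposition.
    U, _ = np.linalg.qr(rng.normal(size=(r, r)))

    X_star = X @ U                # "rotated" user factors
    Y_star = Y @ U                # "rotated" item factors

    # X* Y*^T = X U U^T Y^T = X Y^T, so the reconstruction is unchanged.
    print(np.allclose(X_star @ Y_star.T, X @ Y.T))   # prints True
    ```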

  2. Efficient online updates are indeed troublesome. In general, you might want to re-train your model from scratch on all the current data each night or once a week. In the meantime you can do something like an online update each time you observe $q_{ui}$:

    • One possibility is to fix everything except the factors $x_u$ and $y_i$ touched by the new observation and re-solve the optimization problem for just those two vectors, using only the $u$-th row and $i$-th column of $Q$ (sketched at the end of this item)
    • Another option is to take a small stochastic gradient descent step towards the optimum. When you observe $q_{ui}$, you essentially add another term to your loss: $l_{ui} = c_{ui} (q_{ui} - x_u^T y_i)^2$. With stochastic gradient descent and learning rate $\alpha$, you update $x_u$ and $y_i$ as follows (also sketched at the end of this item):

    $$ x_u \leftarrow x_u - \alpha \frac{\partial l_{ui}}{\partial x_u} $$ $$ y_i \leftarrow y_i - \alpha \frac{\partial l_{ui}}{\partial y_i} $$

    Both approaches are hacky and not guaranteed to work. It would be better to resort to a model that was designed with online updates in mind.
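
    For the first option above, here is a minimal sketch of what re-solving a single user factor $x_u$ could look like, keeping the weighted squared loss from this answer plus an $\ell_2$ penalty and restricting the sum to the items the user has interacted with; the function name, the `reg` parameter, and that restriction are my illustrative choices rather than anything prescribed by the paper, and the analogous update for $y_i$ just swaps the roles of users and items:

    ```python
    import numpy as np

    def refit_user(Y, item_idx, q_u, c_u, reg=0.1):
        """Re-solve the weighted ridge problem for one user factor x_u,
        holding all item factors in Y fixed.

        Y        : (n, r) item-factor matrix
        item_idx : indices of the items this user has interacted with
        q_u      : observed values q_ui for those items
        c_u      : confidence weights c_ui for those items
        """
        Y_obs = Y[item_idx]                                  # (k, r)
        C = np.diag(c_u)                                     # (k, k) confidences
        A = Y_obs.T @ C @ Y_obs + reg * np.eye(Y.shape[1])   # (r, r)
        b = Y_obs.T @ C @ q_u                                # (r,)
        return np.linalg.solve(A, b)                         # new x_u
    ```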
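
    For the second option, differentiating $l_{ui} = c_{ui}(q_{ui} - x_u^T y_i)^2$ gives $\partial l_{ui} / \partial x_u = -2\, c_{ui}\,(q_{ui} - x_u^T y_i)\, y_i$, and symmetrically for $y_i$, so one SGD step is only a few lines; the learning rate `alpha` and the optional `reg` penalty below are knobs I have added for illustration:

    ```python
    def sgd_step(x_u, y_i, q_ui, c_ui, alpha=0.01, reg=0.1):
        """One stochastic gradient step on l_ui = c_ui * (q_ui - x_u . y_i)^2
        plus an l2 penalty; x_u and y_i are 1-D NumPy arrays of length r."""
        err = q_ui - x_u @ y_i
        grad_x = -2.0 * c_ui * err * y_i + 2.0 * reg * x_u   # d l_ui / d x_u
        grad_y = -2.0 * c_ui * err * x_u + 2.0 * reg * y_i   # d l_ui / d y_i
        return x_u - alpha * grad_x, y_i - alpha * grad_y
    ```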