What statistical methods are there to recommend a movie, as on Netflix?

dynamic-regression, machine learning, recommender-system

I am looking to implement a dynamic model to recommend a movie to a user. The recommendation should be updated every time the user watches a movie or rates it. To keep it simple I am thinking of taking two factors into account:

  • the past ratings of other movies by the user
  • the time that the user watched certain past movies

How would one set up such a model, and what does the academic literature recommend?

I am new to this field and am guessing that a linear regression model could give a good result, without resorting to more complex methods that would add unnecessary uncertainty to the parameter estimates. But perhaps there are established methods that are commonly used in practice?

Best Answer

This is actually a relatively famous problem in the field of machine learning. In 2006 Netflix launched the Netflix Prize, offering $1 million to the first team whose algorithm improved the prediction accuracy of its recommender system by 10%. The theory of the winning solution is briefly discussed in this Caltech textbook on introductory machine learning.

Basically, an ensemble learning method was used: in particular, a type of blending or stacking, in which the predictions of several different models are combined into one. This is nontrivial, but fairly intuitive. To see why it helps to use different statistical approaches in harmony, consider the different reasons different people like the same movie: Joe may like Top Gun because he loves 80s action movies, while Jane likes Top Gun because she likes movies with Kenny Loggins soundtracks. So the fact that both viewers watched the movie (and rated it highly) doesn't necessarily mean that they will both like the same other movies. The prediction algorithm would ideally be able to accommodate these differences, at least in some capacity.
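To make the blending idea concrete, here is a minimal sketch in Python with NumPy. It is not the winning Netflix Prize method, just an illustration under stated assumptions: the toy ratings, the two deliberately crude base predictors (a per-user mean and a per-movie mean), and all names such as `mean_by` and `blended_rating` are hypothetical. The blend weights are fitted by least squares on ratings held out from the base predictors.

```python
import numpy as np

# Hypothetical toy data: (user, movie, rating) triples.
ratings = np.array([
    [0, 0, 5], [0, 1, 4], [0, 2, 1],
    [1, 0, 4], [1, 2, 2], [1, 3, 5],
    [2, 1, 5], [2, 2, 1], [2, 3, 4],
    [3, 0, 3], [3, 1, 3], [3, 3, 4],
], dtype=float)

# Hold out a few ratings; the blend weights are fitted on these only.
mask = np.zeros(len(ratings), dtype=bool)
mask[[2, 5, 7, 10]] = True
train, holdout = ratings[~mask], ratings[mask]
global_mean = train[:, 2].mean()

def mean_by(column, data):
    """Return {id: mean rating} grouped by the given id column."""
    ids = np.unique(data[:, column])
    return {i: data[data[:, column] == i, 2].mean() for i in ids}

# Base predictors, built from the training ratings only.
user_mean = mean_by(0, train)
movie_mean = mean_by(1, train)

def base_predictions(rows):
    """Columns: bias term, user-mean prediction, movie-mean prediction."""
    u = [user_mean.get(r[0], global_mean) for r in rows]
    m = [movie_mean.get(r[1], global_mean) for r in rows]
    return np.column_stack([np.ones(len(rows)), u, m])

# Blend: least-squares weights over the base predictors' outputs.
X = base_predictions(holdout)
weights, *_ = np.linalg.lstsq(X, holdout[:, 2], rcond=None)

def blended_rating(user, movie):
    """Predicted rating from the weighted combination of base predictors."""
    x = base_predictions(np.array([[user, movie, 0.0]]))
    return float((x @ weights)[0])

print(blended_rating(0, 3))
```

With only four held-out ratings the weights fit the holdout set almost exactly, so the numbers here mean little; the point is the structure. A real system would blend much stronger base models (for example matrix factorization and neighbourhood models) on a far larger held-out set.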

This may make the solution sound pretty simple, but balancing competing algorithms and prioritizing the best guess for each case is definitely not simple. The fact that Netflix offered such a large bounty should make the magnitude of the challenge rather obvious.

If you are just starting in machine learning, the resource above may be helpful, depending on your interest level and your maths background. Regression would probably work reasonably well, but significantly better performance is possible.
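For comparison, a regression baseline along the lines the question proposes might look like the sketch below. It is only one possible setup, and everything in it is an assumption: the toy viewing log, the choice of features (the mean of the user's earlier ratings and the number of days since their previous watch), and the names `build_rows`, `fit` and `predict_next`. Updating the model after each new rating is done here by simply refitting.

```python
import numpy as np

# Hypothetical viewing log: (user, day_watched, rating), in time order per user.
log = [
    (0, 1, 5.0), (0, 3, 4.0), (0, 10, 2.0), (0, 11, 4.0),
    (1, 2, 3.0), (1, 4, 3.0), (1, 5, 4.0), (1, 20, 2.0),
    (2, 1, 1.0), (2, 8, 2.0), (2, 9, 2.0), (2, 30, 3.0),
]

def build_rows(events):
    """One row per rating that has at least one earlier rating by the same user."""
    history = {}  # user -> list of (day, rating) seen so far
    X, y = [], []
    for user, day, rating in events:
        past = history.setdefault(user, [])
        if past:
            past_mean = np.mean([r for _, r in past])  # factor 1: past ratings
            gap = day - past[-1][0]                    # factor 2: time since last watch
            X.append([1.0, past_mean, gap])
            y.append(rating)
        past.append((day, rating))
    return np.array(X), np.array(y), history

def fit(events):
    """Ordinary least squares on the rows built from the log."""
    X, y, history = build_rows(events)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, history

def predict_next(coef, history, user, day):
    """Predicted rating for whatever the user watches on the given day."""
    past = history[user]
    features = np.array([1.0, np.mean([r for _, r in past]), day - past[-1][0]])
    return float(features @ coef)

coef, history = fit(log)
print(predict_next(coef, history, user=0, day=12))

# "Dynamic" update: append the new event and refit.
log.append((0, 12, 5.0))
coef, history = fit(log)
```

Note that this model has no movie-specific feature, so it predicts how a user will rate their next movie in general rather than ranking candidate movies. To produce actual recommendations, one would add per-movie features (such as the movie's mean rating) or move to the collaborative filtering methods discussed above.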
