Solved – Practical collaborative filtering application for large database

large data, machine learning, recommender-system

I’m designing an item-based collaborative filtering system for a large database with over 100,000 items.

My question is how the whole process works in practice, since the algorithm takes a long time to evaluate the entire utility matrix and find the nearest neighbors. At the same time, users are constantly evaluating (and re-evaluating) items and expect real-time recommendations.

The strategy I’m adopting is to run the algorithm offline at regular intervals and, in the meantime, use a fixed set of nearest neighbors (NN) for each item. The problem with this approach is that the recommendations will always be based on out-of-date relations between the items, which could make them imprecise, especially if user evaluations change quickly.

Is this a good strategy? How is this problem normally addressed?

Best Answer

A simple approach would be:

Let's say that you have 5 items you can train on and 3 that you are recommending:

1) Create a similarity matrix between the training items and the active items (the ones you are recommending), so your similarity matrix is 5 x 3. Similarity can be based on item attributes alone, and/or on other users' activity on these items.

2) Each time a user comes to the site you grab their evaluations, say they range from 1 to 10, and look like this: [10,4,10,3,1].

3) Decompose the evaluations into binary vectors, one per rating value, for example:

eval_10 => [1,0,1,0,0] 

4) item_scores = eval_10 * similarity

item_scores is now 1 x 3, where each score measures how similar a recommended item is to the items the user ranked 10. You can do the same for 9 with a smaller weight, and for 1 with a negative weight, and then sum up the scores.
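Steps 2 to 4 can be sketched in a few lines of plain Python. This is only an illustration, not a tuned implementation: the similarity values, the `rating_weights` table and the `score_items` helper are all made up for the example, and in production you would probably use a matrix library instead of nested loops.

```python
# Hypothetical 5 x 3 similarity matrix (illustrative values only):
# rows = training items, columns = items being recommended.
similarity = [
    [0.9, 0.1, 0.0],
    [0.2, 0.8, 0.1],
    [0.7, 0.0, 0.3],
    [0.1, 0.6, 0.2],
    [0.0, 0.2, 0.9],
]

# Assumed weights per rating value: full weight for 10, a smaller
# weight for 9, and a negative weight for 1.
rating_weights = {10: 1.0, 9: 0.8, 1: -1.0}


def score_items(evaluations, similarity, rating_weights):
    """Sum weight * (binary rating vector x similarity) over rating values."""
    n_candidates = len(similarity[0])
    scores = [0.0] * n_candidates
    for rating, weight in rating_weights.items():
        # Step 3: binary vector, e.g. eval_10 => [1, 0, 1, 0, 0]
        indicator = [1 if r == rating else 0 for r in evaluations]
        # Step 4: indicator (1 x 5) times similarity (5 x 3), scaled by weight
        for i, flag in enumerate(indicator):
            if flag:
                for j in range(n_candidates):
                    scores[j] += weight * similarity[i][j]
    return scores


# Step 2: the evaluations of the user who just arrived
evaluations = [10, 4, 10, 3, 1]
item_scores = score_items(evaluations, similarity, rating_weights)

# Rank the recommended items from best to worst score
ranking = sorted(range(len(item_scores)), key=lambda j: item_scores[j], reverse=True)
```

With these made-up numbers the first recommended item ranks highest, because it is similar to both items the user ranked 10, and the third ranks lowest, because it is similar to the item the user ranked 1.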

If you are able to process user actions quickly (I know it can be tricky), then your recommendations are a pretty good proxy for real time. The key is that you don't have to recalculate the similarity matrix over and over. Perhaps once every 24 hours is enough.
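To make the "recalculate once in a while" idea concrete, here is a rough sketch of rebuilding the similarity matrix offline and reusing it between rebuilds. It rests on a few assumptions of mine: cosine similarity over the user ratings stands in for whatever similarity measure you choose, 0 means "not rated", and `build_similarity` and `SimilarityCache` are hypothetical names, not part of any library.

```python
import math


def cosine(u, v):
    """Cosine similarity of two rating columns; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_similarity(ratings, training_items, candidate_items):
    """Expensive offline step: similarity of training items vs. recommended items.

    ratings holds one row per user and one column per item (0 = not rated).
    """
    columns = list(zip(*ratings))
    return [[cosine(columns[t], columns[c]) for c in candidate_items]
            for t in training_items]


class SimilarityCache:
    """Keeps the last matrix and rebuilds it only once it is too old."""

    def __init__(self, builder, max_age_seconds=24 * 60 * 60):
        self.builder = builder
        self.max_age_seconds = max_age_seconds
        self.matrix = None
        self.built_at = None

    def get(self, now):
        stale = self.matrix is None or now - self.built_at >= self.max_age_seconds
        if stale:
            self.matrix = self.builder()
            self.built_at = now
        return self.matrix


# Toy utility matrix: 3 users x 4 items
ratings = [
    [5, 0, 4, 0],
    [4, 0, 5, 1],
    [0, 3, 0, 5],
]
cache = SimilarityCache(lambda: build_similarity(ratings, [0, 1], [2, 3]))

sim_now = cache.get(now=0)            # first call builds the matrix
sim_later = cache.get(now=3600)       # one hour later: cached matrix is reused
sim_next_day = cache.get(now=90000)   # more than 24 hours later: rebuilt
```

Passing `now` explicitly (instead of reading the clock inside `get`) is just a choice to keep the sketch easy to test. In a real setup the rebuild would more likely be a scheduled batch job that writes the matrix somewhere the online scorer can read it.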

This is a very simple approach, but I've used it pretty successfully. Hope this helps.
