Solved – Designing a simple tag-based recommender system

data miningrecommender-system

I want to design a system for product recommendation, where I have the following:

  • ~1000 products that are tagged with 1 to ~10 descriptive tags by professional product specialists.
  • ~200000 users, that can "like" a product (no "dislike", no numerical rating), along with buying products (these are products where it makes sense to buy the same product multiple times).
  • Most users have no likes and no purchases, so the density of the user/likes and user/purchases matrices is very low.
  • I also have (for some users) explicit preferences of the form "I like this tag" or "I dislike this tag", and other information that can be used to weigh preferences after an initial ranking.
  • Since it makes sense to buy a product multiple times, I'd prefer a method that also scored products that have been previously bought/liked.

What are my options here? I'm feeling kind of lost, as most literature I can find on collaborative filtering seems to focus solely on numerical ratings, and most stuff I can find on content-based filtering seems to mostly talk about extracting the keywords which I already have in the form of tags.

Best Answer

I think there are 2 things to think about:

  1. How to deal with implicit feedback.

There is a famous paper called "Collaborative Filtering for Implicit Feedback Datasets" [1] which can help you. You can just use this type of method on your "user X purchases" table and see what happens.

  1. You have products but also tags of that products

You can transform your "user X products" matrix into a "user X tags" matrix. Then you can apply some function to add more weight to the "purchase" event. Then you mix (apply a weight and then just sum) with your original "user X tags" matrix. Imagine this scenario: users A and B, products X, Y and Z. User A purchased product X and product Z. Product X has tags "m" and "n", product B has tags "o" and "p" and product Z has tags "n" and "q". Your purchases user/products matrix is:

   X  Y  Z
A [1  0  1]
B [0  0  0]

This can be transformed into a purchase user/tags matrix:

   m  n  o  p  q
A [1  1  0  0  1]
B [0  0  0  0  0]

Instead of another binary matrix you can set values that represents a combination of possible interactions between a user and a product. All of that will be properly handled by a method prepared for implicit feedback.

I've already built a similar(in some sense) recommender system. Imagine travel packages and cities. The clients purchased travel packages but we ended up recommending cities and then applying some rules to find the best travel package to that city to that client. We could have recommended travel packages but this choice would gave us us a much more sparse and bigger matrix.

[1] http://yifanhu.net/PUB/cf.pdf