Solved – Recommendation systems as learning to rank problem

machine learning, ranking, recommender-system, supervised learning

I am interested in building a recommendation system, and I want to frame it as a learning-to-rank problem using either XGBoost or LightGBM.

I am reading two papers about the process:

  1. https://pdfs.semanticscholar.org/8f4f/d9ee2c55648a48ad571c02d821799904faa7.pdf
  2. http://delivery.acm.org/10.1145/3110000/3109897/p251-freno.pdf?ip=216.240.51.5&id=3109897&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&CFID=808096793&CFTOKEN=20280758&acm=1505497322_2a5ed6c9c88b7a08b8d16377247cc379.

To define the labels, I plan to use an implicit score similar to the approach in the first linked paper. For each user, I know whether they purchased the item, liked it, clicked on it, or removed it from their cart.
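To make the labelling idea concrete, here is a minimal sketch of one way such an implicit score could be turned into graded relevance labels grouped per user, which is the general shape that ranking objectives in XGBoost and LightGBM expect. The event names, the weights in `EVENT_WEIGHTS`, and the function `implicit_labels` are all hypothetical choices for illustration; they are not taken from either paper.

```python
from collections import defaultdict

# Hypothetical event weights: illustrative assumptions only,
# not values from the linked papers.
EVENT_WEIGHTS = {
    "purchase": 3.0,
    "like": 2.0,
    "click": 1.0,
    "remove_from_cart": -1.0,
}


def implicit_labels(events, max_grade=4):
    """Turn (user, item, event) triples into integer relevance grades.

    Scores are summed per (user, item), clipped at zero, and scaled
    relative to each user's strongest item so that grades lie in
    0..max_grade within every user's group.
    """
    raw = defaultdict(float)
    for user, item, event in events:
        raw[(user, item)] += EVENT_WEIGHTS[event]

    # Group by user: a ranking objective compares items within one user.
    by_user = defaultdict(dict)
    for (user, item), score in raw.items():
        by_user[user][item] = max(score, 0.0)  # no negative labels

    labels = {}
    for user, items in by_user.items():
        top = max(items.values())
        for item, score in items.items():
            labels[(user, item)] = round(max_grade * score / top) if top > 0 else 0
    return labels


events = [
    ("u1", "A", "click"), ("u1", "A", "purchase"),
    ("u1", "B", "click"),
    ("u1", "C", "click"), ("u1", "C", "remove_from_cart"),
    ("u2", "D", "like"),
]
labels = implicit_labels(events)
```

Scaling per user is a design choice: it keeps a heavy clicker from dominating the label range, at the cost of making grades incomparable across users, which a per-user ranking objective does not need anyway.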

My questions are:

  1. From everything I have read so far, the goal of a recommender system is to show users a list of NEW content that they might be interested in buying. In the training data described above, however, the user has already implicitly expressed interest in every item. The items in that data that were not purchased are therefore not necessarily new content. Should an item that the user interacted with, but did not purchase or like, still be considered a candidate for recommendation? Has anyone actually recommended a product that a user has purchased before?

  2. If I should consider only new content in the recommendation phase, am I supposed to predict a relevance score for ALL (user, item) pairs that have not yet had an interaction? What if I have over 10k products? That does not seem like it would scale well.

Any comments, insights or feedback would be greatly appreciated!

Best Answer

Although this is an old question, this might be helpful for people just starting out. In particular, look at the recommenderbase and nearestneighbour classes. They provide a method for calculating the similarity between two items using the BM25 weighting scheme. BM25 is generally used to rank web pages in search, but the aforementioned source code adapts it for item recommendations. As the code is written in Cython, it is parallelized and quite fast. The library also contains ALS (Alternating Least Squares) for recommendations, which I recommend you check out.
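The answer does not show the library's code, so the following is only a rough pure-Python sketch of the general idea, not that library's implementation: treat each item as a "document" and each user as a "term", re-weight the interaction counts with BM25, and score item pairs by the dot product of their weighted user vectors. The function names, the data layout (`item -> {user: count}`), the defaults `k1=1.2` and `b=0.75`, and the always-positive IDF variant are assumptions made for illustration.

```python
import math
from collections import defaultdict


def bm25_weight(item_users, k1=1.2, b=0.75):
    """Re-weight raw counts with BM25 (items as documents, users as terms)."""
    n_items = len(item_users)
    # "Document frequency": number of distinct items each user touched.
    user_df = defaultdict(int)
    for users in item_users.values():
        for user in users:
            user_df[user] += 1
    avg_len = sum(sum(u.values()) for u in item_users.values()) / n_items

    weighted = {}
    for item, users in item_users.items():
        # Popular items (long "documents") are normalised downwards.
        length_norm = 1.0 - b + b * sum(users.values()) / avg_len
        weighted[item] = {}
        for user, count in users.items():
            df = user_df[user]
            idf = math.log(1.0 + (n_items - df + 0.5) / (df + 0.5))
            weighted[item][user] = idf * count * (k1 + 1.0) / (count + k1 * length_norm)
    return weighted


def similar_items(weighted, item, top_n=10):
    """Rank other items by the dot product of BM25-weighted user vectors."""
    target = weighted[item]
    scores = {}
    for other, users in weighted.items():
        if other == item:
            continue
        score = sum(w * users[u] for u, w in target.items() if u in users)
        if score > 0:
            scores[other] = score
    return sorted(scores.items(), key=lambda kv: -kv[1])[:top_n]


def recommend(weighted, user_items, top_n=10, neighbours=20):
    """Sum neighbour scores over a user's items, skipping items already seen."""
    totals = defaultdict(float)
    for item in user_items:
        for other, score in similar_items(weighted, item, neighbours):
            if other not in user_items:
                totals[other] += score
    return sorted(totals.items(), key=lambda kv: -kv[1])[:top_n]


item_users = {
    "A": {"u1": 1, "u2": 1},
    "B": {"u1": 1, "u2": 1, "u3": 1},
    "C": {"u3": 1},
    "D": {"u4": 1},
}
weighted = bm25_weight(item_users)
print(recommend(weighted, {"A", "B"}))
```

A neighbour lookup like this also speaks to the scaling concern in the question: only the neighbours of items a user has touched become candidates, so a heavier ranking model would need to score a short candidate list rather than every unseen (user, item) pair.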
