Solved – Interpreting results of LightFM (factorization machines for collaborative filtering)

recommender-system

I built a recommendation model on a user-item transactional dataset where each observed transaction is represented by a 1.

from lightfm import LightFM

model = LightFM(learning_rate=0.05, loss='warp')

Here are the results:

Train precision at k=3: 0.115301
Test precision at k=3:  0.0209936

Train AUC score: 0.978294
Test AUC score:  0.810757

Train recall at k=3: 0.238312330233
Test recall at k=3:  0.0621618086561
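For context, metrics like these are typically computed with lightfm.evaluation. Below is a minimal, self-contained sketch of that evaluation loop; the toy random matrices and hyperparameters are illustrative stand-ins for my real train/test interaction matrices, not my actual data.

import numpy as np
import scipy.sparse as sp
from lightfm import LightFM
from lightfm.evaluation import auc_score, precision_at_k, recall_at_k

# Toy stand-ins for the real interaction matrices: sparse binary
# user-item matrices where a 1 marks an observed transaction.
n_users, n_items = 200, 500
train = sp.random(n_users, n_items, density=0.02, format='coo',
                  data_rvs=np.ones, random_state=42)
test = sp.random(n_users, n_items, density=0.005, format='coo',
                 data_rvs=np.ones, random_state=7)

model = LightFM(learning_rate=0.05, loss='warp')
model.fit(train, epochs=10)

# Each evaluation function returns one value per user; take the mean.
print('Train precision at k=3:', precision_at_k(model, train, k=3).mean())
print('Test precision at k=3: ', precision_at_k(model, test, k=3).mean())
print('Train AUC score:', auc_score(model, train).mean())
print('Test AUC score: ', auc_score(model, test).mean())
print('Train recall at k=3:', recall_at_k(model, train, k=3).mean())
print('Test recall at k=3: ', recall_at_k(model, test, k=3).mean())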

Can anyone help me interpret these results? How is it that I am getting such a good AUC score but such poor precision/recall? The precision and recall get even worse with the 'bpr' (Bayesian personalized ranking) loss.

Prediction task

import numpy as np

user = 0  # a single user id; LightFM broadcasts it over all candidate items
items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
model.predict(user, items)

Result

array([-1.45337546, -1.39952552, -1.44265926, -0.83335167, -0.52803332,
       -1.06252205, -1.45194077, -0.68543684])

How do I interpret the prediction scores?

Thanks

Best Answer

You may find my answer on Stack Overflow helpful: https://stackoverflow.com/questions/45451161/evaluating-the-lightfm-recommendation-model

The prediction scores themselves are not interpretable: they are simply a means of creating a recommended item ordering for the user. If you sort your items by their score in descending order, the items at the beginning of the list are more likely to be of interest to your user.
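As a concrete sketch, assuming model is the fitted LightFM model and items the candidate item ids from your question:

import numpy as np

# Rank candidate items for one user; only the relative order of the
# scores matters, not their absolute values.
items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
scores = model.predict(0, items)      # user id 0, broadcast over all items

ranked = items[np.argsort(-scores)]   # highest-scored items first
print(ranked[:3])                     # the top-3 recommendations

With the scores from your question, this would put items 13437, 13440, and 13436 at the top of the list.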
