I built a recommendation model on a user-item transactional dataset where each interaction (transaction) is recorded as a 1, i.e. implicit feedback.
from lightfm import LightFM

model = LightFM(learning_rate=0.05, loss='warp')
Here are the results:
Train precision at k=3: 0.115301
Test precision at k=3: 0.0209936
Train auc score: 0.978294
Test auc score: 0.810757
Train recall at k=3: 0.238312330233
Test recall at k=3: 0.0621618086561
Can anyone help me interpret these results? How am I getting such a good AUC score but such poor precision/recall? Precision and recall get even worse with the 'bpr' (Bayesian personalized ranking) loss.
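For intuition on how a high AUC can coexist with a low precision@k: AUC scores the ranking over *all* items (how often a relevant item outranks an irrelevant one), while precision@k only looks at the top k positions. A toy illustration in plain NumPy (not LightFM; the item counts and ranks here are made up for the example):

```python
import numpy as np

n_items = 100
# Strictly decreasing scores so the ranking is unambiguous.
scores = np.linspace(1.0, 0.0, n_items)
labels = np.zeros(n_items, dtype=bool)
labels[[3, 4]] = True  # the two relevant items sit at ranks 4 and 5

order = np.argsort(-scores)           # best-scored items first
precision_at_3 = labels[order[:3]].mean()

pos = scores[labels]
neg = scores[~labels]
# AUC = fraction of (positive, negative) pairs ranked correctly.
auc = (pos[:, None] > neg[None, :]).mean()

print(precision_at_3)  # 0.0 - no relevant item in the top 3
print(auc)             # ~0.97 - relevant items still beat almost all negatives
```

The relevant items are ranked above 95 of the 98 negatives (AUC ≈ 0.97), yet none of them makes the top 3 (precision@3 = 0), which mirrors the pattern in the numbers above.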
Prediction task
users = 0  # a single user id; LightFM repeats it across all item ids
items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
model.predict(users, items)
Result:
array([-1.45337546, -1.39952552, -1.44265926, -0.83335167, -0.52803332,
-1.06252205, -1.45194077, -0.68543684])
How do I interpret the prediction scores?
Thanks
Best Answer
You may find my answer on StackOverflow helpful: https://stackoverflow.com/questions/45451161/evaluating-the-lightfm-recommendation-model
The prediction scores themselves are not interpretable: they are simply a means of creating a recommended item ordering for the user. If you sort your items by their score in descending order, the items at the beginning of the list are more likely to be of interest to your user.
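For example, applying that sorting to the scores from the question (plain NumPy, with `items` and the score array copied from above):

```python
import numpy as np

items = np.array([13433, 13434, 13435, 13436, 13437, 13438, 13439, 13440])
scores = np.array([-1.45337546, -1.39952552, -1.44265926, -0.83335167,
                   -0.52803332, -1.06252205, -1.45194077, -0.68543684])

# argsort on the negated scores gives a highest-score-first ordering.
ranked = items[np.argsort(-scores)]
print(ranked)  # 13437 comes first: it has the highest (least negative) score
```

Only the *order* matters: the absolute score values (and their sign) carry no meaning on their own.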