Solved – How to generalize precision, recall and F-score to non-classification problem

Tags: error, precision-recall

I am working on a problem to predict a value between 0 and 1. The data is skewed: only 18% of the values are greater than 0.

I implemented a machine learning system, and the lowest error rate on the validation set (18%) occurs when every prediction is 0.

So I wanted to implement precision, recall and F-scores to measure my results more accurately (instead of error rate and MSE). But precision, recall and F-scores seem to apply only to classification problems.

What is a better approach to measuring the results in this case? Is there a good way to use precision, recall and F-Scores?

EDIT: To clarify, I think it is a regression problem and should output a continuous value between 0 and 1. It is a recommendation engine using Collaborative Filtering. I am creating a matrix between users and items, and minimizing the MSE.
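For concreteness, the setup described above can be sketched roughly as follows. This is a minimal toy low-rank factorization fitted by gradient descent on the MSE over observed entries; the dimensions, learning rate, and variable names are all illustrative assumptions, not the asker's actual code:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy user-item ratings matrix, with a mask marking which entries are observed.
n_users, n_items, k = 5, 4, 2
R = rng.random((n_users, n_items))            # ratings in [0, 1)
mask = rng.random((n_users, n_items)) < 0.6   # ~60% of entries observed

# Low-rank factors U (users) and V (items), fitted so that U @ V.T approximates
# R on the observed cells.
U = 0.1 * rng.standard_normal((n_users, k))
V = 0.1 * rng.standard_normal((n_items, k))
lr = 0.02

def mse(U, V):
    err = (R - U @ V.T) * mask
    return (err ** 2).sum() / mask.sum()

before = mse(U, V)
for _ in range(200):
    err = (R - U @ V.T) * mask
    # Simultaneous gradient step on both factors (old U is used for V's step).
    U, V = U + lr * err @ V, V + lr * err.T @ U
after = mse(U, V)
print(before, after)  # MSE should drop as the factors fit the observed cells
```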

Best Answer

Ok, so your question is: what evaluation metrics should one use for collaborative filtering? This is a continuous-valued response (unless you've dichotomized it, which some people do recommend), so you want some kind of residual-based loss function such as MSE. The Netflix Prize competition used root MSE (RMSE).
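For example, RMSE over only the observed entries of a user-item matrix might be computed like this (toy numbers; representing missing entries with a 0/1 mask is an assumption about how the data is stored):

```python
import numpy as np

def rmse(ratings, predictions, mask):
    """Root mean squared error over the observed (masked-in) cells only."""
    diff = (ratings - predictions) * mask
    return np.sqrt((diff ** 2).sum() / mask.sum())

ratings = np.array([[1.0, 0.0], [0.5, 1.0]])
predictions = np.array([[0.8, 0.0], [0.5, 0.6]])
mask = np.array([[1, 0], [1, 1]])   # the (0, 1) entry is unobserved

print(round(rmse(ratings, predictions, mask), 4))  # → 0.2582
```

Averaging over only the observed cells matters: including unobserved entries (which are typically stored as 0) would silently deflate the error.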

There is plenty of material on evaluating CF. You might also watch Andrew Ng's lectures on CF; see under XVI. Recommender Systems.

Also, make sure that your MSE computation is working right: build a fake recommender that peeks at the validation set and produces the correct response 10%, 30%, 50%, and 70% of the time, and check that the MSE drops as the peeking rate rises.
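That sanity check might look like this sketch (hypothetical names; the "cheating" predictor copies the true value with probability p and otherwise guesses 0, mimicking the all-zeros baseline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic validation targets: ~18% nonzero, matching the skew in the question.
y_true = (rng.random(10_000) < 0.18) * rng.random(10_000)

def cheating_predictions(y, p):
    """Peek at the true value with probability p; otherwise predict 0."""
    peek = rng.random(y.size) < p
    return np.where(peek, y, 0.0)

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

errors = [rmse(y_true, cheating_predictions(y_true, p))
          for p in (0.1, 0.3, 0.5, 0.7)]
print(errors)  # should be strictly decreasing as the peeking rate rises
```

If the errors do not fall monotonically as p increases, the metric code (not the recommender) is the thing to debug.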