I'm cross-validating a model that predicts a count. If this were a binary classification problem, I'd calculate out-of-fold AUC, and if it were a regression problem, I'd calculate out-of-fold RMSE or MAE.
For a Poisson model, what error metrics can I use to evaluate the "accuracy" of the out-of-sample predictions? Is there a Poisson extension of AUC that looks at how well the predictions order the actual values?
It seems that a lot of Kaggle competitions for counts (e.g. number of useful votes a Yelp review will get, or number of days a patient will spend in the hospital) use root mean squared logarithmic error, or RMSLE.
Edit: One thing I've been doing is calculating deciles of the predicted values and then looking at the actual counts, binned by decile. If decile 1 is low, decile 10 is high, and the deciles in between are strictly increasing, I've been calling the model "good," but I've been having trouble quantifying this process, and I'm convinced there's a better approach.
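For reference, the decile check described above can be sketched in a few lines of R. This is a minimal sketch on toy data (`predicted` and `actual` stand in for your out-of-fold values); the Spearman rank correlation at the end is one way to boil the "strictly increasing" criterion down to a single number:

```r
set.seed(1)
actual    <- rpois(1000, lambda = 5)    # toy observed counts
predicted <- actual + rnorm(1000)       # toy out-of-fold predictions

# Bin observations by decile of the predicted values
decile <- cut(predicted,
              breaks = quantile(predicted, probs = seq(0, 1, 0.1)),
              include.lowest = TRUE, labels = 1:10)

# Mean observed count per predicted-value decile (should be increasing)
tapply(actual, decile, mean)

# Rank-based summary of how well predictions order the actuals
cor(predicted, actual, method = "spearman")
```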
Edit 2: I'm looking for a formula that takes predicted and actual values and returns some "error" or "accuracy" metric. My plan is to calculate this function on the out-of-fold data during cross-validation, and then use it to compare a wide variety of models (e.g. a Poisson regression, a random forest, and a GBM).
For example, one such function is RMSE = sqrt(mean((predicted - actual)^2)). Another such function would be AUC. Neither function seems to be right for Poisson data.
Best Answer
There are a couple of proper and strictly proper scoring rules for count data you can use. A scoring rule is a penalty $s(y,P)$, where $P$ is the predictive distribution and $y$ the observed value. Scoring rules have a number of desirable properties: first and foremost, a forecast that is closer to the true distribution always receives a smaller expected penalty, and there is a (unique) best forecast, namely the one whose predicted probabilities coincide with the true probabilities. Thus minimizing the expectation of $s(y,P)$ means reporting the true probabilities. See also Wikipedia.
Often one takes an average of these over all predicted values:
$S=\frac{1}{n}\sum_{i=1}^n s(y^{(i)},P^{(i)})$
Which rule to take depends on your objective, but I'll give a rough characterization of when each is a good choice.
In what follows I use $f(y)$ for the predictive probability mass function $\Pr(Y=y)$ and $F(y)$ for the predictive cumulative distribution function. A $\sum_k$ runs over the whole support of the count distribution (i.e., $0, 1, 2, \dots$). $I$ denotes an indicator function. $\mu$ and $\sigma$ are the mean and standard deviation of the predictive distribution (which are usually directly estimated quantities in count data models).
Strictly proper scoring rules
Other scoring rules (not so proper but often used)
Example R code for the strictly proper rules:
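As a minimal sketch, assuming a Poisson predictive distribution with mean `mu` (the function names and the truncation cutoff `kmax` are my own; the infinite sums over the support are truncated at `kmax`):

```r
# Strictly proper scoring rules for a Poisson predictive distribution
# with mean mu, evaluated at observed counts y. Lower is better.
# Note: for the Poisson, sigma^2 = mu.

logarithmic.score <- function(y, mu) {
  # -log f(y)
  -dpois(y, mu, log = TRUE)
}

quadratic.score <- function(y, mu, kmax = 1000) {
  # -2 f(y) + sum_k f(k)^2
  norm2 <- sapply(mu, function(m) sum(dpois(0:kmax, m)^2))
  -2 * dpois(y, mu) + norm2
}

spherical.score <- function(y, mu, kmax = 1000) {
  # -f(y) / sqrt(sum_k f(k)^2)
  norm <- sapply(mu, function(m) sqrt(sum(dpois(0:kmax, m)^2)))
  -dpois(y, mu) / norm
}

ranked.probability.score <- function(y, mu, kmax = 1000) {
  # sum_k (F(k) - I(y <= k))^2
  sapply(seq_along(y), function(i) {
    k <- 0:kmax
    sum((ppois(k, mu[i]) - (y[i] <= k))^2)
  })
}

dawid.sebastiani.score <- function(y, mu) {
  # ((y - mu) / sigma)^2 + 2 log sigma, with sigma^2 = mu for the Poisson
  (y - mu)^2 / mu + log(mu)
}
```

To get the summary metric $S$ from the answer, average any one of these over the out-of-fold observations, e.g. `mean(logarithmic.score(actual, predicted))`, and compare that value across models.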