Solved – Model Evaluation for Discrete Regression

model-evaluation, r-squared, regression

I'm building a model to predict count variables, i.e. the quantity I'm predicting is a positive integer.

I know that for regression a usual metric of model quality is the R-squared coefficient, but I'm not sure if this is a good metric for a discrete output. What's the usual metric for model evaluation for a discrete regression?

Best Answer

If you really want to, you can use one of the several proposed pseudo-$R^2$ measures for generalized linear models, since Poisson regression (the usual starting point for count outcomes) is a kind of generalized linear model. In general, however, even though $R^2$ is popular, it is not the best measure and can be misleading.
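For concreteness, two commonly cited proposals are McFadden's likelihood-based version and a deviance-based version. This is only a sketch of the definitions; the notation ($\ell$ for the maximized log-likelihood, $D$ for the deviance, $\hat\mu_i$ for the fitted mean of observation $i$, and "null" for the intercept-only model) is introduced here and does not come from the question:

$$
R^2_{\text{McFadden}} = 1 - \frac{\ell_{\text{model}}}{\ell_{\text{null}}},
\qquad
R^2_{D} = 1 - \frac{D_{\text{model}}}{D_{\text{null}}},
$$

where, for a Poisson model,

$$
D = 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat\mu_i} - (y_i - \hat\mu_i) \right],
$$

with the convention that $y_i \log(y_i/\hat\mu_i) = 0$ when $y_i = 0$.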

Instead, what you could do is:

  • If you are comparing models, you could use information criteria such as AIC or BIC, or likelihood-ratio tests (the latter for nested models).
  • You could use cross-validation, and if you are going to use your model for prediction you should seriously consider it. In its simplest (hold-out) form this means splitting the data into two parts: one part is used for "training" the model and the other for making predictions; k-fold cross-validation repeats this over several different splits. This tests the model on data it has "not seen" before, so you can check how it is likely to behave on external data.
  • In many cases a very simple and very revealing thing to do is to plot the distribution of your observed outcome and the distribution of your predictions as two overlapping histograms or density plots. This can quickly show you what exactly your model predicts.
  • Another thing to consider is posterior predictive checks. The idea is to simulate random data from your model and then compare the distribution of the simulated data to the real data, to check whether and how closely they resemble each other.
  • Besides, I'd highly recommend looking at diagnostic plots to make sure there are no problems with your model.
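Several of the suggestions above can be sketched in a few lines. The example below is a minimal illustration, not a recommended implementation: it uses only the Python standard library, simulated data with one predictor, and a hand-written Newton fit, where in practice you would more likely use a library routine such as R's `glm(..., family = poisson)`. All names (`poisson_draw`, `fit_poisson`, `log_lik`, `mean_deviance`) and the true coefficients 0.5 and 0.8 are assumptions made up for the illustration. It compares a Poisson regression against an intercept-only model by AIC and by held-out mean deviance, and then runs a simple simulation-based check on the share of zeros. That last step plugs in the fitted means and ignores parameter uncertainty, so it is a simplified stand-in for a full posterior predictive check.

```python
import math
import random


def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate with Knuth's method (adequate for small mu)."""
    limit, k, p = math.exp(-mu), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def fit_poisson(x, y, n_iter=25):
    """Fit log(mu_i) = b0 + b1 * x_i by plain Newton steps (no step-size control)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Entries of the 2x2 information matrix (negative Hessian)
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def log_lik(y, mu):
    """Poisson log-likelihood of counts y given means mu."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))


def mean_deviance(y, mu):
    """Mean Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += 2.0 * (term - (yi - mi))
    return total / len(y)


# Simulated data (assumed true model: log(mu) = 0.5 + 0.8 * x)
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(400)]
y = [poisson_draw(rng, math.exp(0.5 + 0.8 * xi)) for xi in x]

# Hold-out split: first 300 rows for training, last 100 for testing
x_tr, y_tr, x_te, y_te = x[:300], y[:300], x[300:], y[300:]

b0, b1 = fit_poisson(x_tr, y_tr)
mu_tr = [math.exp(b0 + b1 * xi) for xi in x_tr]
mu_te = [math.exp(b0 + b1 * xi) for xi in x_te]
null_mu = sum(y_tr) / len(y_tr)  # MLE of the intercept-only model

# Model comparison on the training data: AIC = 2k - 2 * log-likelihood
aic_full = 2 * 2 - 2 * log_lik(y_tr, mu_tr)
aic_null = 2 * 1 - 2 * log_lik(y_tr, [null_mu] * len(y_tr))

# Predictive performance on data the model has not seen
dev_full = mean_deviance(y_te, mu_te)
dev_null = mean_deviance(y_te, [null_mu] * len(y_te))

# Simulation-based check: does the model reproduce the observed share of zeros?
obs_zero = sum(yi == 0 for yi in y_tr) / len(y_tr)
sim_zero = []
for _ in range(200):
    sim = [poisson_draw(rng, mi) for mi in mu_tr]
    sim_zero.append(sum(s == 0 for s in sim) / len(sim))

print(f"slope estimate: {b1:.3f}")
print(f"AIC full / null: {aic_full:.1f} / {aic_null:.1f}")
print(f"held-out mean deviance full / null: {dev_full:.3f} / {dev_null:.3f}")
print(f"share of zeros observed: {obs_zero:.3f}, "
      f"simulated range: {min(sim_zero):.3f} to {max(sim_zero):.3f}")
```

Lower AIC and lower held-out deviance both favour a model. If the observed share of zeros fell far outside the simulated range, that would hint at a problem such as zero inflation or overdispersion, which a plain Poisson model cannot capture.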

See also the related questions "How to calculate goodness of fit in glm (R)" and "If the model fits well, nothing can be done?"

For further reading, I'd highly recommend Data Analysis Using Regression and Multilevel/Hierarchical Models by Andrew Gelman and Jennifer Hill, or Regression Modeling Strategies by Frank E. Harrell.