How to evaluate goodness of fit for negative binomial regression

goodness-of-fit, negative-binomial-distribution, regression

I'm trying to fit a model estimating waiting time using negative binomial regression, but I'm not sure how to assess the goodness of fit for my model. I would like to compare the negative binomial model to a Poisson model. I have approximately $4,000$ data points. Any suggestions?

Thanks!

Best Answer

Generally speaking, a well-fitting model is one that generalizes well to data not captured in your sample. A good way to mimic this is through cross-validation (CV). To do this, you split your data into two parts: a training data set and a testing data set. Based on your sample size, I would recommend randomly putting 70% of your data into the training data set and the remaining 30% into the testing data set.
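A random split of this kind can be sketched in a few lines of Python. This is only an illustration: the function name `train_test_split` and the parameters `train_fraction` and `seed` are hypothetical, and the value 0.7 assumes that the larger share of the data goes to training.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Randomly partition rows into (train, test) lists."""
    shuffled = list(rows)                  # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Stand-in for roughly 4,000 observations (row indices only).
rows = list(range(4000))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed makes the split repeatable, so both models are guaranteed to see the same training and testing rows.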

Now, build both the Poisson model and the negative binomial model based on your training data set. Calculate the predicted values for the data in your testing data set and compare them to the actual values using the sum of squared prediction errors:

$\sum_{i=1}^{n_2} (Y_i - \hat{Y}_i)^2$

where $n_2$ is the sample size of your testing data set, $Y_i$ is the actual value of the dependent variable, and $\hat{Y}_i$ is the predicted value of the dependent variable.

Whichever model provides a lower value for the above expression is the preferred model.
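As a minimal sketch of that comparison, the snippet below computes the sum of squared errors for two sets of held-out predictions and picks the smaller one. The numbers in `y_test`, `poisson_pred`, and `negbin_pred` are made up purely for illustration; in practice they would come from the two models fitted on the training data.

```python
def sum_squared_error(actual, predicted):
    """Sum over the test set of (Y_i - Y_hat_i)^2."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))

# Hypothetical held-out outcomes and model predictions (illustrative only).
y_test = [3, 0, 5, 2, 8]
poisson_pred = [2.5, 1.0, 4.0, 2.0, 6.0]
negbin_pred = [2.8, 0.6, 4.6, 2.1, 7.0]

sse = {
    "poisson": sum_squared_error(y_test, poisson_pred),
    "negbin": sum_squared_error(y_test, negbin_pred),
}
preferred = min(sse, key=sse.get)  # model with the lower test-set error
print(sse, preferred)
```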

There is also an extension of this called k-fold CV. It splits your data into $k$ approximately equal subsets (called "folds") and predicts each fold using a model fit on the remaining folds. Setting $k=4$ seems reasonable to me.
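The fold bookkeeping can be sketched as follows. This shows only how the row indices are partitioned, not the model fitting itself; the helper names `k_fold_indices` and `k_fold_splits` are hypothetical.

```python
import random

def k_fold_indices(n, k=4, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def k_fold_splits(n, k=4, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set once."""
    folds = k_fold_indices(n, k, seed)
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_splits(4000, k=4))
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

For each pair, you would fit both models on the `train_idx` rows, add up the squared prediction errors on the `test_idx` rows, and then total the errors across all folds before comparing the models.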

The relevant R function for this is cv.glm() in the boot package; its K argument sets the number of folds. More information here: http://stat.ethz.ch/R-manual/R-patched/library/boot/html/cv.glm.html
