Solved – R² for a negative binomial regression model

negative-binomial-distribution, r, r-squared, regression

I have been searching for quite a while for a useful way to calculate (an estimate of) the explained variance for a negative binomial regression model in R. I know that "explained variance" is a concept from OLS, that the model is fitted via maximum likelihood, and that I could assess model fit in other ways.

The point is that the concept of explained variance is very easy to grasp from a reader's perspective. If the model explains 50% of the variance, then something else explains the other 50%, be it measurement error or variables not included in the model. Unlike a log-likelihood, the R² also gives an impression of effect sizes.

The closest answer I found was:

Cameron, A. C., & Windmeijer, F. A. G. (1996). R-squared measures for count data regression models with applications to health-care utilization. Journal of Business & Economic Statistics, 14(2), 209–220.

Unfortunately, they argue that their most elaborate suggestion (formula 2.17) "has an interpretation in terms of information content of the data" (pp. 209, 215) for Poisson regression models, but not for negative binomial regression. Even worse, I lack the statistical skills to actually calculate it in R.

Therefore, my question is: Does anyone know a reasonable way to calculate a pseudo R² that reflects the "information content" of the independent variables in a negative binomial regression model?

If there are good reasons why my search is futile, I would also appreciate an explanation 🙂 Thanks!

Why do I need a negative binomial model? Because my outcome variable is observational data from the Internet, and as is so often the case, such variables have a "long tail" distribution, or more accurately, they are overdispersed count data 🙂

Best Answer

I don't understand what is intended with the phrase "information content".

That being said, you might investigate any one of several pseudo r-square measures.

Efron's pseudo r-square relies on the difference between the y values predicted by the model and the observed y values. So, it's pretty easy to explain.
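As a sketch, Efron's measure is just 1 − SSE/SST computed with the model's fitted means in place of OLS predictions. The numbers below are made up purely for illustration (in R, the fitted values would come from something like `predict(fit, type = "response")` on a `glm.nb` fit):

```python
# Efron's pseudo r-square: 1 - SSE/SST, using the model's predicted means.
# y and y_hat are made-up illustrative numbers, not output from a real fit.
y     = [0, 1, 3, 10, 2, 0, 5, 1]        # observed counts
y_hat = [0.5, 1.2, 2.8, 8.9, 2.4, 0.3, 4.6, 1.1]  # fitted means

y_bar = sum(y) / len(y)
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
r2_efron = 1 - sse / sst
print(round(r2_efron, 3))
```

Because it only needs observed and fitted values, it applies unchanged to a negative binomial fit.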

Some other pseudo r-square values compare the likelihood of the model to the likelihood of a null model, which reflects the improvement of the model over the null model.
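The usual likelihood-ratio measures (McFadden, Cox & Snell, Nagelkerke) can all be computed from just two log-likelihoods: the fitted model's and the intercept-only model's. The values below are made-up placeholders; in R you would take them from `logLik()` on the full and null fits:

```python
import math

# Made-up illustrative log-likelihoods (in R: logLik(fit) and logLik(null_fit)).
ll_model = -180.0   # log-likelihood of the fitted model
ll_null  = -220.0   # log-likelihood of the intercept-only (null) model
n        = 100      # number of observations

# McFadden: 1 - ll_model / ll_null
r2_mcfadden = 1 - ll_model / ll_null

# Cox & Snell: 1 - (L_null / L_model)^(2/n), on the likelihood scale
r2_cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)

# Nagelkerke: Cox & Snell rescaled so its maximum attainable value is 1
r2_nagelkerke = r2_cox_snell / (1 - math.exp(2 * ll_null / n))
```

Nothing in these formulas is specific to the negative binomial likelihood, which is why they carry over directly.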

I can't speak authoritatively, but from playing around with some toy data, it seems to me that Cox and Snell, Nagelkerke, and Efron pseudo r-square measures work well with negative binomial regression.

A good source is UCLA's Institute for Digital Research and Education.
