Ok, I have searched and searched and just have no clue where to start. First, what I would like to do is produce a QQ-plot (or even a readable residual plot) to look at the fit of my model. I guess I just don't understand how the parameters that go into qnbinom()
are obtained from the output of MASS::glm.nb()
. I am attempting to use probplot()
from package e1071, but am unsure of the inputs needed. It would be great if someone having experience with fitting negative binomials could lend a hand.
Secondly, I came across a residual plot here: http://www.stat.cmu.edu/~hseltman/Rclass/R8.R
I can make it to work, but I don't know how to interpret it or if I am using it correctly. Has anyone else used this?
At the moment I am relying on the AIC and plots of fitted vs. actual values to assess the fit of my model and I would like something a little better!
Edit:
Hopefully this will clarify what I am asking. With qnbinom(p, size, prob, mu, lower.tail = TRUE, log.p = FALSE)
, how do I (or is it even possible to) get the p
, size
, prob
, mu
from the output of a glm.nb
fit model? From my research, I have found that size is the dispersion parameter, but other than that I'm not sure where to go. I know theta goes in there somehow, just not sure how to get it in the form needed.
Edit 2: Ok, once I have a distplot()
, is there a guide to interpreting it? I am fairly positive I have a bad fit because I have a curved plot with a red line going through it (with many points on the tails far from the red line). The prob: ML = 0.011, is this rejecting that the distribution is from the negative binomial specified?
Best Answer
You might find
distplot()
from the vcd package useful either for the original data (edit: you can't use it on residuals). This plots Friendly's "negativebinomialness plots" and provides how well the negative binomial model fitsdistplot(response, type = "nbinomial", ...)
To obtain the parameters:
glm.nb
uses the "Gamma mixture of Poisson" representation. It is actually a log-linear model that is fitted, so you should get the mean as $\exp(X\beta)$.For example, let's say your data come from a negbin with mean 5 and theta of 1 (in the alternative representation as described above). Then you can get the mean estimate simply by
which are in this case 5.1 for the mean (pretty close) and 1.6 for the dispersion parameter (pretty far off).
If you fit a model for the conditional mode, you interpret it accordingly as in every other log linear model, see this discussion on stack exchange.
EDIT: If you want to know how to get the mean in a negbin regression model you need to sum up the linear predictor $X\beta$.
For example: I take the
quine
data and fitnow males are 1 females are 0. To get the mean for males you write
and for females
Now to get the mean you must weight this with the occurence of all females and males which is
and hence the mean is
This is confirmed by
(apart from rounding errors).