Solved – Theta estimation in negative binomial regression

count-datanegative-binomial-distributionregression

I would like to estimate the conditional mean of counts assuming a negative binomial distribution.

I can estimate the unconditional mean using MASS's glm.nb function:

fit <- MASS::glm.nb(Days ~ 1, data = quine)
mu <- exp(coef(fit)[1])
size <- fit$theta

And I can verify the parameter estimates, again using the MASS package:

MASS::fitdistr(quine$Days, 'negative binomial')

For the unconditional probability of a specific count, I can use mu and size in dnbinom

dnbinom(5, size=size, mu=mu)  # ~ .04

But when I model the conditional mean using something like:

fit <- glm.nb(Days ~ Sex + Age, data = quine)
mu <- exp(coef(fit)[1])
size <- fit$theta

I'm not sure how to get the conditional parameter estimates. I know that exp(predict(fit)) will give me the conditional mean (mu) for each i, but is there a way to get the conditional theta from the model? And what does the theta in the fit mean? It is not equal to the unconditional fit's theta from the earlier unconditional estimate.

Best Answer

The glm.nb() function in MASS uses the following parameterization: $$ \begin{eqnarray*} y_i & \sim & \mathrm{NB}(\mu_i, \theta) \quad (i = 1, \dots, n)\\ \log(\mu_i) & = & x_i^\top \beta \end{eqnarray*} $$ Thus, only the expectation $\mu_i$ depends on the regressors. The size parameter $\theta$ is constant for all $n$ observations.

To evaluate the predicted distribution you can still use dnbinom(5, size = size, mu = mu) where size is just a scalar value and mu a vector.