Hi, I'm a newbie and still have a lot to learn. I have several questions. I'm trying to fit a negative binomial regression model in R using library(MASS), but I'm still confused about which I should use: glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, family = negative.binomial(theta), data, maxit) or glm.nb().
What is the difference between those two functions?
[Someone said that if we don't know the overdispersion parameter we can't use glm(), so glm.nb() is the option; another person said that glm.nb() assumes theta = 1.]
On the other hand, I am still confused about theta. Some discussion forums say theta is an overdispersion parameter, but others say theta is a shape parameter of the distribution and that overdispersion is the same as k, as discussed in The R Book (Crawley 2007).
I have read a tutorial on negative binomial regression with R, but it still does not look correct to me. The tutorial suggests finding theta by trial and error, changing its value until the residual deviance equals the degrees of freedom, using glm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7, family = negative.binomial(theta), data, maxit).
Sorry for my bad English.
Best Answer
The negative binomial model is a generalized linear model only when the overdispersion parameter theta is known. In applications, we don't know it, and it needs to be estimated along with the other parameters in the model.
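To make the role of theta concrete, here is the variance function as I understand MASS to parameterize it (the symbols \mu for the mean and Y for the count response are my notation, not taken from the question):

```latex
\operatorname{Var}(Y) = \mu + \frac{\mu^{2}}{\theta},
\qquad
\operatorname{Var}(Y) \to \mu \quad \text{as } \theta \to \infty .
```

So a small theta means strong overdispersion, and a very large theta brings the model back to the Poisson. A text that writes the variance as \mu + \mu^{2}/k is using k in the same role as theta. With theta held fixed, the distribution is a one-parameter exponential family in \mu, which is what lets glm() fit it; once theta is unknown, that no longer holds.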
glm(., family = negative.binomial(theta)) requires you to have a value of theta that you can supply. glm.nb() fits the traditional negative binomial model where theta is estimated. The latter is the one you want; never use the former.
Theta is not assumed to be 1 in glm.nb(). It is only initialized with a starting value (by default a moment estimate taken after an initial Poisson fit), because the way model fitting works with glm.nb() is that an initial guess of the parameter estimates is updated until convergence. The initial value can be supplied to init.theta, but there is no reason to do this in most cases.
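A small sketch of the difference, using simulated data rather than the question's variables. The names dat, x1, x2, fit_nb and fit_fixed, the coefficients, and the true theta of 2 are all made up for illustration.

```r
library(MASS)  # provides glm.nb() and negative.binomial()

# Simulated data (hypothetical): two predictors, true theta = 2
set.seed(1)
n   <- 500
x1  <- rnorm(n)
x2  <- rnorm(n)
mu  <- exp(0.5 + 0.3 * x1 - 0.2 * x2)
y   <- rnbinom(n, size = 2, mu = mu)  # 'size' in rnbinom() plays the role of theta
dat <- data.frame(y, x1, x2)

# glm.nb(): theta is estimated together with the regression coefficients
fit_nb <- glm.nb(y ~ x1 + x2, data = dat)
fit_nb$theta     # estimated theta
fit_nb$SE.theta  # its approximate standard error

# glm(): theta must be supplied and is treated as a known constant.
# Here the glm.nb() estimate is plugged in purely for comparison.
fit_fixed <- glm(y ~ x1 + x2,
                 family = negative.binomial(theta = fit_nb$theta),
                 data = dat)

# The regression coefficients of the two fits should agree closely
all.equal(coef(fit_nb), coef(fit_fixed), tolerance = 1e-3)
```

The comparison only works because theta was taken from glm.nb() first; without an estimate there is nothing principled to pass to negative.binomial(), which is why tuning theta by hand until the deviance matches the degrees of freedom is not needed.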