Solved – the reason for differences between nbreg and glm with family(nb) in Stata

negative-binomial-distributionstata

I am a novice Stata user (forced here from R to conform to coauthor's quirky option choices). I need access to the Pearson residuals from a negative binomial regression. I currently have the regression equations specified in this manner:

xi: nbreg cntpd09 logpop08 pcbnkthft07 pccrunion07 urbanpop pov00 pov002 edu4yr ///
 black04 hispanic04 respop i.pdpolicy i.maxloan rollover i.region if isser4 != 1,
 cluster(state)
est store pd

As I understand it, to get the Pearson residuals I need to undertake the same regression using the glm function. Something like this:

xi: glm cntpd09 logpop08 pcbnkthft07 pccrunion07 urbanpop pov00 pov002 edu4yr ///
 black04 hispanic04 respop i.pdpolicy i.maxloan rollover i.region if isser4 !=1, ///
 family(nb) vce(cluster state)

Some details to note: 1)I use an if statement to select a subset of the total records, 2)I am using robust standard errors clustered by "state"

Sorry for not including a replicable example at this stage, but I am new enough to Stata that I am not sure what each part of these statements is doing exactly and wanted the formatting to be exact.

Is there something basic in the coding that would explain differences in coefficients and standard errors between these methods as I have them written out? Should I be worried about these differences? How can I make the glm version conform to the nbreg version?

A secondary question since I have your attention already, how do I save, aggregate, and export the Pearson residuals assuming I get the glm function to work properly and am able to run it on each of the 6 regression models in our paper (Each model has the same observations, just different dependent and independent variables)? Ideally I would have a table with six columns of residuals in any format readable by R (including dta or other Stata defaults).

Thanks in advance,

Chris

Best Answer

You can definitely use glm to fit this model. In glm, you can specify family(nbinomial $\#_{k}$) and then search for a $\#_{k}$ that makes the deviance-based dispersion equal to 1. However, you can also use family(nbinomial ml) to estimate $\#_{k}$ with maximum likelihood, which should report the same value as nbreg. On the other hand, nbreg will also give you a confidence interval. The nb link function is $\eta=\ln \frac{\mu}{\mu +k}$, where $k=1$ if you specify family(binomial) without the $\#_{k}$ parameter.

To get your residuals, run your glm command first. Then type predict resid1, pearson. Do that for the other 5 specifications to get resid2-resid6. I am not sure what you mean by aggregate, but you can export the residuals (and an id) as a csv file with outsheet idvar resid1-resid6 using "C:/pearson_resids", comma.

Related Question