Solved – R equivalent to cluster option when using negative binomial regression

negative-binomial-distributionrstata

I am trying to replicate a colleague's work and am moving the analysis from Stata to R. The models she employs invoke the "cluster" option within the nbreg function to cluster the standard errors.

See http://repec.org/usug2007/crse.pdf for a fairly complete description of the what and why of this option

My question is how to invoke this same option for negative binomial regression within R?

The primary model in our paper is specified in Stata as follows

 xi: nbreg cntpd09 logpop08 pcbnkthft07 pccrunion07 urbanpop pov00 pov002 edu4yr ///
 black04 hispanic04 respop i.pdpolicy i.maxloan rollover i.region if isser4 != 1,   
 cluster(state)

and I have replaced this with

pday<-glm.nb(cntpd09~logpop08+pcbnkthft07+pccrunion07+urbanpop+pov00+pov002+edu4yr+
black04+hispanic04+respop+as.factor(pdpolicy)+as.factor(maxloan)+rollover+
as.factor(region),data=data[which(data$isser4 != 1),])

which obviously lacks the clustered errors piece.

Is it possible to do an exact replication? If so how? If not, what are some reasonable alternatives?

Thanks

[Edit]
As noted in the comments, I was hoping for a solution that didn't take me into the realm of multilevel models. While my training allows me to see that these things should be related, it is more of a leap than I am comfortable taking on my own. As such I kept digging and found this link:
http://landroni.wordpress.com/2012/06/02/fama-macbeth-and-cluster-robust-by-firm-and-time-standard-errors-in-r/

that points to some fairly straightforward code to do what I want:

library(lmtest)
pday<-glm.nb(cntpd09~logpop08+pcbnkthft07+pccrunion07+urbanpop+pov00+pov002+edu4yr+
 black04+hispanic04+respop+as.factor(pdpolicy)+as.factor(maxloan)+rollover+
 as.factor(region),data=data[which(data$isser4 != 1),])
summary(pday)

coeftest(pday, vcov=function(x) vcovHC(x, cluster="state", type="HC1"))

This doesn't replicate the results from the analysis in Stata though, probably because it is designed to work on OLS not negative binomial. So the search goes on. Any pointers on where I am going wrong would be much appreciated

Best Answer

This document shows how to get cluster SEs for a glm regression:

http://dynaman.net/R/clrob.pdf