Solved – R equivalent to cluster option when using negative binomial regression

negative-binomial-distribution, r, stata

I am trying to replicate a colleague's work and am moving the analysis from Stata to R. The models she employs invoke the "cluster" option within the nbreg function to cluster the standard errors.

See http://repec.org/usug2007/crse.pdf for a fairly complete description of the what and why of this option.

My question: how do I invoke this same option for negative binomial regression in R?

The primary model in our paper is specified in Stata as follows:

 xi: nbreg cntpd09 logpop08 pcbnkthft07 pccrunion07 urbanpop pov00 pov002 edu4yr ///
 black04 hispanic04 respop i.pdpolicy i.maxloan rollover i.region if isser4 != 1, ///
 cluster(state)

and I have replaced this with

library(MASS)  # provides glm.nb
pday <- glm.nb(cntpd09 ~ logpop08 + pcbnkthft07 + pccrunion07 + urbanpop + pov00 + pov002 + edu4yr +
               black04 + hispanic04 + respop + as.factor(pdpolicy) + as.factor(maxloan) + rollover +
               as.factor(region), data = data[which(data$isser4 != 1), ])

which obviously lacks the clustered standard errors.

Is it possible to do an exact replication? If so, how? If not, what are some reasonable alternatives?

Thanks

[Edit]
As noted in the comments, I was hoping for a solution that didn't take me into the realm of multilevel models. While my training allows me to see that these things should be related, it is more of a leap than I am comfortable taking on my own. So I kept digging and found this link:
http://landroni.wordpress.com/2012/06/02/fama-macbeth-and-cluster-robust-by-firm-and-time-standard-errors-in-r/

that points to some fairly straightforward code to do what I want:

library(lmtest)
pday<-glm.nb(cntpd09~logpop08+pcbnkthft07+pccrunion07+urbanpop+pov00+pov002+edu4yr+
 black04+hispanic04+respop+as.factor(pdpolicy)+as.factor(maxloan)+rollover+
 as.factor(region),data=data[which(data$isser4 != 1),])
summary(pday)

coeftest(pday, vcov=function(x) vcovHC(x, cluster="state", type="HC1"))

This doesn't replicate the results from the Stata analysis, though, probably because it is designed to work on OLS, not negative binomial regression. So the search goes on. Any pointers on where I am going wrong would be much appreciated.

Best Answer

This document shows how to get cluster SEs for a glm regression:

http://dynaman.net/R/clrob.pdf
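
As a minimal sketch of the general idea (not necessarily the method used in the linked document), cluster-robust standard errors for a glm.nb fit can be obtained with vcovCL from the sandwich package. The data below are simulated, and the names dat, y, x and state are hypothetical stand-ins for the variables in the question. Note that the cluster= argument used in the question belongs to plm's vcovHC method; sandwich's vcovHC computes heteroskedasticity-robust covariances and does no clustering. Small differences from Stata may remain, since this approach treats the dispersion parameter as fixed when forming the sandwich.

```r
library(MASS)      # glm.nb
library(sandwich)  # vcovCL
library(lmtest)    # coeftest

set.seed(1)
# Hypothetical data: 40 clusters ("state") with 25 observations each
n_state <- 40
n_per   <- 25
state   <- rep(seq_len(n_state), each = n_per)
u       <- rnorm(n_state, sd = 0.5)[state]  # shared effect induces within-cluster correlation
x       <- rnorm(n_state * n_per)
y       <- rnbinom(n_state * n_per, size = 2, mu = exp(0.5 + 0.3 * x + u))
dat     <- data.frame(y, x, state)

fit <- glm.nb(y ~ x, data = dat)

# Cluster-robust covariance matrix, clustered on state.
# type = "HC0" with cadjust = TRUE applies only the G/(G-1) cluster correction.
vc_cl <- vcovCL(fit, cluster = ~ state, type = "HC0", cadjust = TRUE)

# Coefficient table using the clustered covariance matrix
coeftest(fit, vcov. = vc_cl)
```

For the model in the question, the same call would take the fitted pday object and cluster = ~ state, assuming state is a column of the data frame passed to glm.nb.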

Related Solutions

Standard Error – Standard Error Clustering in R (Either Manually or in plm)

Edit as of December 2021:

Probably the easiest way to get clustered standard errors in R now is via the felm function in the lfe package or the feols function in the fixest package:

feols in fixest: Clustering syntax and standard error computational procedure
felm in lfe: CRAN documentation
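
A minimal sketch of the feols route, assuming the trade example data set that ships with fixest; the choice of regressors and of Origin as the cluster variable is purely illustrative:

```r
library(fixest)

# Example data bundled with fixest (bilateral trade flows)
data(trade, package = "fixest")

# Fixed effects for Origin and Destination, standard errors clustered by Origin
est <- feols(log(Euros) ~ log(dist_km) | Origin + Destination,
             data = trade, cluster = ~ Origin)

summary(est)
se(est)  # clustered standard errors only
```

fixest also provides fenegbin for negative binomial models, which accepts the same style of clustering specification.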

Original answers and some subsequent edits:

For White standard errors clustered by group within the plm framework, try

coeftest(model.plm, vcov=vcovHC(model.plm,type="HC0",cluster="group"))

where model.plm is a plm model.
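
For a self-contained illustration, here is a sketch using the Grunfeld data set bundled with plm; the within (fixed effects) specification is an assumption chosen only to have a model to cluster:

```r
library(plm)
library(lmtest)

# Example panel data shipped with plm: 10 firms observed over 20 years
data("Grunfeld", package = "plm")

model.plm <- plm(inv ~ value + capital, data = Grunfeld,
                 index = c("firm", "year"), model = "within")

# White standard errors clustered by group (here: firm)
ct <- coeftest(model.plm, vcov = vcovHC(model.plm, type = "HC0", cluster = "group"))
print(ct)
```

Passing cluster = "time" instead would cluster by year.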

See also this link

http://www.inside-r.org/packages/cran/plm/docs/vcovHC or the plm package documentation

EDIT:

For two-way clustering (e.g. group and time) see the following link:

http://people.su.se/~ma/clustering.pdf

Here is another helpful guide for the plm package specifically that explains different options for clustered standard errors:

http://www.princeton.edu/~otorres/Panel101R.pdf

Clustering and other information, especially for Stata, can be found here:

http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/se_programming.htm

EDIT 2:

Here are examples that compare R and Stata: http://www.richard-bluhm.com/clustered-ses-in-r-and-stata-2/

Also, the multiwayvcov package may be helpful. This post provides a helpful overview: http://rforpublichealth.blogspot.dk/2014/10/easy-clustered-standard-errors-in-r.html

From the documentation:

library(multiwayvcov)
library(lmtest)
data(petersen)
m1 <- lm(y ~ x, data = petersen)

# Cluster by firm
vcov_firm <- cluster.vcov(m1, petersen$firmid)
coeftest(m1, vcov_firm)
# Cluster by year
vcov_year <- cluster.vcov(m1, petersen$year)
coeftest(m1, vcov_year)
# Cluster by year using a formula
vcov_year_formula <- cluster.vcov(m1, ~ year)
coeftest(m1, vcov_year_formula)

# Double cluster by firm and year
vcov_both <- cluster.vcov(m1, cbind(petersen$firmid, petersen$year))
coeftest(m1, vcov_both)
# Double cluster by firm and year using a formula
vcov_both_formula <- cluster.vcov(m1, ~ firmid + year)
coeftest(m1, vcov_both_formula)

Solved – the reason for differences between nbreg and glm with family(nb) in Stata

You can definitely use glm to fit this model. In glm, you can specify family(nbinomial $\#_{k}$) and then search for a $\#_{k}$ that makes the deviance-based dispersion equal to 1. However, you can also use family(nbinomial ml) to estimate $\#_{k}$ by maximum likelihood, which should report the same value as nbreg. nbreg, on the other hand, also gives you a confidence interval for it. The nbinomial link function is $\eta=\ln \frac{\mu}{\mu +k}$, where $k=1$ if you specify family(nbinomial) without the $\#_{k}$ parameter.

To get your residuals, run your glm command first. Then type predict resid1, pearson. Do that for the other 5 specifications to get resid2-resid6. I am not sure what you mean by aggregate, but you can export the residuals (and an id) as a csv file with outsheet idvar resid1-resid6 using "C:/pearson_resids", comma.
