I'm trying to write my own code for cluster-robust (AKA panel-robust, AKA heteroskedasticity- and serial-correlation-consistent) standard errors so that I can make a couple of small extensions. But I can't get my results to match Stata's (in which this procedure is routine), so I am probably missing some detail. I would appreciate pointers on the code below.
The model is
$$
y_{ig} = \alpha_g + X'_{ig}\beta + \epsilon_{ig}
$$
The groupwise mean is then subtracted from every term above, getting rid of $\alpha_g$ and leaving de-meaned, group-varying variables:
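As a quick illustration of the within transformation, here is a base-R sketch using `ave` (the toy data are made up):

```r
# Within transformation: subtract each group's mean from its observations.
y  <- c(1, 2, 3, 10, 11, 12)
id <- c(1, 1, 1, 2, 2, 2)

ydm <- y - ave(y, id)  # ave() returns each group's mean, repeated per observation

# After demeaning, every group's mean is zero, so the group intercept drops out.
tapply(ydm, id, mean)
```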
$$
ydm_{ig} = Xdm'_{ig}\beta + \epsilon dm_{ig}
$$
The variance of $\beta$ is estimated as
$$
\left(\displaystyle\sum_G Xdm_g'Xdm_g\right)^{-1} \displaystyle\sum_G Xdm_g' \epsilon dm_g \epsilon dm_g' Xdm_g \left(\displaystyle\sum_G Xdm_g'Xdm_g\right)^{-1}
$$
Standard errors are the square roots of the diagonal of this matrix, after the matrix is inflated by the finite-sample factor $G/(G-1) \times (N-1)/(N-K)$.
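One detail that is easy to get wrong: the factor multiplies the variance matrix, so the standard errors only pick up its square root. A minimal sketch (the variance numbers are made up):

```r
G <- 200; N <- 1000; K <- 2
c_adj <- G/(G-1) * (N-1)/(N-K)   # finite-sample adjustment, roughly 1.006 here

vcv <- diag(c(0.028, 0.0077))    # stand-in for the sandwich variance matrix
se  <- sqrt(diag(c_adj * vcv))   # inflate the variance first, then take roots

# Multiplying sqrt(diag(vcv)) by c_adj instead would over-inflate the SEs
# by a factor of sqrt(c_adj) -- small here, but avoidable.
```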
This is all textbook econometrics. It is standard in Stata, and code exists to do it in R (this, for example).
I want my code not to rely on plm, lmtest, or sandwich.
The following script is supposed to implement the math above, simulating a panel dataset in which outcomes are autocorrelated and groups have different forms of heteroskedasticity. It gives the right point estimates, but the standard errors are smaller than those given by Stata:
rm(list=ls())
set.seed(999)
library(doBy)

# Simulate a panel: N observations in G groups, with group fixed effects
# and group-specific heteroskedasticity.
N = 1000
G = 200
x = rnorm(N)
z = rexp(N)
obs = N/G                          # observations per group
fe = data.frame(ID = 1:G, fe = rnorm(G) + 5)
fe = data.frame(ID = rep(fe$ID, obs), fe = rep(fe$fe, obs))
t = rep(1:obs, each = G)           # time index
data = data.frame(y = x + z + fe$fe + x*rnorm(N, mean = 0, sd = fe$fe*runif(N)),
                  ID = fe$ID, x, z, t)
write.csv(data, "testdata.csv")

# Within transformation: subtract each group's mean from the variable
demean = function(var, ID){
  dat = data.frame(var, ID)
  means = summaryBy(var ~ ID, data = dat, fun = mean)
  a = merge(dat, means, by = "ID")
  a[, 2] - a[, 3]
}

xdm = demean(data$x, data$ID)
ydm = demean(data$y, data$ID)
zdm = demean(data$z, data$ID)

# Within regression: no intercept, since demeaning removed it
mdm = lm(ydm ~ xdm + zdm - 1)
summary(mdm)

e = residuals(mdm)
lpm = cbind(xdm, zdm)
bread = matrix(0, ncol(lpm), ncol(lpm))
tofu = matrix(0, ncol(lpm), ncol(lpm))   # the "meat" of the sandwich
K = mdm$rank

# Accumulate X'X and the clustered meat, one group at a time
for (i in 1:G){
  X = lpm[data$ID == unique(data$ID)[i], ]
  bread = t(X) %*% X + bread
  r = e[data$ID == unique(data$ID)[i]]
  tofu = t(X) %*% r %*% t(r) %*% X + tofu
}
bread = solve(bread)
vcv = bread %*% tofu %*% bread
se = G/(G-1)*(N-1)/(N-K)*sqrt(diag(vcv))
summary(mdm)
se
Using that testdata.csv file, I can compare to Stata:
clear all
insheet using "testdata.csv"
xtset id t
xtreg y x z,fe cluster(id)
My R code's output:
> se
       xdm        zdm
0.16946120 0.08793485
Stata's output:
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.223739 .1992449 6.14 0.000 .8308371 1.616642
z | .876592 .0960943 9.12 0.000 .6870981 1.066086
Mine are consistently more optimistic than Stata's. Can anybody help me figure out why? I feel like it is probably some small bug, but I can't pinpoint where.
EDIT: The answer to the question is here and here. Stata doesn't run a textbook "within" estimator; rather, it adds the averages back into each variable, which is what allows for the estimation of a constant term (which has the interpretation of the mean of the fixed effects, and which allows for prediction). When I alter my code to copy those procedures, my SEs are equivalent up to the 3rd or 4th decimal place.
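For reference, the Stata-style transformation amounts to subtracting the group mean and adding the grand mean back (a base-R sketch with made-up toy data):

```r
# Stata-style within transformation: subtract the group mean, add back the
# grand mean. The group-specific intercepts are still wiped out, but the
# variable keeps its overall level, so a constant can be estimated.
y  <- c(1, 2, 3, 10, 11, 12)
id <- c(1, 1, 1, 2, 2, 2)

y_within <- y - ave(y, id) + mean(y)

# Every group's mean now equals the grand mean
tapply(y_within, id, mean)
```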
Best Answer
Stata computes a slight variation of the within estimator. However, it is possible to compute clustered standard errors in R that are identical to those in Stata. The following post gives you a function and the code that replicates Stata's clustered standard errors in R:
Clustered standard errors and robust standard errors
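For convenience, here is a self-contained base-R sketch of such a function. It implements the standard sandwich formula with Stata's finite-sample correction; it is an illustration, not the exact code from that post:

```r
# Cluster-robust variance matrix for an lm fit, with a Stata-style
# small-sample adjustment. `cluster` identifies the group of each row.
clustered_vcov <- function(model, cluster) {
  X <- model.matrix(model)
  e <- residuals(model)
  N <- nrow(X); K <- ncol(X); G <- length(unique(cluster))

  # Meat: sum over clusters of (X_g' e_g)(X_g' e_g)'
  Xe   <- rowsum(X * e, group = cluster)   # G x K matrix of cluster scores
  meat <- crossprod(Xe)

  bread <- solve(crossprod(X))
  adj   <- G/(G-1) * (N-1)/(N-K)
  adj * (bread %*% meat %*% bread)
}

# Usage: se <- sqrt(diag(clustered_vcov(fit, df$id)))
```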