I'm trying to write my own code for cluster-robust (AKA panel-robust, AKA heteroskedasticity- and serial-correlation-consistent) standard errors so that I can make a couple of small extensions. But I can't get my results to match Stata's (in which this procedure is routine), so I am probably missing some detail. I would appreciate pointers on the code below.
The model is
$$
y_{ig} = \alpha_g + X'_{ig}\beta + \epsilon_{ig}
$$
The groupwise mean is then subtracted from every term above, getting rid of $\alpha_g$ and leaving de-meaned, group-varying variables:
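As a quick illustration of the within transformation, here is a base-R sketch using `ave` (the toy data are made up):

```r
# Within transformation: subtract each group's mean from its observations.
y  <- c(1, 2, 3, 10, 11, 12)
id <- c(1, 1, 1, 2, 2, 2)

ydm <- y - ave(y, id)  # ave() returns each group's mean, repeated per observation

# After demeaning, every group's mean is zero, so the group intercept drops out.
tapply(ydm, id, mean)
```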
$$
ydm_{ig} = Xdm'_{ig}\beta + \epsilon dm_{ig}
$$
The variance of $\beta$ is estimated as
$$
\left(\displaystyle\sum_G Xdm_g'Xdm_g\right)^{-1} \displaystyle\sum_G Xdm_g' \epsilon dm_g \epsilon dm_g' Xdm_g \left(\displaystyle\sum_G Xdm_g'Xdm_g\right)^{-1}
$$
Standard errors are the square roots of the diagonal of this matrix, after the matrix is inflated by the finite-sample factor $G/(G-1) \times (N-1)/(N-K)$.
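One detail that is easy to get wrong: the factor multiplies the variance matrix, so the standard errors only pick up its square root. A minimal sketch (the variance numbers are made up):

```r
G <- 200; N <- 1000; K <- 2
c_adj <- G/(G-1) * (N-1)/(N-K)   # finite-sample adjustment, roughly 1.006 here

vcv <- diag(c(0.028, 0.0077))    # stand-in for the sandwich variance matrix
se  <- sqrt(diag(c_adj * vcv))   # inflate the variance first, then take roots

# Multiplying sqrt(diag(vcv)) by c_adj instead would over-inflate the SEs
# by a factor of sqrt(c_adj) -- small here, but avoidable.
```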
This is all textbook econometrics. It is standard in Stata, and code exists to do it in R (this, for example).
I want my code not to rely on plm, lmtest, or sandwich.
The following script is supposed to implement the math above, simulating a panel dataset in which outcomes are autocorrelated and groups have different forms of heteroskedasticity. It gives the right point estimates, but the standard errors are smaller than those given by Stata:
rm(list=ls())
set.seed(999)
library(doBy)

# Simulate a panel: N observations in G groups, with group fixed effects
# and group-specific heteroskedasticity.
N = 1000
G = 200
x = rnorm(N)
z = rexp(N)
obs = N/G                          # observations per group
fe = data.frame(ID = 1:G, fe = rnorm(G) + 5)
fe = data.frame(ID = rep(fe$ID, obs), fe = rep(fe$fe, obs))
t = rep(1:obs, each = G)           # time index
data = data.frame(y = x + z + fe$fe + x*rnorm(N, mean = 0, sd = fe$fe*runif(N)),
                  ID = fe$ID, x, z, t)
write.csv(data, "testdata.csv")

# Within transformation: subtract each group's mean from the variable
demean = function(var, ID){
  dat = data.frame(var, ID)
  means = summaryBy(var ~ ID, data = dat, fun = mean)
  a = merge(dat, means, by = "ID")
  a[, 2] - a[, 3]
}

xdm = demean(data$x, data$ID)
ydm = demean(data$y, data$ID)
zdm = demean(data$z, data$ID)

# Within regression: no intercept, since demeaning removed it
mdm = lm(ydm ~ xdm + zdm - 1)
summary(mdm)

e = residuals(mdm)
lpm = cbind(xdm, zdm)
bread = matrix(0, ncol(lpm), ncol(lpm))
tofu = matrix(0, ncol(lpm), ncol(lpm))   # the "meat" of the sandwich
K = mdm$rank

# Accumulate X'X and the clustered meat, one group at a time
for (i in 1:G){
  X = lpm[data$ID == unique(data$ID)[i], ]
  bread = t(X) %*% X + bread
  r = e[data$ID == unique(data$ID)[i]]
  tofu = t(X) %*% r %*% t(r) %*% X + tofu
}
bread = solve(bread)
vcv = bread %*% tofu %*% bread
se = G/(G-1)*(N-1)/(N-K)*sqrt(diag(vcv))
summary(mdm)
se
Using that testdata.csv file, I can compare to Stata:
clear all
insheet using "testdata.csv"
xtset id t
xtreg y x z,fe cluster(id)
My R code's output:
> se
       xdm        zdm
0.16946120 0.08793485
Stata's output:
| Robust
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | 1.223739 .1992449 6.14 0.000 .8308371 1.616642
z | .876592 .0960943 9.12 0.000 .6870981 1.066086
Mine are consistently more optimistic than Stata's. Can anybody help me figure out why? I feel like it is probably some small bug, but I can't pinpoint where.
EDIT: The answer to the question is here and here. Stata doesn't run a textbook "within" estimator; rather, it adds the averages back into each variable, which is what allows for the estimation of a constant term (which has the interpretation of the mean of the fixed effects, and which allows for prediction). When I alter my code to copy those procedures, my SEs are equivalent up to the 3rd or 4th decimal place.
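For reference, the Stata-style transformation amounts to subtracting the group mean and adding the grand mean back (a base-R sketch with made-up toy data):

```r
# Stata-style within transformation: subtract the group mean, add back the
# grand mean. The group-specific intercepts are still wiped out, but the
# variable keeps its overall level, so a constant can be estimated.
y  <- c(1, 2, 3, 10, 11, 12)
id <- c(1, 1, 1, 2, 2, 2)

y_within <- y - ave(y, id) + mean(y)

# Every group's mean now equals the grand mean
tapply(y_within, id, mean)
```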
Best Answer
Stata computes a slight variation of the within estimator. However, it is possible to compute clustered standard errors in R that are identical to those in Stata. The following post gives you a function and the code that replicates Stata's clustered standard errors in R:
Clustered standard errors and robust standard errors
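For convenience, here is a self-contained base-R sketch of such a function. It implements the standard sandwich formula with Stata's finite-sample correction; it is an illustration, not the exact code from that post:

```r
# Cluster-robust variance matrix for an lm fit, with a Stata-style
# small-sample adjustment. `cluster` identifies the group of each row.
clustered_vcov <- function(model, cluster) {
  X <- model.matrix(model)
  e <- residuals(model)
  N <- nrow(X); K <- ncol(X); G <- length(unique(cluster))

  # Meat: sum over clusters of (X_g' e_g)(X_g' e_g)'
  Xe   <- rowsum(X * e, group = cluster)   # G x K matrix of cluster scores
  meat <- crossprod(Xe)

  bread <- solve(crossprod(X))
  adj   <- G/(G-1) * (N-1)/(N-K)
  adj * (bread %*% meat %*% bread)
}

# Usage: se <- sqrt(diag(clustered_vcov(fit, df$id)))
```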