R Panel Data – Why Different Standard Errors When Using Demeaned Data in -plm-?

Tags: fixed-effects-model, panel data, r, standard error

I am fitting a Fixed-Effects model, with intercepts at cluster level.

One of the most direct ways is probably to use the -plm- package. Another well-known possibility is to apply OLS (i.e. to adopt -lm-) to the demeaned data, where the means are taken at the clustering level.

This second approach is usually referred to as the within transformation. It is quite convenient from a computational standpoint, because we still control for unobserved heterogeneity at the cluster level, but we do not need to estimate all the cluster-specific intercepts.

I have tried both of these approaches, and I came to a strange result. In practice, the coefficient of the regressor of interest, x, is the same in both cases. However, its standard error (and actually all the other relevant quantities of the regression: R squared, F test, etc.) is different.

Please note that I have carefully read both the R documentation for -plm- and the authors' related paper, where it is stated that the package applies the within transformation and then applies OLS, as I did…

The R script is:

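# set seed, load packages, create fake sample
set.seed(1)    # arbitrary seed; the original value is not shown
library(plm)   # provides plm()
library(plyr)  # provides ddply()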
dat <- expand.grid(id=factor(1:3), cluster=factor(1:6))
dat <- cbind(dat, x=runif(18), y=runif(18, 2, 5))

#   FE model using -plm-   #

# model fit  
fe.1 <- plm(y ~ x, data=dat, index="cluster", model="within")

# estimated coefficient and standard error of x
b.1 <- summary(fe.1)$coefficients[,1]
se.1 <- summary(fe.1)$coefficients[,2]

#   OLS on within-transformed data   #

# augmenting data frame with cluster-mean centered variables 
dat.2 <- ddply(dat, .(cluster), transform, dem_x=x-mean(x), dem_y=y-mean(y))

# model fit
fe.2 <- lm(dem_y ~ dem_x - 1, data=dat.2)

# estimated coefficient and standard error of x
b.2 <- summary(fe.2)$coefficients[1,1]
se.2 <- summary(fe.2)$coefficients[1,2]

#   models comparison   #

b.1; b.2
se.1; se.2


Notice that in the second model it is necessary to manually eliminate the intercept from the model.

Best Answer

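If you look carefully at the output, you'll notice that the degrees of freedom are different. The degrees of freedom are used to compute the standard errors, so the standard errors are wrong for your demeaned lm. When you apply lm to the demeaned data, lm does not know that the means have been subtracted, or equivalently, that you have eliminated the dummies for the cluster levels. In this example, lm on the demeaned data reports 17 residual degrees of freedom (18 observations minus 1 slope), whereas plm reports 11 (18 observations minus 6 cluster intercepts minus 1 slope). If you include the dummies, as in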

summary(lm(y ~ x + cluster,data=dat))

the degrees of freedom are accounted for.
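
As a rough sketch of the same point, the standard error from lm on the demeaned data can be rescaled by hand, because the residual sum of squares is the same in both fits and only the divisor differs. The snippet below is an illustration, not part of the original post: the seed is arbitrary, the object names (fit_dem, fit_lsdv, se_fixed) are made up for the example, and base R's ave() is used for the demeaning instead of ddply so that no extra packages are needed.

```r
# Fake sample with the same layout as in the question (seed is arbitrary)
set.seed(1)
dat <- expand.grid(id = factor(1:3), cluster = factor(1:6))
dat <- cbind(dat, x = runif(18), y = runif(18, 2, 5))

# Within transformation: subtract cluster means (ave() defaults to mean)
dat$dem_x <- dat$x - ave(dat$x, dat$cluster)
dat$dem_y <- dat$y - ave(dat$y, dat$cluster)

# OLS on demeaned data: lm() is unaware of the 6 estimated cluster means
fit_dem <- lm(dem_y ~ dem_x - 1, data = dat)

# Dummy-variable fit: cluster intercepts are estimated explicitly
fit_lsdv <- lm(y ~ x + cluster, data = dat)

se_dem  <- summary(fit_dem)$coefficients["dem_x", "Std. Error"]
se_lsdv <- summary(fit_lsdv)$coefficients["x", "Std. Error"]

# Residual degrees of freedom: naive (18 - 1) vs. corrected (18 - 6 - 1)
df_naive   <- df.residual(fit_dem)
df_correct <- df_naive - nlevels(dat$cluster)

# Rescale the naive standard error to use the corrected degrees of freedom
se_fixed <- se_dem * sqrt(df_naive / df_correct)

# Compare the rescaled standard error with the dummy-variable one
print(c(naive = se_dem, rescaled = se_fixed, lsdv = se_lsdv))
print(all.equal(se_fixed, se_lsdv))
```

The factor sqrt(df_naive / df_correct) is always greater than 1 here, so the uncorrected lm standard error on demeaned data is too small.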
