Solved – Testing the excluded instrument in the R plm package

econometrics, instrumental-variables, plm, r

I have to run a regression to estimate the impact of an endogenous variable.

There are many packages for this in R (e.g., AER and sem). Since I have an unbalanced panel, I need to use the plm package to obtain robust standard errors.

Here is an example with a balanced panel of 200 individuals observed over 5 years (1,000 observations).

set.seed(123)
library(plm)
# dependent variable
y <- rnorm(1000)
# endogenous regressor
x <- runif(1000, 2, 20)
# exogenous regressor
z <- rnorm(1000)
# excluded instrument
inst <- runif(1000, 1, 10)
# individual identifier (200 individuals)
id <- rep(1:200, 5)
# time identifier (5 years)
year <- rep(1:5, each = 200)
# combine into a data frame and declare the panel structure
# (pdata.frame replaces the deprecated plm.data)
lmdb <- data.frame(y, x, z, inst, id, year)
p.lmdb <- pdata.frame(lmdb, index = c("id", "year"))
# IV regression with two-way (individual and time) fixed effects
my_iv_reg <- plm(y ~ x + z | . - x + inst, data = p.lmdb, model = "within", effect = "twoways")

Using the AER package, I would obtain the first-stage F test for the excluded instrument by running

library(AER)
my_iv_reg_AER <- ivreg(y~x+z+as.factor(id)+as.factor(year)|.-x+inst,data=as.data.frame(lmdb))
summary(my_iv_reg_AER, vcov = sandwich, df = Inf, diagnostics = TRUE)
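To use the weak-instrument statistic programmatically instead of reading it off the printed summary, the summary object's diagnostics matrix can be indexed. The sketch below is self-contained and uses hypothetical simulated data in which x actually depends on inst (in the question's simulation x and inst are drawn independently, so the instrument is irrelevant by construction). The row name "Weak instruments" and the column name "statistic" are assumed to match what AER prints under "Diagnostic tests" for a single endogenous regressor; check rownames(s$diagnostics) on your own fit.

```r
library(AER)  # ivreg(); AER also attaches sandwich and lmtest

set.seed(123)
n_id <- 200
n_t  <- 5
n    <- n_id * n_t

# Hypothetical simulated panel in which x depends on inst (a relevant instrument)
dat <- data.frame(
  id   = rep(1:n_id, times = n_t),
  year = rep(1:n_t, each = n_id),
  inst = runif(n, 1, 10),
  z    = rnorm(n)
)
dat$x <- 0.5 * dat$inst + 0.3 * dat$z + rnorm(n)
dat$y <- dat$x + dat$z + rnorm(n)

# Two-way fixed effects entered as dummies, as in the question
fit <- ivreg(y ~ x + z + factor(id) + factor(year) | . - x + inst, data = dat)
s <- summary(fit, vcov = sandwich, df = Inf, diagnostics = TRUE)

# Row of the diagnostics matrix holding the weak-instruments test
weak <- s$diagnostics["Weak instruments", ]
print(weak)
```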

I was wondering if I can run a similar test to detect a weak instrument using the plm package.
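One way to get a comparable diagnostic while staying within plm is to estimate the first stage explicitly and test the excluded instrument with a robust covariance matrix. This is a minimal sketch under stated assumptions, not the only or official plm procedure: the data are hypothetical (x is built to depend on inst), the Arellano estimator clustered by individual is an assumed choice of robust covariance, and the equivalence F = t² holds only because there is exactly one excluded instrument.

```r
library(plm)       # pdata.frame(), plm(), vcovHC() method for plm objects
library(sandwich)  # vcovHC() generic (plm supplies the panel method)
library(lmtest)    # coeftest()

set.seed(123)
n_id <- 200
n_t  <- 5
n    <- n_id * n_t

# Hypothetical simulated panel: unlike the question's data, x depends on inst,
# so the instrument is relevant by construction
id   <- rep(1:n_id, times = n_t)
year <- rep(1:n_t, each = n_id)
inst <- runif(n, 1, 10)
z    <- rnorm(n)
x    <- 0.5 * inst + 0.3 * z + rnorm(n)
p.lmdb <- pdata.frame(data.frame(x, z, inst, id, year), index = c("id", "year"))

# First stage: endogenous regressor on the included exogenous regressor and the
# excluded instrument, with the same two-way fixed effects as the IV model
first_stage <- plm(x ~ z + inst, data = p.lmdb,
                   model = "within", effect = "twoways")

# Heteroskedasticity- and autocorrelation-robust covariance (Arellano, clustered by id)
V_rob <- vcovHC(first_stage, method = "arellano", cluster = "group")
ct <- coeftest(first_stage, vcov. = V_rob)

# With exactly one excluded instrument, the robust Wald F equals the squared t statistic
F_excl <- unname(ct["inst", "t value"])^2
print(F_excl)
```

With several excluded instruments, the squared t statistic would be replaced by a joint robust Wald test of all instrument coefficients in the same first-stage regression.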

Best Answer

If you only care about correctly estimated standard errors and have no other particular reason to use the plm package, you can use the lfe package instead. You can still use the data created for the plm package:

library(lfe)
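# felm formula parts: outcome ~ exogenous regressors | fixed effects | (endogenous ~ instruments)
form_iv <- formula(y ~ z | id + year | (x ~ inst))
my_iv_reg <- felm(form_iv, data = p.lmdb)
# Wald test of the excluded instrument in the first stage;
# lfe:: avoids masking by lmtest::waldtest (loaded with AER)
wald_iv <- lfe::waldtest(my_iv_reg$stage1, ~inst, lhs = my_iv_reg$stage1$lhs)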

The F statistic for the excluded instrument is the fifth element of the vector returned by waldtest (named "F"), i.e.

wald_iv[5]

while its p-value is the fourth element (named "p.F"), i.e.

wald_iv[4]
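A self-contained sketch of the same steps, indexing the result by name rather than position, may be less fragile. It assumes the element names "F" and "p.F" as listed in lfe's waldtest documentation and uses hypothetical simulated data with a relevant instrument; the fixed-effect identifiers are stored as factors. lfe's waldtest also documents a type argument for choosing the error structure (e.g. robust), which is worth checking if robust first-stage statistics are the goal.

```r
library(lfe)  # felm() and lfe::waldtest()

set.seed(123)
n_id <- 200
n_t  <- 5
n    <- n_id * n_t

# Hypothetical simulated panel with a relevant instrument;
# fixed-effect identifiers stored as factors
dat <- data.frame(
  id   = factor(rep(1:n_id, times = n_t)),
  year = factor(rep(1:n_t, each = n_id)),
  inst = runif(n, 1, 10),
  z    = rnorm(n)
)
dat$x <- 0.5 * dat$inst + 0.3 * dat$z + rnorm(n)
dat$y <- dat$x + dat$z + rnorm(n)

my_iv_reg <- felm(y ~ z | id + year | (x ~ inst), data = dat)
wald_iv <- lfe::waldtest(my_iv_reg$stage1, ~inst, lhs = my_iv_reg$stage1$lhs)

# Extract the F statistic and its p-value by name
f_stat <- wald_iv[["F"]]
f_pval <- wald_iv[["p.F"]]
print(c(F = f_stat, p.F = f_pval))
```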

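If you have more than one excluded instrument, e.g. inst_1 and inst_2 (not part of the example data above), and want to test their joint significance, you can run the following. The sapply loop repeats the test for every endogenous variable listed in my_iv_reg$stage1$lhs, giving one row per first-stage equation: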

form_iv <- formula(y ~ z | id + year | (x ~ inst_1 + inst_2))
my_iv_reg <- felm(form_iv, data = p.lmdb)
# ~inst_1|inst_2 tests inst_1 = inst_2 = 0 jointly in each first-stage equation
wald_iv <- t(sapply(my_iv_reg$stage1$lhs, function(lh) {
  lfe::waldtest(my_iv_reg$stage1, ~inst_1 | inst_2, lhs = lh)
}))
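Because inst_1 and inst_2 do not exist in the question's data, here is a self-contained sketch with hypothetical simulated instruments so the joint test can actually be run. It assumes, as above, that lfe's waldtest returns a named vector containing "df1" and "F"; with two restrictions df1 should be 2.

```r
library(lfe)

set.seed(123)
n <- 1000
# Hypothetical panel with two instruments, inst_1 and inst_2, both affecting x
dat <- data.frame(
  id     = factor(rep(1:200, times = 5)),
  year   = factor(rep(1:5, each = 200)),
  z      = rnorm(n),
  inst_1 = runif(n, 1, 10),
  inst_2 = rnorm(n)
)
dat$x <- 0.4 * dat$inst_1 + 0.8 * dat$inst_2 + rnorm(n)
dat$y <- dat$x + dat$z + rnorm(n)

my_iv_reg <- felm(y ~ z | id + year | (x ~ inst_1 + inst_2), data = dat)

# One row per first-stage equation (here only x); columns are waldtest's outputs
wald_iv <- t(sapply(my_iv_reg$stage1$lhs, function(lh) {
  lfe::waldtest(my_iv_reg$stage1, ~inst_1 | inst_2, lhs = lh)
}))
print(wald_iv)
```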