The ANOVA in linear regression models is equivalent to the Wald test (and the likelihood ratio test) of the corresponding nested models. So when you want to conduct the corresponding test using heteroskedasticity-consistent (HC) standard errors, this cannot be obtained from a decomposition of the sums of squares, but you can carry out the Wald test using an HC covariance estimate. This idea is used in both Anova() and linearHypothesis() from the car package and in coeftest() and waldtest() from the lmtest package. The latter three can also be used with plm objects.
A simple (albeit not very interesting/meaningful) example is the following. We use the standard model from the ?plm manual page and want to carry out a Wald test for the significance of both log(pcap) and unemp. We need these packages:
library("plm")
library("sandwich")
library("car")
library("lmtest")
The model (under the alternative) is:
data("Produc", package = "plm")
mod <- plm(log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp,
data = Produc, index = c("state", "year"))
First, let's look at the marginal Wald tests with HC standard errors for all individual coefficients:
coeftest(mod, vcov = vcovHC)
t test of coefficients:
Estimate Std. Error t value Pr(>|t|)
log(pc) 0.2920069 0.0617425 4.7294 2.681e-06 ***
log(emp) 0.7681595 0.0816652 9.4062 < 2.2e-16 ***
log(pcap) -0.0261497 0.0603262 -0.4335 0.66480
unemp -0.0052977 0.0024958 -2.1226 0.03411 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And then we carry out a Wald test for both log(pcap) and unemp:
linearHypothesis(mod, c("log(pcap)", "unemp"), vcov = vcovHC)
Linear hypothesis test
Hypothesis:
log(pcap) = 0
unemp = 0
Model 1: restricted model
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp
Note: Coefficient covariance matrix supplied.
Res.Df Df Chisq Pr(>Chisq)
1 766
2 764 2 7.2934 0.02608 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Alternatively, we can also fit the model under the null hypothesis (mod0, say) without the two coefficients and then call waldtest():
mod0 <- plm(log(gsp) ~ log(pc) + log(emp),
data = Produc, index = c("state", "year"))
waldtest(mod0, mod, vcov = vcovHC)
Wald test
Model 1: log(gsp) ~ log(pc) + log(emp)
Model 2: log(gsp) ~ log(pc) + log(emp) + log(pcap) + unemp
Res.Df Df Chisq Pr(>Chisq)
1 766
2 764 2 7.2934 0.02608 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The test statistic and p-value computed by linearHypothesis() and waldtest() are exactly the same; only the interface and output formatting differ somewhat. In some cases one or the other is simpler or more intuitive.
Note: If you supply a covariance matrix estimate (i.e., a matrix like vcovHC(mod)) instead of a covariance matrix estimator (i.e., a function like vcovHC), make sure that you supply the HC covariance matrix estimate of the model under the alternative, i.e., the non-restricted model.
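To illustrate the point, a small sketch using the mod and mod0 objects from above: both calls below should yield the same test, but in the second the matrix must be computed from mod (the unrestricted model), not from mod0.

waldtest(mod0, mod, vcov = vcovHC)       # estimator: function applied internally
waldtest(mod0, mod, vcov = vcovHC(mod))  # estimate: matrix from the unrestricted model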
You can use twang with more than two treatment levels -- I use it all the time to obtain propensity scores for multiple (i.e., >2) treatments, and it's one of my all-time favorite R packages because there is no need to guess the functional relationships between your treatments and covariates. Since twang uses gradient boosted regression, it can fit nonlinear relationships and interactions automatically. All you need to do is assess the balance statistics after obtaining the propensity scores using the get.weights function. Using the mnps function in twang instead of the ps function, you can obtain propensity score weights for multiple treatments. The good folks at RAND have even prepared a nice tutorial for multiple-treatment propensity score weighting.
One word of caution is in order with twang, however. It isn't the most efficiently written program and it cannot be easily parallelized, so if you have a lot of data (e.g., >100,000 observations), it can be quite slow. Also, sometimes the weights assigned can be too large and therefore influence your results too much. I perform a simple check on the weights to make sure that no weight is overly large (e.g., no weight accounts for more than, say, 5% of the total of all weights). Note too that the twang propensity score weights are not standardized, so they do not add to one. They can be made to through some manipulation, but I rarely find that I need to do this.
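As a sketch of this workflow (assuming a hypothetical data frame dat with a three-level treatment treat and covariates x1 and x2; the stop.method and estimand choices here are just illustrative defaults):

library("twang")
set.seed(1)
fit <- mnps(treat ~ x1 + x2, data = dat,
            estimand = "ATE", stop.method = "es.mean")
w <- get.weights(fit, stop.method = "es.mean")
# simple sanity check: no single weight exceeds 5% of the total
max(w / sum(w)) < 0.05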
Lastly, work by Elizabeth Stuart has shown that propensity scores built with gradient boosting methods outperform other methods. Therefore, I'd strongly advocate the use of twang.
Best Answer
Lunceford and Davidian (2004) derive the asymptotic standard errors for IPW estimators. These rely on a generalized estimating equations approach and assume the propensity scores are estimated using a method that can be represented as a system of estimating equations (e.g., logistic regression, but not random forests). Their proof also indicates that IPW estimators are smooth and asymptotically normal, making them amenable to bootstrapping. They also find that excluding the propensity score estimation from the estimating equations and treating the weights as fixed yields conservative estimates of the standard error. This is equivalent to using a robust standard error in the outcome regression model.
This leads to three ways to validly estimate the standard error in IPW:

1. Solve the full set of estimating equations directly (M-estimation), e.g., with the geex package in R; some R packages like PSweight can also compute these standard errors. In SAS, PROC CAUSALTRT automatically computes the correct standard errors, and in Stata, teffects ipw uses the same approach.
2. Use robust (sandwich) standard errors treating the weights as fixed, e.g., with sandwich::vcovHC() after a glm() or lm() call with the outcome model, with survey::svyglm(), which is recommended in the twang and WeightIt documentation, or with geepack::geeglm(), as recommended by Hernán and Robins (2020). In SAS, you would use PROC SURVEYREG, and in Stata you would supply the weights to the aweights argument in any regression model, which automatically requests robust standard errors.
3. Use the bootstrap, e.g., with the boot package.

I am most inclined to use robust standard errors because of their flexibility and ease of use. For a serious project where conservative standard errors could be a liability and I had a lot of time, I would use the bootstrap.
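A minimal sketch of the robust-standard-error route in R (assuming a hypothetical data frame dat with outcome y, treatment a, and already-estimated IPW weights w):

library("survey")
des <- svydesign(ids = ~1, weights = ~w, data = dat)
fit <- svyglm(y ~ a, design = des)  # robust (sandwich) SEs, weights treated as fixed
summary(fit)

The coefficient on a is the weighted treatment-effect estimate, with standard errors that are conservative in the sense described above.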