Solved – Standard Errors with Weighted Least Squares Regression

regressionrobust-standard-errorsandwichstandard errorwls

For OLS, $\hat{\beta} = (X'X)^{-1}X'y$, and $\text{var}(\hat{\beta}) = (X'X)^{-1} X' \sigma^2 I X (X'X)^{-1}$. I can reproduce these "by hand".

For WLS, with heteroskedastic errors and weights in diagonal $W$, $\hat{\beta} = (X'WX)^{-1}X'Wy$, and

$\text{var}(\hat{\beta}) = (X'WX)^{-1} \hspace{2 pt} X'W \hspace{2 pt} \text{diag}(\sigma^2_i) \hspace{2 pt} WX \hspace{2 pt} (X'WX)^{-1}$. I can also reproduce these "by hand".

Sandwich standard errors act on the variance estimates by substitututing estimates for $\sigma^2_i$. For example for HC0 (Zeiles 2004 JSS) the squared residuals are used. I can also reproduce these "by hand" both for OLS and WLS (see code below).

However, summary(lm_wls) produces an SE estimate of 0.25081. I cannot reproduce this estimate "by hand". See below at the "### <– HERE" arrow for the spot. What gets plugged in for "diag(sig^2_i)" there?

###########################################################
### DATA GENERATION #######################################
###########################################################
set.seed(1234)
# Generate a covariate
x <- rnorm(100)
# Generate the propensity score
ps <- (1 + exp(-(-.5 + .8*x)))^-1
# Generate the exposure (i.e., treatment) variable
z <- rbinom(n = 100, size = 1, prob = ps)
# Generate the outcome
y <- 1.1*x - .6*z + rnorm(100, sd = .5)

### Estimate the average treatment effect by OLS
###########################################################
### ORDINARY LEAST SQUARES REGRESSION #####################
###########################################################
### OLS via lm()
lm_ols <- lm(formula = y ~ x + z)
summary(lm_ols)

### Verify OLS 'by hand'
X <- cbind(1, x, z)
(betas <- solve(t(X) %*% X) %*% t(X) %*% y)
(resid_var <- sqrt(sum(lm_ols$residuals^2)/(100 - 3)))
(var_betas <- solve(t(X) %*% X) %*% 
              (t(X) %*% diag(resid_var^2, 100, 100) %*% X)  
              %*% solve(t(X) %*% X))
sqrt(diag(var_betas)) # SEs -- Compare to summary(lm_ols)

### Sandwich SEs based on squared residuals 
### (see p.4 of Zeiles 2004 JSS)
library(sandwich)
(vcovHC(x = lm_ols, type = "HC0", sandwich = TRUE))
sqrt(diag(vcovHC(x = lm_ols, type = "HC0", sandwich = TRUE)))

### Verify Sandwich SEs for the OLS model
var_betas <- solve(t(X) %*% X) %*% 
             t(X) %*% diag(lm_ols$residuals^2) %*% X %*% 
             solve(t(X) %*% X)
sqrt(diag(var_betas)) # SEs -- Compare to sandwich SEs

### Estimate the average treatment effect by propensity
### score weighting
###########################################################
### INVERSE PROPENSITY SCORE WEIGHTING ####################
###########################################################
### Estimate propensity scores
glm1 <- glm(z ~ x, family = "binomial")
ps_est <- predict(glm1, type = "response")
### Create inverse ps weights
wts <- (z/ps_est) + (1-z)/(1-ps_est)
### Estimate average treatment effect via WLS
lm_wls <- lm(formula = y ~ z, weights = wts)
summary(lm_wls)

### Verify WLS
X <- cbind(1, z)
W <- diag(wts)
(betas2 <- solve(t(X) %*% W %*% X) %*% t(X) %*% W %*% y)
(resid_var2 <- sqrt(sum(wts*(lm_wls$residuals^2))/(100 - 2)))
(var_betas2 <- solve(t(X) %*% W %*% X) %*% 
           (t(X) %*% W %*% "diag(sig^2_i)" %*% t(W) %*% X) %*%  ### <-- HERE
           solve(t(X) %*% W %*% X))
sqrt(diag(var_betas2)) # SEs

### Sandwich SEs with sandwich package
library(sandwich)
vcovHC(x = lm_wls, type = "HC0", sandwich = TRUE)
sqrt(diag(vcovHC(x = lm_wls, type = "HC0", sandwich = TRUE)))

### Verify Sandwich SEs for the WLS model
(var_betas3 <- solve(t(X) %*% W %*% X) %*% 
           (t(X) %*% W %*% diag(lm_wls$resid^2) %*% 
           t(W) %*% X) %*% solve(t(X) %*% W %*% X))
sqrt(diag(var_betas3)) # SEs -- Compare to sandwich SEs 

Best Answer

Essentially you already computed everything you need. The missing piece is just that the sig_i should be the residual standard error divided by the corresponding square root of the weight. In OLS this isn't necessary because all weights are 1.

sig_i <- resid_var2 / sqrt(wts)
var_betas2 <- solve(t(X) %*% W %*% X) %*% (t(X) %*% W %*% diag(sig_i^2) %*% t(W) %*% X) %*% solve(t(X) %*% W %*% X)

And then you get:

sqrt(diag(var_betas2))
##                   z 
## 0.1760843 0.2508150 

which matches the output of summary() and vcov():

sqrt(diag(vcov(lm_wls)))
## (Intercept)           z 
##   0.1760843   0.2508150 

Even more familiar might be the equations as $\hat \sigma^2 (X^\top W X)^{-1}$ where the terms from the full sandwich (= bread * meat * bread) have already been simplified to just the bread:

var_betas2a <- resid_var2^2 * solve(t(X) %*% W %*% X)
sqrt(diag(var_betas2a))
##                   z 
## 0.1760843 0.2508150