Solved – How are robust standard errors calculated in the case of logistic regression

robust-standard-error

I mean the Huber/White/sandwich estimator of standard errors. In the case of continuous outcomes, robust estimation of standard errors seems rather simple to me: the variance of the residual for each observation is estimated by the squared (estimated) residual from the regression.
But I can't figure out how this applies to a binary outcome: after estimating a probability for each observation, how can I estimate the variance of the residual for that observation, given that the outcome is necessarily 0 or 1, corresponding, on the logit scale, to either $+\infty$ or $-\infty$? In particular, I'm thinking of the case where I have continuous predictors (so there is just one occurrence of each value of my set of regressors).

PS: I have read some criticism of the use of robust standard errors for logistic regression: if the variance estimates are biased, then the parameter estimates themselves are biased as well (since mean and variance are linked in the binomial case). However, I'm not asking whether robust standard errors make sense here, but simply how they are calculated.

Best Answer

The definition is completely analogous if you use the so-called working residuals and if regressors and residuals are weighted appropriately. The working weights are part of the usual iteratively weighted least squares (IWLS) algorithm employed for generalized linear models (including logistic regression).
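To make the IWLS connection concrete, here is a minimal textbook sketch of the algorithm for logistic regression on simulated data (just the idea, not the actual glm internals):

```r
## minimal IWLS sketch for logistic regression (simulated data for illustration)
set.seed(1)
x <- cbind(1, rnorm(200))
y <- rbinom(200, 1, plogis(x %*% c(-0.5, 1)))
beta <- c(0, 0)
for (i in 1:25) {
  eta <- drop(x %*% beta)
  mu  <- plogis(eta)
  w   <- mu * (1 - mu)        # working weights
  z   <- eta + (y - mu) / w   # working response
  ## weighted least squares step
  beta <- solve(t(x) %*% (w * x), t(x) %*% (w * z))
}
## should agree with glm's estimates
all.equal(drop(beta), unname(coef(glm(y ~ x[, 2], family = binomial))))
```

The working residuals used below are exactly the residuals z - eta from the final iteration of this loop.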

The main idea is to use the derivative of the log-likelihood with respect to the mean or to the corresponding linear predictor as a measure of deviation. Alternatively, you can use the so-called scores or estimating functions, i.e., the derivatives of the log-likelihood with respect to the coefficients of the linear predictor. All of these ideas are employed in our sandwich package for R to enable object-oriented implementations of the different sandwich estimators. See Zeileis (2006, Object-Oriented Computation of Sandwich Estimators, doi:10.18637/jss.v016.i09) for the details.
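To illustrate the score-based view, a small sketch with the sandwich package (using the SwissLabor example from below): estfun() extracts the observation-wise scores, and their scaled cross-product forms the "meat" of the sandwich.

```r
library("sandwich")
data("SwissLabor", package = "AER")
m <- glm(participation ~ ., data = SwissLabor, family = binomial)
## estimating functions: derivative of the log-likelihood with respect
## to the coefficients, one row per observation
S <- estfun(m)
## the "meat" of the sandwich is the scaled cross-product of the scores
all.equal(meat(m), crossprod(S) / nrow(S))
## [1] TRUE
```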

A worked illustration in R using a logistic regression for labor market participation is:

data("SwissLabor", package = "AER")
m <- glm(participation ~ ., data = SwissLabor, family = binomial)

The classical Eicker/Huber/White sandwich covariance matrix can be computed using our sandwich package:

library("sandwich")
vcov_sandwich <- sandwich(m)

And by hand, you can easily extract the working residuals and regressors and weight both with (the square root of) the working weights:

r <- residuals(m, "working") * sqrt(weights(m, "working"))
X <- model.matrix(m) * sqrt(weights(m, "working"))

Then the formula for the covariance matrix is just the same as in the linear regression model:

vcov_byhand <- solve(t(X) %*% X) %*% t(X) %*% diag(r^2) %*% X %*% solve(t(X) %*% X)
all.equal(vcov_sandwich, vcov_byhand)
## [1] TRUE
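For the logit link specifically, these working quantities have simple closed forms, which also answers the original question: the relevant residual is $(y - \hat\mu)/(\hat\mu(1 - \hat\mu))$ on the working scale, which is finite even though $y$ is binary. A quick check (refitting the model from above so the snippet runs on its own):

```r
data("SwissLabor", package = "AER")
m  <- glm(participation ~ ., data = SwissLabor, family = binomial)
mu <- fitted(m)
y  <- as.numeric(SwissLabor$participation == "yes")
## working residual for the logit link: (y - mu) * d(eta)/d(mu) = (y - mu) / (mu * (1 - mu))
all.equal(residuals(m, "working"), (y - mu) / (mu * (1 - mu)),
          check.attributes = FALSE)
## [1] TRUE
## working weight for the canonical logit link: the variance function mu * (1 - mu)
all.equal(weights(m, "working"), mu * (1 - mu), check.attributes = FALSE)
## [1] TRUE
```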

Two caveats: (1) Of course, one shouldn't compute the sandwich like this in practice; more efficient and numerically more stable matrix computations should be employed. (2) For data with independent binary responses, these sandwich covariances are not robust against anything: they are consistent, which is fine, but there is no way to misspecify the likelihood without also misspecifying the model equation.
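To illustrate caveat (1), one somewhat more sensible variant might look like the following sketch: crossprod() avoids forming the explicit n x n diagonal matrix, and a Cholesky-based inverse replaces the repeated solve() calls.

```r
library("sandwich")
data("SwissLabor", package = "AER")
m <- glm(participation ~ ., data = SwissLabor, family = binomial)
r <- residuals(m, "working") * sqrt(weights(m, "working"))
X <- model.matrix(m) * sqrt(weights(m, "working"))
## t(X) %*% diag(r^2) %*% X is just crossprod(r * X): no n x n matrix needed
XtX_inv <- chol2inv(chol(crossprod(X)))
vcov_stable <- XtX_inv %*% crossprod(r * X) %*% XtX_inv
all.equal(vcov_stable, sandwich(m), check.attributes = FALSE)
## [1] TRUE
```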