Robust regression and robust standard errors are robust to different things.
If you use robust regression to obtain an estimate of a fixed effect in panel data, you're computing an estimate that is resistant to outliers.
If you use robust standard errors for your OLS estimator, it's because you suspect that the assumptions behind your error model are violated. For example, in panel data your errors may be autocorrelated rather than iid, and robust standard errors offer protection against such phenomena.
But robust standard errors don't guard against outliers, and robust regression doesn't necessarily account for autocorrelation. Though there are techniques for computing robust standard errors for robust estimators, you would only use them when both problems are present: your data have outliers and your error-model assumptions are violated.
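To make the distinction concrete, here is a minimal sketch in R on hypothetical data (the MASS, sandwich, and lmtest packages are assumed): robust regression changes the coefficient estimates themselves, while robust standard errors keep the OLS estimates and only change the variance estimate used for inference.

library(MASS)      # rlm: outlier-resistant (robust) regression
library(sandwich)  # vcovHC: heteroskedasticity-robust covariance matrix
library(lmtest)    # coeftest: coefficient tests with a supplied covariance

set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100, sd = abs(x))   # heteroskedastic errors
y[1] <- 50                                 # one gross outlier

fit <- lm(y ~ x)
coeftest(fit, vcov. = vcovHC(fit, type = "HC1"))  # same estimates, robust SEs
rlm(y ~ x)                                        # different, outlier-resistant estimates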
@cardinal has telegraphed an answer in comments. Let's flesh it out. His point is that although linear models (such as those fit by lm) and generalized linear models (such as glmrob, used here) appear intended to evaluate relationships among variables, they can be powerful tools for studying a single variable, too. The trick relies on the fact that regressing data against a constant is just another way of estimating its average value ("location").
As an example, generate some Poisson-distributed data:
set.seed(17)
x <- rpois(10, lambda=2)
In this case, R will produce the vector $(1,5,2,3,2,2,1,1,3,1)$ of values for x from a Poisson distribution of mean $2$. Estimate its location with glmrob:
library(robustbase)
glmrob(x ~ 1, family=poisson())
The response tells us the intercept is estimated at $0.7268$. Of course, anyone using a statistical method needs to know how it works: when you use generalized linear models with the Poisson family, the standard "link" function is the logarithm. This means the intercept is the logarithm of the estimated location. So we compute
exp(0.7268)
The result, $2.0685$, is comfortably close to $2$: the procedure seems to work. To see what it is doing, plot the data:
plot(x, ylim=c(0, max(x)))
abline(exp(0.7268), 0, col="red")
The fitted line is purely horizontal and therefore estimates the middle of the vertical values: our data. That's all that's going on.
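As a quick sanity check of the "regress against a constant" trick, the same idea with an ordinary (identity-link) linear model returns the sample mean directly rather than its logarithm:

coef(lm(x ~ 1))   # (Intercept) = 2.1
mean(x)           # 2.1: an intercept-only fit just estimates the mean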
To check robustness, let's create a bad outlier by tacking a couple of zeros onto the first value of x:
x[1] <- 100
This time, for greater flexibility in post-processing, we will save the output of glmrob:
m <- glmrob(x ~ 1, family=poisson())
To obtain the estimated average we can request
exp(m$coefficients)
The value this time equals $2.496$: a little above the true mean of $2$, but not far off, especially given that the ordinary average of x (obtained as mean(x)) is now $12$. That is the sense in which this procedure is "robust." More information can be obtained via
summary(m)
Its output shows us, among other things, that the weight associated with the outlying value of $100$ in x[1] is just $0.02179$, almost $0$, pinpointing the suspected outlier.
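If you want those weights programmatically rather than by reading the summary, they are stored on the fitted object; in robustbase's glmrob the robustness weights are, as far as I recall, in the w.r component (check str(m) if your version stores them elsewhere):

round(m$w.r, 5)       # per-observation robustness weights
which(m$w.r < 0.1)    # index of the heavily downweighted point, here observation 1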
In general, if you have any suspicion that your errors are heteroskedastic, you should use robust standard errors. The fact that your estimates become non-significant when you don't use robust SEs suggests (but does not prove) that you need them! These SEs are "robust" to the bias that heteroskedasticity can introduce into the usual model-based standard errors of a generalized linear model.
This situation is a little different, though, in that you're layering them on top of Poisson regression.
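For concreteness, here is a minimal sketch of what "robust SEs on top of Poisson regression" looks like in R, on simulated data (hypothetical variables; the sandwich and lmtest packages are assumed):

library(sandwich)   # vcovHC: sandwich covariance estimator
library(lmtest)     # coeftest: coefficient tests with a supplied covariance

set.seed(42)
z <- rnorm(200)
counts <- rpois(200, lambda = exp(0.5 + 0.3 * z))

fit <- glm(counts ~ z, family = poisson)
coeftest(fit)                                     # model-based (equidispersion) SEs
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))  # sandwich ("robust") SEs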
Poisson regression has the well-known property of forcing the dispersion to equal the mean, whether or not the data support that. Before considering robust standard errors, I would try a negative binomial regression, which does not suffer from this problem. There is a test (see the comment) to help determine whether the resultant change in standard errors is significant.
I do not know for sure whether the change you're seeing (moving to robust SEs narrows the CI) implies under-dispersion, but it seems likely. Take a look at the appropriate model (quasi-Poisson accommodates under-dispersion directly; negative binomial addresses over-dispersion) and see what you get in that setting.
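A sketch of those alternatives on simulated count data (hypothetical variables; MASS assumed for glm.nb). Note that the negative binomial only adds extra-Poisson variance, so it addresses over-dispersion, while the quasi-Poisson dispersion estimate can fall below 1 and therefore also accommodates under-dispersion:

library(MASS)   # glm.nb: negative binomial regression

set.seed(7)
z <- rnorm(200)
counts <- rnbinom(200, mu = exp(0.5 + 0.3 * z), size = 1.5)   # over-dispersed counts

summary(glm(counts ~ z, family = poisson))        # plain Poisson: SEs too small here
summary(glm(counts ~ z, family = quasipoisson))   # dispersion estimated from the data
summary(glm.nb(counts ~ z))                       # negative binomial: models the extra variance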