Solved – std errors in poisson glm

generalized linear modelpoisson distributionr

With this data:

ds <- data.frame(y=1:10,z=rep(c("A","B"),each=5))

I fit this model:

summary(glm(y ~ z, family = poisson, data = ds))

> Call:
glm(formula = y ~ z, family = poisson, data = ds)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3427  -0.5515   0.0000   0.4984   1.0527  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.0986     0.2582   4.255 2.09e-05 ***
zB            0.9808     0.3028   3.240   0.0012 ** 
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 16.643  on 9  degrees of freedom
Residual deviance:  4.852  on 8  degrees of freedom
AIC: 42.818

Number of Fisher Scoring iterations: 4

As I understand it the estimate for A is exp(1.0986 - 0.9808) and B is exp(1.0986 + 0.9808) .

If I were doing a barplot of this model I would put the std errors for B as:

 `exp(1.0986 + 0.9808) + or - exp(0.3028)` . 

Would A be:

`exp(1.0986 - 0.9808) + or - exp(0.3028)`  ?

If I was plotting the results as i suggested above the values would be:

results <- read.table(text = " y           z     error
                              1.125019     a     1.353644
                              7.999668     b     1.353644", header = TRUE)

library(ggplot2)

limits <- aes(ymax = y + error, ymin = y - error)
ggplot(results, aes(x=z,y=y)) +
    geom_bar(aes(fill= z)) +
    geom_errorbar(limits)

What should be the correct values in this plot?

Best Answer

When in doubt, simulate:

#some data
set.seed(42)
n <- 1e4
y1 <- rpois(n, 10)
y2 <- rpois(n, 100)

DF <- data.frame(y=c(y1, y2), 
                 z=rep(c("A", "B"), each=n))

#note that I exclude the intercept
mod <- glm(y ~ z-1, family = poisson, data = DF)

coeffs <- exp(coef(mod))
#     A        B 
#9.9845 100.0964

library(ggplot2)
ggplot(DF, aes(x=z, y=y)) + 
  geom_violin() +
  geom_point(data=data.frame(y=coeffs, z=c("A", "B")), color="red", size=5)

enter image description here

See How to obtain those nasty standard errors from transformed data - and why they should not be used regarding obtaining standard errors on the original scale. They also warn that confidence intervals calculated from these with the assumption of symmetry might strongly deviate from the real confidence intervals if the transformation is strongly non-linear.

In our example:

ses <- coeffs*summary(mod)$coef[,2]
#        zA         zB 
#0.03159825 0.10004819

#variance should be equal to mean
(ses*sqrt(1e4))^2
#      zA         zB 
#9.984492 100.096400 

If you don't (or can't) exclude the intercept, things get messy and you'd need some kind of error propagation. But then, it would be better to obtain confidence intervals by other means anyway.