Regression – Understanding Strange Standard Errors from glm() in R

Tags: generalized linear model, intercept, logistic, regression, standard error

To my surprise, I found that the standard errors, and thus the Wald confidence intervals, became smaller when I removed the intercept from a simple logistic regression model fitted with glm() in R.

# load an object named "my.df" into the global environment
load(url("http://hansekbrand.se/code/test.RData"))
# fit a model with intercept to data
my.fit <- glm(deprived.of.education ~ religion, data = my.df, family = binomial("logit"))
# fit a model without any intercept to data
my.fit.without.intercept <- glm(deprived.of.education ~ 0 + religion, data = my.df, family = binomial("logit"))

# inspect the first fit
summary(my.fit)$coefficients
#                        Estimate Std. Error   z value     Pr(>|z|)
# (Intercept)          -2.8718056 0.03175130 -90.44687 0.000000e+00
# religionChristianity  0.4934891 0.03234887  15.25522 1.519805e-52
# religionHinduism      0.5257316 0.03376535  15.57015 1.161317e-54
# religionIslam         1.5734832 0.03231692  48.68914 0.000000e+00
# religionNonreligious  1.5975456 0.03555164  44.93592 0.000000e+00

# inspect the second fit
summary(my.fit.without.intercept)$coefficients
#                       Estimate  Std. Error    z value Pr(>|z|)
# religionBuddhism     -2.871806 0.031751299  -90.44687        0
# religionChristianity -2.378317 0.006189045 -384.27842        0
# religionHinduism     -2.346074 0.011487113 -204.23530        0
# religionIslam        -1.298322 0.006019850 -215.67354        0
# religionNonreligious -1.274260 0.015992939  -79.67642        0

I understand why the z values are different, because the null hypotheses in the two cases are different. In the first case, with the intercept, the null for each religion coefficient is "no difference from the reference category (Buddhism)", while without the intercept, the null becomes "a log-odds of zero", i.e. a probability of 0.5.
But I do not understand the large difference in standard errors between the two models.

Without the intercept, the standard errors seem to vary with the number of observations in each level: there are many cases of "Christianity" and "Islam", and those levels have small standard errors. With the intercept, by contrast, there is essentially no variation in the standard errors.
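
A quick way to look at this hunch (a sketch, assuming deprived.of.education is coded 0/1):

# group sizes and observed proportions, to compare against the
# "Std. Error" column of the no-intercept fit
table(my.df$religion)
tapply(my.df$deprived.of.education, my.df$religion, mean)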

Could someone please explain the reason for the differences in the magnitude of the standard errors between the two models?

I would like to calculate probabilities and confidence intervals around them, and I have done so using the estimates from the first model. If I did the same with the estimates from the second model, the confidence intervals would be much narrower; would they be reliable?

Best Answer

Your coefficients, even when they share common names, are not the same quantities: their interpretations differ.

In the first model, the coefficient religionChristianity is the difference in the outcome, on the log-odds scale, with respect to the baseline category (Buddhism, absorbed into the intercept): a relative effect. In the second model, the coefficient religionChristianity is the absolute log-odds for that group.
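
You can check that the two parameterizations describe the same fitted model; a quick sketch using the fits above:

> # the no-intercept coefficient for Christianity equals the intercept
> # plus the Christianity contrast; both should print -2.378317
> coef(my.fit)[["(Intercept)"]] + coef(my.fit)[["religionChristianity"]]
> coef(my.fit.without.intercept)[["religionChristianity"]]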

The effects are numerically equal up to rounding, $-2.8718056 + 0.4934891 = -2.3783165 \approx -2.378317$, but in the first case the absolute effect is a sum of two coefficients: you should compare the significance of the sum of Intercept and religionChristianity in the first model with the significance of religionChristianity in the second one, and likewise compare a confidence interval for that sum (first model) with a simple per-coefficient interval (second model).
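
The standard error of that sum follows from the estimated covariance matrix, since $\text{Var}(\hat\beta_0 + \hat\beta_1) = \text{Var}(\hat\beta_0) + \text{Var}(\hat\beta_1) + 2\,\text{Cov}(\hat\beta_0, \hat\beta_1)$, and here the two estimates are strongly negatively correlated. A sketch using vcov() on the first fit:

> V <- vcov(my.fit)
> # Wald SE of Intercept + religionChristianity; the large negative
> # covariance between the two coefficients is what makes the sum precise
> sqrt(V[1, 1] + V[2, 2] + 2 * V[1, 2])
> # should come out near the 0.006189 reported by the no-intercept fit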

The simple CI for religionChristianity is:

> confint(my.fit.without.intercept)
Waiting for profiling to be done...
                         2.5 %    97.5 %
...
religionChristianity -2.390467 -2.366207

There are several ways to compute an interval for the sum. One is simulation with the arm package:

> library(arm)
> n.sims <- 1000
> sim.i <- sim(my.fit, n.sims)
> intercept.plus.christianity <- sim.i@coef[,1] + sim.i@coef[,2]
> quantile(intercept.plus.christianity, c(0.025, 0.975))
     2.5%     97.5% 
-2.390826 -2.366828 

Can you see any significant (relevant) difference?
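
If the end goal is probabilities rather than log-odds, the inverse logit plogis() is monotone, so the endpoints of either interval can be transformed directly. A sketch using the simulations above:

> # CI for the probability of deprived.of.education given Christianity;
> # quantiles pass through the monotone plogis()
> plogis(quantile(intercept.plus.christianity, c(0.025, 0.975)))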
