R – Understanding 0 Intercept in Logistic Regression in R

Tags: intercept, interpretation, logistic, r, regression

I'm exploring the effects of removing the intercept in a logistic regression model.

Assume a model:

$$\operatorname{logit}\,P(Y = 1) = \beta_1 x + \beta_2 z + 0$$

with $x$ and $z$ being categorical variables with 2 levels each and no intercept.

I understood that, with categorical predictors and no intercept, the coefficients compare $P(Y = 1)$ in each level of the two predictors against a null case where $P(Y=1) = 0.5$, i.e. $\operatorname{logit}\,P(Y=1) = 0$.
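This interpretation can be checked directly for a single two-level factor: with no intercept, each coefficient is just the empirical logit of the proportion of successes within that level. A quick sketch on simulated data (the seed is my own choice, not from the original sample):

```r
set.seed(1)  # reproducible simulated data, not the original sample
y <- as.factor(sample(1:2, 30, replace = TRUE))
x <- as.factor(sample(1:2, 30, replace = TRUE))

# One two-level factor, no intercept: one coefficient per level of x
coef(glm(y ~ x - 1, family = binomial))

# Each coefficient equals the empirical logit of P(y = 2) within that level
# (glm treats the second factor level, "2", as the success)
tapply(y == "2", x, function(s) qlogis(mean(s)))
```

The two printed vectors agree because this one-factor model is saturated: the fitted probabilities are exactly the observed group proportions.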

I noticed a phenomenon that I can't understand. Using the glm() function in R, if you change the order of the variables on the right-hand side of the formula, the coefficients change too. But even more oddly, the coefficient of whichever variable comes first is always the same.

Here's an R demo:

y <- as.factor(sample(1:2, 30, replace = TRUE))
x <- as.factor(sample(1:2, 30, replace = TRUE))
z <- as.factor(sample(1:2, 30, replace = TRUE))

coef(glm(y ~ x + z - 1, binomial))
#        x1         x2         z2 
#-0.1764783  0.3260739 -0.1335192

coef(glm(y ~ z + x - 1, binomial))
#        z1         z2         x2 
#-0.1764783 -0.3099976  0.5025523 

As you can see, whichever predictor comes first gets the same coefficient (-0.1764783), while the remaining coefficients differ between the two models.

Here is what I expected, and how the actual behavior differs:

  1. Since every level of the two predictors is compared to the same null case, I expected the coefficients to be the same in the two models, regardless of the order in which I list the predictors.
  2. I expected to see a coefficient for every level of every predictor; instead, the coefficient for level 1 of the second predictor is not shown.
  3. I therefore assume that only the first variable is compared against the null case, while the second is compared against a reference level; but what is this level? Is it $P(Y = 1 \mid X = 1 \cap Z = 1)$? Refitting one of the models WITH the intercept gives:

    coef(glm(y ~ x + z - 1, binomial))
    #        x1         x2         z2 
    #-0.1764783  0.3260739 -0.1335192
    
    coef(glm(y ~ x + z, binomial))
    #(Intercept)         x2          z2 
    #-0.1764783   0.5025523  -0.1335192
    

As expected, x1 becomes the intercept, and x2 is presumably relative to x1. z1 is missing in this case too, and z2 is the same as in the model without the intercept.

Should I therefore assume that the comparison against the null case $P(Y = 1) = 0.5$ is made only for the first variable in the formula, while the others are compared against the usual intercept?
Is this behavior normal?
What about the fact that the first coefficient has the same value regardless of the order of the predictors in the formula?
What if I want to compare all levels of each predictor against the null case and get a coefficient for every level?
Or is that theoretically impossible for some reason I'm not getting?
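(For context on the last question: a coefficient for every level of every predictor in a single additive model is not identifiable, because the indicator columns for x sum to a column of ones, and so do those for z, making the design matrix rank deficient. One workaround is a separate no-intercept model per predictor; a sketch on simulated data, not the original sample:)

```r
set.seed(1)  # simulated data, not the original sample
y <- as.factor(sample(1:2, 30, replace = TRUE))
x <- as.factor(sample(1:2, 30, replace = TRUE))
z <- as.factor(sample(1:2, 30, replace = TRUE))

# A joint model cannot carry all four dummies: x1 + x2 = z1 + z2 = 1 for
# every observation, so R must drop one column (rank deficiency).
# Separate one-predictor models give each level its own logit-vs-0 coefficient:
coef(glm(y ~ x - 1, family = binomial))
coef(glm(y ~ z - 1, family = binomial))
```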

Best Answer

The issue is not specific to a GLM. It's an issue of treatment contrasts.

You should also look at the model with intercept:

set.seed(42)
y <- as.factor(sample(1:2, 30, replace = TRUE))
x <- as.factor(sample(1:2, 30, replace = TRUE))
z <- as.factor(sample(1:2, 30, replace = TRUE))

fit0 <- glm(y ~ z + x, binomial)
coef(fit0)
#(Intercept)          z2          x2 
# -0.1151303   0.3228803   1.0588217 
predict(fit0, newdata=data.frame(z=factor(2), x=factor(1)))
#      1 
#0.20775 

Here the intercept represents the group x1/z1, and the other group means on the link (log-odds) scale are obtained by adding the coefficients of z2 and/or x2.
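These link-scale group means can be reconstructed by hand from the coefficients. A self-contained sketch (refitting the same model; the exact coefficient values depend on your R version's RNG, so they may differ from the numbers above):

```r
# Refit the model so this snippet stands alone
set.seed(42)
y <- as.factor(sample(1:2, 30, replace = TRUE))
x <- as.factor(sample(1:2, 30, replace = TRUE))
z <- as.factor(sample(1:2, 30, replace = TRUE))
fit0 <- glm(y ~ z + x, family = binomial)

b <- coef(fit0)
b[["(Intercept)"]]                          # log-odds for group x1/z1
b[["(Intercept)"]] + b[["z2"]]              # log-odds for group x1/z2
b[["(Intercept)"]] + b[["x2"]]              # log-odds for group x2/z1
b[["(Intercept)"]] + b[["z2"]] + b[["x2"]]  # log-odds for group x2/z2
```

The second sum matches what predict() returns on the link scale for z = 2, x = 1.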

fit1 <- glm(y ~ z + x - 1, binomial)
coef(fit1)
#        z1         z2         x2 
#-0.1151303  0.2077500  1.0588217 
predict(fit1, newdata=data.frame(z=factor(2), x=factor(1)))
#      1 
#0.20775

Here the coefficient of z1 represents the group x1/z1 which is the same as the intercept in fit0. However, the coefficient of z2 represents the group x1/z2 instead of the difference between the group means. Note that 0.208 = -0.115 + 0.323. The x2/* group means are calculated by adding the x2 coefficient to the x1/* group means.

It should now be easy to understand why order matters here.
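The design matrices make the asymmetry explicit: when the intercept is removed, R expands the first factor in the formula into a full set of indicator columns, while later factors keep their treatment contrasts (one column per non-reference level). A minimal check on a hypothetical four-row data frame:

```r
d <- data.frame(x = factor(c(1, 1, 2, 2)), z = factor(c(1, 2, 1, 2)))

model.matrix(~ x + z - 1, d)  # x comes first: columns x1, x2, z2
#   x1 x2 z2
# 1  1  0  0
# 2  1  0  1
# 3  0  1  0
# 4  0  1  1

model.matrix(~ z + x - 1, d)  # z comes first: columns z1, z2, x2
#   z1 z2 x2
# 1  1  0  0
# 2  0  1  0
# 3  1  0  1
# 4  0  1  1
# (printed "assign"/"contrasts" attributes omitted)
```

In both cases the matrix has the same column space (rank 3), so the fitted values are identical; only the parameterization, and hence the printed coefficients, changes with the order.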
