GLM – Removing Intercept from GLM for Multiple Factorial Predictors Only Works for First Factor in Model

binomial distributioncategorical datacategorical-encodinggeneralized linear modelintercept

I am running a binomial logistic regression with a logit link function in R. My response is factorial [0/1] and I have two multilevel factorial predictors – let's call them $a$ and $b$ where $a$ has 4 factor levels $(a_1,a_2,a_3,a_4)$ and $b$ has 9 factor levels $(b_1,b_2,\dotsc,b_9)$. Therefore:

mod <- glm(y ~ a + b, family = binomial(logit), data=pretend)
summary(mod)

The model output would then show all the information about the model as well as the coefficients.

There is a factor level for both a and b missing (a1 and b1) from the summary output. I understand that it is constrained in the "intercept" of the model.
I have read that if I want to remove the intercept term and see the estimates for those factor levels I can just add -1 to the model formula, i.e.:

mod2 <- glm(y ~ a + b - 1, family=binomial(logit), data=pretend)
summary(mod2)

In the new model (mod2) the intercept term is then gone and variable a's factor-level a1 is given amongst the list of coefficients. But, variable b's factor-level b1 is still missing and given that there is no intercept term anymore, how can I interpret the odds-ratio for that factor level then?

Could someone please explain to me how to get the coefficient for b1 too and why this is happening?

Best Answer

That trick of getting a parameter for each level of the factor by removing the intercept only works when there is only one factor, as you have seen. You can understand why by counting degrees of freedom: Let factor $a$ have $a$ levels, factor $b$ with $b$ levels. Then factor $a$ have $a-1$degrees of freedom, which means that the indicator matrix with $a$ columns representing with, with a $1$ in each row for the level present at that row, has rank $a-1$. Likewise factor $b$ has $b-1$ degrees of freedom. The intercept has one degree of freedom. So the model formula $ ~ a + b$ (which really is $ ~ a + b + 1$) has $1 + a-1 + b-1 = a+b-1$ degrees of freedom. Removing the intercept (model formula $ ~ a + b - 1$) represents the same model, only the parametrization changed. So it must also have $ a + b - 1 $ degrees of freedom. That $-1$ shows that that there cannot be $a+b$ parameters, so one of the factors still must get one parameter less than number of levels.

That explains what you have seen. But still you can get a coefficient for the missing level of $b$, which should be zero, simply. (depending on the contrasts you are using).

To make this a bit more explicit let us see at an example. I will use R for the matrix algebra. To make design matrices (in R parlance "model matrices") from factors, we need to define contrast functions. I use the default:

> options("contrasts")
$contrasts
        unordered           ordered 
"contr.treatment"      "contr.poly" 

First we make two factors for a simple, fully crossed design:

a  <- factor(rep(letters[1:3], 3))
b  <- factor(rep(letters[1:3], each=3))

Then design matrices for each of them:

> X1 <- model.matrix( ~ a-1)
> X2 <- model.matrix( ~b-1)
> X1
  aa ab ac
1  1  0  0
2  0  1  0
3  0  0  1
4  1  0  0
5  0  1  0
6  0  0  1
7  1  0  0
8  0  1  0
9  0  0  1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$a
[1] "contr.treatment"

> X2
  ba bb bc
1  1  0  0
2  1  0  0
3  1  0  0
4  0  1  0
5  0  1  0
6  0  1  0
7  0  0  1
8  0  0  1
9  0  0  1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$b
[1] "contr.treatment"

Each of them, separately, is of full rank:

library(MASS)
library(Matrix)  
> Matrix::rankMatrix(X1)
[1] 3
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.998401e-15
> Matrix::rankMatrix(X2)
[1] 3
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.998401e-15

But when combined there is a rank deficit, so they must have one dimension "in common":

rankMatrix(cbind(X1, X2))
[1] 5
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.998401e-15

To identify the common dimension we use the Null() function from package MASS, calculating the null space:

 Null(t(cbind(X1, X2)))
           [,1]
[1,] -0.4082483
[2,] -0.4082483
[3,] -0.4082483
[4,]  0.4082483
[5,]  0.4082483
[6,]  0.4082483

Yes, the common dimension is the constant vector.

Related Question