Solved – Factors missing in summary and Anova output from glm in R

generalized-linear-model, r

I'm fitting a GLM in R, but when I add additional variables, the degrees of freedom (df) for two of the variables (level and type; see models below) drop from 1 to 0, so there is no information about them in either the Anova table or the summary output. Interactions involving these two variables, however, still have one degree of freedom. I can't figure out why the df drop to zero when I add the covariates, or what I can do to address this, and I was hoping someone could point me in the right direction.

The code is below. To try to figure out what is going on, I have run three models:

  1. Model 1 with just a few factors (plus their interactions)
  2. Model 2 with all of the factors (but no interactions)
  3. Model 3 with all of the factors plus the interactions I want

The first model runs fine: the two variables level and type each have one df, so output exists for them in the Anova and summary tables. But in the second and third models, level and type each have zero df (although each of their interactions with medium has one df).

modeloutput <- glm(model, data=dataset, family=binomial)
## formulas:
# Model 1:
model <- as.formula(success ~ 1 + medium + level + type + motivation + level*medium +
                   type*medium + motivation*medium)
# Model 2:
model <- as.formula(success ~ 1 + course + medium + level + type + motivation + 
                    ethnicity + gender + age + enrollment + Pell + TANF + GPA + 
                    onlineExp)
# Model 3:
model <- as.formula(success ~ 1 + course + medium + level + type + motivation + 
                    ethnicity + gender + age + enrollment + Pell + TANF + GPA + 
                    onlineExp + level*medium + type*medium + motivation*medium)

I feel like I must be missing something obvious, but I'm just not seeing it. If anyone has any tips about what I can do to find out more information about why this is happening (and about what my options are in trying to address it), I would be very grateful for the help!

Edit to include sample Anova output:

                   LR Chisq Df  Pr(>Chisq)
course             339.1891235  13  1.45477E-64
medium             51.82448562  1   6.06902E-13
ethnicity          66.43311272  4   1.28372E-13
gender             9.557732159  1   0.001991089
age                8.182416401  1   0.004229838
enrollment         0.149034707  1   0.699459539
Pell               1.435801885  1   0.230819897
TANF               13.54535541  1   0.000232867
GPA                179.7209374  3   1.01328E-38
onlineExp          12.78010977  3   0.005137086
level                           0   
type                            0   
motivation         18.00342825  3   0.000439134
medium:level       5.328089832  1   0.020984371
medium:type        5.98948654   1   0.014391391
medium:motivation  6.747183841  3   0.080407666

The summary output looks completely normal, except that no coefficients for level or type appear in the list.

Best Answer

There are three common causes of zero degrees of freedom:

  1. A saturated model. This is usually spotted from NAs in summary(model).
  2. Factor-level combinations with zero observations. These show up as zero df in Anova(model) for the interaction effects, while the main effects still have, e.g., one df each.
  3. Perfectly collinear explanatory variables. These show up as zero df for the interaction effects in the presence of the main effects in Anova(model); if only main effects are included in the model, some of the main effects can have zero df.
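As a minimal illustration of the second cause (made-up data, not your dataset): when one factor combination has no observations, the corresponding interaction term cannot be estimated and its coefficient comes out as NA:

```r
set.seed(2)
d <- data.frame(
  f1 = factor(rep(c("a", "b"), each = 4)),
  f2 = factor(c("x", "x", "y", "y", "x", "x", "x", "x"))
)
d$y <- rnorm(8)
table(d$f1, d$f2)        # the (b, y) cell is empty
fit <- lm(y ~ f1 * f2, data = d)
coef(fit)                # f1b:f2y is NA: it cannot be estimated
```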

Your case falls under the third cause. Judging from the specifications of models 1 and 2, some of the explanatory variables you added in model 2 (ethnicity, gender, age, enrollment, Pell, TANF, GPA, onlineExp) are collinear with level and type.

To find the problematic variables, consult the answer on Stackoverflow. Briefly, the command model.matrix(model) gives out, well, the model matrix for your model. Check the output! Following the advice George Dontas gave in the linked thread, you can then find out which predictors are the problematic ones. Small eigenvalues from the following command indicate problems:

eigen(cov(model.matrix(model)[,-1]))$values
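Base R's alias() is another way to surface exact linear dependencies directly. A sketch on toy data (the variable names here are made up, not from your model):

```r
set.seed(1)
df <- data.frame(x1 = 1:10, y = rnorm(10))
df$x2 <- 2 * df$x1 + 3               # x2 is an exact linear function of x1
fit <- lm(y ~ x1 + x2, data = df)
alias(fit)$Complete                  # x2 is reported as a combination of (Intercept) and x1
```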

For a very small R example, try the following:

set.seed(1)
y  <- rnorm(9)
x1 <- c(0, 0, 0, 1, 1, 1, 2, 2, 2)
x2 <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)
x3 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)  # x3 = x1 + 1, perfectly collinear with x1
fit1 <- lm(y ~ x1 + x2)
fit2 <- lm(y ~ x1 + x2 + x3)
fit3 <- lm(y ~ x1 + x2 + x3 + x1*x3)
library(car)
Anova(fit1)
Anova(fit2)  # main effects only
Anova(fit3)  # interactions, also
e <- eigen(cov(model.matrix(fit2)[, -1]))$values
names(e) <- colnames(model.matrix(fit2)[, -1])
e  # x3 creates the problem: its value is essentially zero
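Continuing the same toy example, the eigenvector belonging to the (near-)zero eigenvalue shows which columns enter the degenerate combination; here it loads on x1 and x3 only. (The setup is repeated so the snippet is self-contained.)

```r
set.seed(1)
y  <- rnorm(9)
x1 <- c(0, 0, 0, 1, 1, 1, 2, 2, 2)
x2 <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)
x3 <- x1 + 1                        # collinear with x1
fit2 <- lm(y ~ x1 + x2 + x3)
ev <- eigen(cov(model.matrix(fit2)[, -1]))
v <- ev$vectors[, which.min(ev$values)]
round(v, 2)                          # non-zero entries flag x1 and x3; x2 is ~0
```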

Another possibility is to add the new variables one at a time and see when the problem appears. You can then try to rectify the situation, e.g., by leaving the problematic variables out of the model. Sometimes combining the variables in a suitable way also helps.
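The one-at-a-time approach can be sketched with update() (toy data again; with a real model you would watch for new NA coefficients in summary() after each addition):

```r
set.seed(1)
y  <- rnorm(9)
x1 <- rep(0:2, each = 3)
x3 <- x1 + 1                      # collinear with x1
fit  <- lm(y ~ x1)
fit2 <- update(fit, . ~ . + x3)   # adding x3 produces an NA coefficient
coef(fit2)
```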