I'm fitting a GLM in R, but when I add more variables, the degrees of freedom (df) for two of the variables (level and type; see the models below) go from 1 to 0, so neither the Anova table nor the summary output reports anything for them. Interactions involving these two variables, however, still have one degree of freedom. I can't figure out why the df drops to zero when I add covariates, or what I can do about it, and I was hoping someone could point me in the right direction.
The code is below. To try to figure out what is going on, I have run three models:
- Model 1 with just a few factors (plus their interactions)
- Model 2 with all of the factors (but no interactions)
- Model 3 with all of the factors plus the interactions I want
The first model runs fine, with the two variables level and type having one df (and therefore output exists for these two variables in the Anova and summary outputs). But for the second and third models, the two variables level and type each have zero df (although each of their interactions with medium has one df).
modeloutput <- glm(model, data=dataset, family=binomial)
## formulas:
# Model 1:
model <- as.formula(success ~ 1 + medium + level + type + motivation + level*medium +
                      type*medium + motivation*medium)
# Model 2:
model <- as.formula(success ~ 1 + course + medium + level + type + motivation +
                      ethnicity + gender + age + enrollment + Pell + TANF + GPA +
                      onlineExp)
# Model 3:
model <- as.formula(success ~ 1 + course + medium + level + type + motivation +
                      ethnicity + gender + age + enrollment + Pell + TANF + GPA +
                      onlineExp + level*medium + type*medium + motivation*medium)
I feel like I must be missing something obvious, but I'm just not seeing it. If anyone has any tips about what I can do to find out more information about why this is happening (and about what my options are in trying to address it), I would be very grateful for the help!
Edit to include sample Anova output:
                   LR Chisq      Df   Pr(>Chisq)
course             339.1891235   13   1.45477E-64
medium             51.82448562    1   6.06902E-13
ethnicity          66.43311272    4   1.28372E-13
gender             9.557732159    1   0.001991089
age                8.182416401    1   0.004229838
enrollment         0.149034707    1   0.699459539
Pell               1.435801885    1   0.230819897
TANF               13.54535541    1   0.000232867
GPA                179.7209374    3   1.01328E-38
onlineExp          12.78010977    3   0.005137086
level                             0
type                              0
motivation         18.00342825    3   0.000439134
medium:level       5.328089832    1   0.020984371
medium:type        5.98948654     1   0.014391391
medium:motivation  6.747183841    3   0.080407666
For the summary output, it looks totally normal, except that none of the values for level or type appear in the list of coefficients.
Best Answer
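There are three common causes for the zero degrees of freedom:
1. Too few observations compared to the number of parameters (you can check for NAs in the output of summary(model)).
2. Some level combinations of an interaction are missing from the data. This would show up as zero df in Anova(model) for the interaction effects, but the main effect would still have, e.g., one df.
3. Collinearity between predictors. If interaction effects are included, the main effects would have zero df in Anova(model), or, if only main effects are included in the model, some of the main effects could have zero df.
Your case falls under the third cause. Judging from the specification of models 1 and 2, some of the explanatory variables you have added to model 2 (ethnicity, gender, age, enrollment, Pell, TANF, GPA, onlineExp) are collinear with level and type.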
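As a rough illustration of the third cause, the sketch below builds a small made-up data set in which one factor is fully determined by another, then shows how R reports the redundant coefficient. The data, the variable names (borrowed from the question), and the assumption that each course belongs to exactly one level are all hypothetical; the actual source of the collinearity in the real data may be different.

```r
set.seed(1)
n <- 200

# Hypothetical data: every course belongs to exactly one level,
# so `level` carries no information beyond what `course` already provides.
course  <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
level   <- factor(ifelse(course %in% c("A", "B"), "lower", "upper"))
medium  <- factor(sample(c("f2f", "online"), n, replace = TRUE))
success <- rbinom(n, 1, 0.6)
toy <- data.frame(success, course, level, medium)

fit <- glm(success ~ course + level + medium, data = toy, family = binomial)

# Coefficients that cannot be estimated are returned as NA
print(coef(fit))
aliased <- names(which(is.na(coef(fit))))
print(aliased)

# alias() expresses each aliased term as a combination of the estimable ones
print(alias(fit))
```

In this toy setup the dummy for level is the sum of two course dummies, which is why it is dropped. Running alias() on the real model may point to the corresponding dependency there.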
To find the problematic variables, consult the answer on Stack Overflow. Briefly, the command
model.matrix(model)
returns the model matrix for your model. Check the output! Following the advice George Dontas gave in the linked thread, you can then find out which predictors are problematic: small eigenvalues computed from the model matrix indicate problems.
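One possible way to carry out an eigenvalue check on a model matrix is sketched below. The code from the linked thread is not reproduced here, so the specific choice of taking eigenvalues of the cross-product of the model matrix is an assumption, and the toy data (with level nested inside course) is made up for illustration.

```r
set.seed(1)
n <- 200

# Hypothetical data in which `level` is fully determined by `course`
course <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
level  <- factor(ifelse(course %in% c("A", "B"), "lower", "upper"))
toy <- data.frame(course, level)

X <- model.matrix(~ course + level, data = toy)

# A rank below the number of columns means at least one column is redundant
rank_X <- qr(X)$rank
print(c(rank = rank_X, columns = ncol(X)))

# Eigenvalues of X'X (returned in decreasing order); a value that is
# essentially zero relative to the largest signals exact collinearity
e <- eigen(crossprod(X), symmetric = TRUE)
ratio <- abs(min(e$values)) / max(e$values)
print(ratio)

# The eigenvector of the smallest eigenvalue has non-negligible loadings
# only on the columns involved in the linear dependency
v <- setNames(e$vectors[, ncol(X)], colnames(X))
print(round(v, 2))
```

Columns with loadings near zero in that last vector are not part of the dependency; the remaining columns are the ones to inspect.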
Another possibility is to add the new variables one by one and see when the problem appears. You can then try to rectify the situation, e.g., by leaving the problematic variables out of the model. Sometimes combining the variables in a suitable way also helps.
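The one-by-one approach could be automated along these lines. This is only a sketch on made-up data: the candidate terms, their order, and the nesting of level within course are assumptions chosen so that the problem appears at a known step.

```r
set.seed(1)
n <- 200

# Hypothetical data in which `level` is fully determined by `course`
course  <- factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
level   <- factor(ifelse(course %in% c("A", "B"), "lower", "upper"))
medium  <- factor(sample(c("f2f", "online"), n, replace = TRUE))
success <- rbinom(n, 1, 0.6)
toy <- data.frame(success, course, level, medium)

# A cross-tabulation makes the nesting visible: each course has a zero cell
print(table(toy$course, toy$level))

# Add terms one at a time and record the first step with NA coefficients
candidates    <- c("medium", "course", "level")
included      <- character(0)
first_problem <- NA_character_

for (term in candidates) {
  included <- c(included, term)
  f   <- reformulate(included, response = "success")
  fit <- glm(f, data = toy, family = binomial)
  aliased <- names(which(is.na(coef(fit))))
  if (length(aliased) > 0) {
    cat("After adding", term, "-> aliased:", paste(aliased, collapse = ", "), "\n")
    if (is.na(first_problem)) first_problem <- term
  } else {
    cat("After adding", term, "-> no aliased coefficients\n")
  }
}
print(first_problem)
```

The order in which terms are added matters: the term flagged is the one that is redundant given everything added before it, so a different order can flag a different variable from the same collinear set.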