There are three common causes for the zero degrees of freedom:
- Saturated model. These are usually spotted from NAs in summary(model).
- Factor level combinations with zero observations. These are spotted as zero dfs in Anova(model) for the interaction effects, while the main effects would still have, e.g., one df each.
- Perfectly collinear explanatory variables. These are spotted from the zero dfs for the interaction effects in the presence of the main effects in Anova(model); or, if only main effects are included in the model, some of the main effects could have zero dfs.
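As a small illustration of the second cause, here is a hedged sketch on made-up toy data (base R's anova() is used to show the degrees of freedom):

```r
# Toy data where one factor-level combination has zero observations
set.seed(2)
d <- expand.grid(f = factor(c("a", "b")), g = factor(c("x", "y")), rep = 1:3)
d <- subset(d, !(f == "b" & g == "y"))  # this cell is empty
d$y <- rnorm(nrow(d))
fit <- lm(y ~ f * g, data = d)
coef(fit)   # the interaction coefficient fb:gy is NA
anova(fit)  # the f:g row has 0 Df; the main effects keep 1 Df each
```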
Your case falls under the third cause. Judging from the specification of models 1 and 2, some of the explanatory variables you added to model 2 (ethnicity, gender, age, enrollment, Pell, TANF, GPA, onlineExp) are collinear with level and type.
To find the problematic variables, consult the answer on Stack Overflow. Briefly, the command model.matrix(model) gives out, well, the model matrix for your model. Check the output! Following the advice George Dontas gave on the linked thread, you can then find out which predictors are problematic. Small eigenvalues from the following command indicate problems:
eigen(cov(model.matrix(model)[,-1]))$values
For a very small R example, try the following:
set.seed(1)
y  <- rnorm(9)
x1 <- c(0, 0, 0, 1, 1, 1, 2, 2, 2)
x2 <- c(0, 1, 2, 0, 1, 2, 0, 1, 2)
x3 <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)  # note: x3 = x1 + 1, perfectly collinear with x1
fit1 <- lm(y ~ x1 + x2)
fit2 <- lm(y ~ x1 + x2 + x3)
fit3 <- lm(y ~ x1 + x2 + x3 + x1*x3)
library(car)
Anova(fit1)
Anova(fit2) # main effects only
Anova(fit3) # interactions, also
e <- eigen(cov(model.matrix(fit2)[, -1]))$values
names(e) <- colnames(model.matrix(fit2)[, -1])
e # x3 creates the problem: its value is very low
Another possibility is to add the new variables one by one and see when the problem shows up. You can then try to rectify the situation, e.g., by leaving the problematic variables out of the model. Sometimes combining variables in a suitable way also helps.
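A quick sketch of that one-by-one strategy on the toy data from the example above (the data are redefined here so the snippet is self-contained):

```r
# Add predictors one at a time; stop when aliasing (an NA coefficient) appears
set.seed(1)
d <- data.frame(y  = rnorm(9),
                x1 = c(0, 0, 0, 1, 1, 1, 2, 2, 2),
                x2 = c(0, 1, 2, 0, 1, 2, 0, 1, 2),
                x3 = c(1, 1, 1, 2, 2, 2, 3, 3, 3))
fit <- lm(y ~ 1, data = d)
for (v in c("x1", "x2", "x3")) {
  fit <- update(fit, as.formula(paste(". ~ . +", v)))
  if (any(is.na(coef(fit))))
    message("adding ", v, " makes the model rank-deficient")
}
```

Here only adding x3 triggers the message, since x3 duplicates the information in x1.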
Let's say we want to predict the median value of a house, expressed in thousands of dollars (medv), based on its age (age), number of rooms (rm), and the crime rate (crim) in its neighborhood. The dataset for this is called Boston and it is in the MASS package:
> library(MASS)
Original regression model
> print(coef(lm(medv~crim+rm+age, data=Boston)))
(Intercept) crim rm age
-23.60556128 -0.21102311 8.03283820 -0.05224283
So we see that the coefficients for those three predictors are -0.211, 8.03 and -0.05. To calculate the predicted price for any house we just use the formula (including the intercept):
medv = -23.60556128 - 0.21102311*crim + 8.03283820*rm - 0.05224283*age
This means, for example, that for an increase of one room the medv price rises by 8.03 thousand dollars, holding the other predictors fixed.
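To see that interpretation concretely, here is a small check (the predictor values are chosen arbitrarily): two houses that are identical except for one extra room differ in predicted medv by exactly the rm coefficient.

```r
library(MASS)
fit <- lm(medv ~ crim + rm + age, data = Boston)
# two hypothetical houses, identical except for one extra room
new <- data.frame(crim = c(0.1, 0.1), rm = c(6, 7), age = c(50, 50))
p <- predict(fit, newdata = new)
diff(p)  # equals the rm coefficient, about 8.03
```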
Now, let's say we measure the price in dollars instead of thousands of dollars. The regression coefficients come out totally different:
> Boston.2 <- Boston
> Boston.2$medv <- Boston.2$medv*1000
> print(coef(lm(medv~crim+rm+age, data=Boston.2)))
(Intercept) crim rm age
-23605.56128 -211.02311 8032.83820 -52.24283
They are all scaled versions of the ones in the original model. But what if we instead measure, for example, the crime rate per 1,000 people? Then there is a big change in the crim coefficient:
> Boston.3 <- Boston
> Boston.3$crim <- Boston.3$crim/1000
> print(coef(lm(medv~crim+rm+age, data=Boston.3)))
(Intercept) crim rm age
-23.60556128 -211.02311213 8.03283820 -0.05224283
So compare this model with the original one. If we compared the variables based on these coefficients, it would seem that the crime rate is much, much more important than the number of rooms (-211.02 vs. only 8.03), which is not realistic. In the original model the story is quite different: there the number of rooms seems far more important. The coefficients change simply because the variables are measured in different units.
To assess the relative importance of the variables, we first need to scale them so that they become comparable. Then it no longer matters which unit is used (crim vs. crim/1000): the coefficients end up the same, and we can compare their values to see which variables matter most for prediction.
We now apply the same model three times using the differently scaled data sets (Boston, Boston.2, Boston.3) - note the identical coefficient results:
Original regression model
> print(coef(lm(scale(medv)~scale(crim)+scale(rm)+scale(age), data=Boston)))
(Intercept) scale(crim) scale(rm) scale(age)
-3.076923e-16 -1.973583e-01 6.136725e-01 -1.598956e-01
Regression model with prices in dollars instead of thousands
> print(coef(lm(scale(medv)~scale(crim)+scale(rm)+scale(age), data=Boston.2)))
(Intercept) scale(crim) scale(rm) scale(age)
-3.076923e-16 -1.973583e-01 6.136725e-01 -1.598956e-01
Regression model with crime rate per 1000 people
> print(coef(lm(scale(medv)~scale(crim)+scale(rm)+scale(age), data=Boston.3)))
(Intercept) scale(crim) scale(rm) scale(age)
-3.076923e-16 -1.973583e-01 6.136725e-01 -1.598956e-01
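The standardized coefficients need not be obtained by refitting on scaled data; equivalently, each raw coefficient can be rescaled by sd(x)/sd(y). A small check:

```r
library(MASS)
raw <- coef(lm(medv ~ crim + rm + age, data = Boston))[-1]
# standardized coefficient = raw coefficient * sd(x) / sd(y)
std <- raw * sapply(Boston[, c("crim", "rm", "age")], sd) / sd(Boston$medv)
round(std, 7)  # matches the scale() coefficients above
```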
From this it seems that the number of rooms is indeed more important than the crime rate, which again agrees with common sense :).
Best Answer
I think in order to get there you would have to include the factors as main effects themselves. Currently, you only use the factors in an interaction term. I am not even sure whether this is sound (including an interaction without the corresponding main effects), but I am no expert. Nevertheless, this is why you only get coefficients for the interaction, or more specifically for the different contrasts.
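A tiny hypothetical illustration of that point: with a factor f and a numeric x (made-up data below), an interaction-only formula gives per-level slopes but no main-effect coefficients, while f*x adds them.

```r
set.seed(4)
d <- data.frame(f = factor(rep(c("a", "b"), each = 5)), x = rnorm(10))
d$y <- rnorm(10)
names(coef(lm(y ~ f:x, data = d)))  # "(Intercept)" "fa:x" "fb:x"
names(coef(lm(y ~ f*x, data = d)))  # "(Intercept)" "fb" "x" "fb:x"
```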