Solved – Why does changing factor level order of a categorical predictor affect significance of continuous predictors in a linear model with interactions

interaction, linear model, lme4-nlme, mixed model, r

Although I am asking this question using R, I imagine it is potentially applicable more broadly in linear modelling, so I think general explanations are also really helpful.

I have a dataset with a binomial response, two continuous predictors, a categorical predictor, and a random effect of block. I have a model structure like this:

response~cont1 + cont2 + cat + cont1:cont2 + cont1:cat + cont2:cat + (1|block)

After running this model once, I decided that I wanted to reorganize the levels of the categorical predictor so that a different one was alphabetically first. R and lme4 pick the alphabetically-first level of a categorical predictor to incorporate in the intercept, and for the second model, I needed the last one of my levels (alphabetically) to be first.

In looking at the results of both of these models, I noticed that the significance of the main effects of my continuous predictors (as well as their slopes) changed when I altered the order of levels of my categorical factor. My question is: Why does that happen? I would expect that the intercept would change, as well as the pairwise comparisons between the levels of the categorical predictor, but I didn't expect anything to change with my continuous predictors.

After doing some troubleshooting, I have discovered that removing the interaction terms erases any differences in the significance of the predictors between the two factor-level ordering schemes. This makes me think it is something about how the model is parsing the variation in the continuous predictors, but I really have no idea. I also tried the model with a simple lm style linear model, and get the same pattern of results, so I don't think it is specific to GLMMs.

Any thoughts would be very much appreciated. Thanks!

Also, a reproducible example is below. It uses a publicly available dataset; please ignore that the models may not make sense for these data, as I wrote it quickly to replicate the issue I am seeing in my own data and didn't pay attention to what the different columns actually are.

#Data for example can be downloaded here:
#https://github.com/lme4/lme4/blob/master/inst/testdata/gopherdat2.RData

load("gopherdat2.RData")

#Needed libraries
library(lme4)
library(car)

#I need a categorical predictor, so for the purposes of this model, I will
#use year as categorical
Gdat$yFac<-
ifelse(Gdat$year==2004,"year1",ifelse(Gdat$year==2005,"year2","year3"))

#Model with default organization of categorical predictor
m1<-glmer(shells~density+prev+yFac+density:yFac+prev:yFac+
(1|Site),data=Gdat,family="poisson")

#Anova from the car package to get p-values for the different model terms.
Anova(m1,type="III")

#Making "year2" be the first one alphabetically.
#I realize that the function stats::relevel does this in a better way, but
#this is what I do so that I can keep track of which way I am
#parameterizing the model.
Gdat$yFaca<-ifelse(Gdat$yFac=="year2","ayear2",paste(Gdat$yFac))

#Model with the new organization of the categorical predictor
m2<-glmer(shells~density+prev+yFaca+density:yFaca+prev:yFaca+
(1|Site),data=Gdat,family="poisson")

#Anova from the car package to get p-values for the different model terms
Anova(m2,type="III")

#Two models to show how it works without interactions
m3<-glmer(shells~density+prev+yFac+(1|Site),data=Gdat,
family="poisson") #Original categorical order
m4<-glmer(shells~density+prev+yFaca+(1|Site),data=Gdat,
family="poisson") #New categorical order

#Running this shows that they are essentially the same.
Anova(m3)
Anova(m4)

Best Answer

Interpretations of coefficients with interactions

When cont1 interacts with cont2 and with the categorical predictor cat, its "main effect" coefficient is no longer an overall slope: it is the expected difference in the response per unit difference in cont1 when cont2 = 0 and cat is at its reference level. Including the cont1:cat interaction allows the trendline for cont1 to take different values across the levels of cat, so the main-effect coefficient becomes the stratum-specific slope for the reference level. The estimate and the inference for cont1 therefore depend directly on which level of cat is chosen as the reference.
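A quick simulation makes this concrete (hypothetical data and variable names, not from the question's dataset): when the true slope of the continuous predictor differs across factor levels, the reported main-effect slope tracks whichever level is the reference, while a model without the interaction is unaffected by releveling.

```r
set.seed(1)
#Hypothetical data: the slope of cont1 is 1, 2, or 3 depending on the level of cat
d <- data.frame(cont1 = rnorm(300),
                cat   = factor(rep(c("a", "b", "c"), each = 100)))
d$y <- d$cont1 * as.numeric(d$cat) + rnorm(300)

d$cat2 <- relevel(d$cat, ref = "c")       #same factor, different reference level
fit_a  <- lm(y ~ cont1 * cat,  data = d)  #reference level "a"
fit_c  <- lm(y ~ cont1 * cat2, data = d)  #reference level "c"

coef(fit_a)["cont1"]  #close to 1: the slope within level "a"
coef(fit_c)["cont1"]  #close to 3: the slope within level "c"

#Without the interaction, the slope is pooled across levels and is
#identical under either reference level:
coef(lm(y ~ cont1 + cat,  data = d))["cont1"]
coef(lm(y ~ cont1 + cat2, data = d))["cont1"]
```

This mirrors your m3/m4 finding: dropping the interactions removes the dependence on factor-level order.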

Reasons for the findings you describe

One possible explanation in the scenario you describe is that the interaction between cont1 and cat is non-zero. The interaction term need not be statistically significant to produce this effect: if a sparse level of cat is chosen as the reference, the precision of the main effects of the predictors that interact with cat suffers. That sparse level may also be the only level whose mean response differs from the other levels of cat.

Another possibility is that cont2 interacts with cat: in the absence of a three-way interaction, releveling changes cont2's main effect, and through cont2's correlation and interaction with cont1, the inference for cont1 as well.

A third possibility is heteroscedasticity, which I suspect is at play here. In mixed models, the residual variance and the resulting inference depend on the fixed-effect structure of the model and on the consequent estimates of the random effects. Changing the estimated random intercepts for block can therefore affect both the inference and the predictions.

About choosing factor levels

In R, the first factor level is taken as the reference level for a categorical predictor. This is often a poor choice for interpretation, because the "first" level is frequently determined arbitrarily (alphabetical order, or first appearance in the dataset) or by the minimum of an ordinal coding. For instance, education is often measured as less than high school, high school, or some college or more: "less than high school" comes first in the ordinal coding, yet it is often the least prevalent level in a sample. Recoding the reference to "high school" (often the modal education level) yields more interpretable and more precise effects.
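As a sketch (hypothetical education variable, assumed stored in its ordinal order), stats::relevel moves the chosen reference level to the front without changing the data:

```r
#Hypothetical education factor, stored in its ordinal order so that
#"less than high school" (often sparse) is the default reference level
edu <- factor(c("high school", "some college or more", "high school",
                "less than high school", "high school"),
              levels = c("less than high school", "high school",
                         "some college or more"))
levels(edu)[1]   #"less than high school"

#Make the modal category the reference level instead
edu2 <- relevel(edu, ref = "high school")
levels(edu2)[1]  #"high school"
```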

Taken together, the solution to your problem seems to be to a) standardize cont1 and cont2 to interpretable, observable ranges so that the main effects aren't extrapolated to implausible values, and b) choose a factor coding for cat that improves interpretability.
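For point (a), a small sketch (again with invented data) of what centering buys you: after centering cont2, the cont1 main effect is the slope at the average observed cont2 rather than at cont2 = 0, which may lie far outside the data.

```r
set.seed(2)
#Hypothetical data in which the slope of x1 grows with x2
d <- data.frame(x1 = rnorm(200, mean = 10),
                x2 = rnorm(200, mean = 50, sd = 10))
d$y <- d$x1 + 0.1 * d$x1 * d$x2 + rnorm(200)

#Raw fit: the x1 "main effect" is the slope at x2 = 0, far outside the
#observed range of x2 (roughly 20 to 80), so it is also imprecise
coef(lm(y ~ x1 * x2, data = d))["x1"]   #near 1

#Centered fit: the x1 coefficient is the slope at the mean of x2
d$x2c <- as.numeric(scale(d$x2, scale = FALSE))
coef(lm(y ~ x1 * x2c, data = d))["x1"]  #near 1 + 0.1 * 50 = 6
```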