None of those approaches will work properly. Approach 3 came close, but then you said you would prune out insignificant terms. That is problematic for two reasons: collinearities make it nearly impossible to decide which terms to remove, and pruning gives you the wrong degrees of freedom in subsequent hypothesis tests if you want to preserve type I error.
Depending on the effective sample size and the signal:noise ratio in your problem, I'd suggest fitting a model with all product and main-effect terms, and interpreting the model using plots and "chunk tests" (multiple-d.f. tests of related terms, e.g., a test for overall interaction, a test for nonlinear interaction, a test for the overall effect of a predictor including main effect + interaction). The R `rms` package makes this easy to do for standard univariate models, and for longitudinal models when $Y$ is multivariate normal. Example:
```r
# Fit a model with splines in x1 and x2 and a tensor-spline interaction
# surface for the two.  The model is additive and linear in x3.
# Note that splines typically fit better than ordinary polynomials.
f <- ols(y ~ rcs(x1, 4) * rcs(x2, 4) + x3)
anova(f)                    # all meaningful hypothesis tests that can be
                            # inferred from the model formula
bplot(Predict(f, x1, x2))   # show joint effects
plot(Predict(f, x1, x2=3))  # vary x1, holding x2 constant
```
In the `anova` table you'll see lines labeled `All Interactions`. For the whole model, this tests the combined influence of all interaction terms; for an individual predictor, the line is only helpful when that predictor interacts with more than one variable. There is an option in the `print` method for `anova.rms` to show, for each line in the table, exactly which parameters are being tested against zero. All of this works with mixtures of categorical and continuous predictors. If you want to use ordinary polynomials, use `pol` instead of `rcs`.
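Putting the pieces together, here is a minimal self-contained sketch. The simulated data are my own placeholder, and I'm assuming the usual `rms` setup: a `datadist` object must be registered before `Predict` will work, and (if I recall the `print.anova.rms` interface correctly) its `which` argument is what reveals the tested parameters:

```r
library(rms)

## simulated data purely for illustration
set.seed(1)
n  <- 200
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- sin(3 * x1) * cos(3 * x2) + 0.5 * x3 + rnorm(n, sd = 0.3)
d  <- data.frame(y, x1, x2, x3)

dd <- datadist(d); options(datadist = "dd")  # required before Predict()

f <- ols(y ~ rcs(x1, 4) * rcs(x2, 4) + x3, data = d)
a <- anova(f)
a                               # chunk tests, including "All Interactions"
print(a, which = "subscripts")  # show which coefficients each line tests
```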
Unfortunately I haven't implemented mixed effect models.
tl;dr This is possible, but very tedious (especially for factors with more than 2 levels), and you might not want to do it after all. If you can decide exactly what you mean to test by testing the lower-level interactions, you can probably do it.
This is a little bit tricky partly for technical R-specific reasons, and partly for statistical/inferential reasons.
The statistically tricky part is that testing lower-level interactions in a model that also contains higher-level interactions is (depending on who you ask) either (i) hard to do correctly or (ii) just plain silly (for the latter position, see part 5 of Bill Venables's "exegeses on linear models"). The rubric for this is the principle of marginality. At the very least, the meaning of the lower-order terms depends sensitively on how contrasts in the model are coded (e.g., treatment vs. midpoint/sum-to-zero). My default rule is that if you're not sure you understand exactly why this might be a problem, you shouldn't violate the principle of marginality.
The technically tricky part is that R's formula language doesn't have a simple way to drop lower-order terms in the presence of higher-order interactions (in part because this machinery was originally designed by researchers from the Nelder/Venables camp, who don't think this is a sensible thing to do). I have done it in the past, in cases where I knew it was sensible, by constructing the model matrix, dropping the columns I didn't want, and using the remaining columns as explicit variables in the model. In particular, I had terms `period` (before vs. after) and `ttt` (treatment: control vs. removal) and wanted to measure the `period:ttt` interaction while setting the before-period difference between treatments to zero: for the default ("treatment") contrasts in R, this corresponds to setting $\beta_{\textrm{ttt}}$ to zero, i.e., removing that column from the model matrix.
```r
dd <- expand.grid(period = c("before","after"),
                  ttt    = c("control","removal"))
## some trickery to get the factor levels in a sensible order
dd[] <- lapply(dd, function(x) factor(x, levels = unique(x)))
```
You might think that `~period*ttt-ttt` would be the way to specify the desired result, but it doesn't actually work:

```r
colnames(model.matrix(~period*ttt-ttt, dd))
## [1] "(Intercept)"             "periodafter"
## [3] "periodbefore:tttremoval" "periodafter:tttremoval"
```
(Neither does `~period+period:ttt`, nor any other sensible combination I could think of.) Instead:
```r
X <- model.matrix(~period*ttt, dd)
## drop the unwanted column; also drop the intercept,
## since it will be re-added when we fit the model
X <- X[, !colnames(X) %in% c("(Intercept)", "tttremoval")]
colnames(X) <- c("period", "period_ttt")  ## friendlier names
```
Now we can fit a model with `y ~ period + period_ttt`, supplying `as.data.frame(X)` (with the response added) as the `data` argument, and get what we want.
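Here is the whole trick end-to-end as a runnable sketch (the replicated rows and the simulated response are my additions, just to have something to fit):

```r
set.seed(1)
dd <- expand.grid(period = c("before","after"),
                  ttt    = c("control","removal"))
dd[] <- lapply(dd, function(x) factor(x, levels = unique(x)))
d <- dd[rep(1:4, each = 20), ]   # 20 replicates per cell
d$y <- rnorm(nrow(d))            # fake response, for illustration only

## build the model matrix, drop the unwanted ttt main-effect column
X <- model.matrix(~ period * ttt, d)
X <- X[, !colnames(X) %in% c("(Intercept)", "tttremoval")]
colnames(X) <- c("period", "period_ttt")

fit <- lm(y ~ period + period_ttt,
          data = cbind(d["y"], as.data.frame(X)))
coef(fit)  # intercept, period, period_ttt -- no ttt main effect
```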
How does this translate to your case?

> the full model is: `lmer(response ~ A*B + (1 + A*B|C), data.frame)` My question concerns getting p-values for the 2-way interactions and the main effect of C.
If you want a p-value for the `A:B` interaction, you should probably compare the model above to

```r
update(model, . ~ A + B + (1 + A + B | C))
```
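For concreteness, that comparison can be run as a likelihood-ratio test via `anova()` on the nested fits. The simulated data below are my own placeholder; note `REML = FALSE` (needed for LRTs on fixed effects), and note that this comparison drops both the fixed `A:B` term and its random slope:

```r
library(lme4)
set.seed(1)
## placeholder data: two-level factors A and B, grouping factor C
df <- expand.grid(A = factor(1:2), B = factor(1:2),
                  C = factor(1:12), rep = 1:6)
df$response <- rnorm(nrow(df))

full    <- lmer(response ~ A*B + (1 + A*B | C), data = df, REML = FALSE)
reduced <- update(full, . ~ A + B + (1 + A + B | C))
anova(reduced, full)   # likelihood-ratio test for the A:B terms
```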
I'm not sure what the "main effect of C" is -- is that the variation of the intercept among levels of `C`? If so, you probably have to use the trick above to exclude the intercept from `A*B`, e.g.

```r
update(model, . ~ A*B + (A2 + B2 + A2_B2 | C))
```

where `A2`, `B2`, and `A2_B2` are dummy variables as above.
But it's up to you to make sure this makes sense.
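As a sketch of what constructing those dummy variables might look like (hypothetical simulated data of my own; with two-level factors whose levels happen to be `1` and `2`, `model.matrix` produces columns named `A2`, `B2`, and `A2:B2`):

```r
library(lme4)
set.seed(2)
df <- expand.grid(A = factor(1:2), B = factor(1:2),
                  C = factor(1:12), rep = 1:6)
df$response <- rnorm(nrow(df))

## dummy columns matching the fixed-effect coding of A*B
X <- model.matrix(~ A * B, df)
df$A2    <- X[, "A2"]
df$B2    <- X[, "B2"]
df$A2_B2 <- X[, "A2:B2"]

model <- lmer(response ~ A*B + (1 + A*B | C), data = df)
m2    <- update(model, . ~ A*B + (A2 + B2 + A2_B2 | C))
## note: (A2 + B2 + A2_B2 | C) still carries an implicit random
## intercept; write (0 + A2 + B2 + A2_B2 | C) to drop it as well
```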
Best Answer
Except in very unusual cases, you should not remove an effect while leaving in any effects that contain it. So if you find that the $AB$ interaction is significant, don't throw out the main effects of $A$ or $B$. Similarly, don't keep $ABC$ while throwing out $AB$, or $B$, or any other main effect or two-way interaction involving those three factors.
So, I recommend reading the ANOVA table from the bottom up. You can throw out stuff at the bottom to simplify the model, and work upward, keeping this effect hierarchy in mind.
To see why, consider a simple illustration where $A$ and $B$ both have two levels. Let $\mu_{ij}$ denote the expected response at the $i$th level of $A$ and the $j$th level of $B$. If you think that $A$ and $B$ interact, you are saying that the $A$ effects depend on $B$ and vice versa; and this is essentially the same as saying that the combinations of $A$ and $B$ should be regarded as the levels of a single factor (having 4 levels in this example). As such, the effects of this 4-level factor are quantified as contrasts $c_{11}\mu_{11}+c_{12}\mu_{12} + c_{21}\mu_{21} + c_{22}\mu_{22}$, where the $c_{ij}$ sum to zero. But if you exclude $A$ from the model, you are saying that the $A$ effect, $\{(\mu_{11}+\mu_{12}) - (\mu_{21}+\mu_{22})\}/2$, is equal to zero---thus limiting the contrasts you are willing to consider. Put another way, the four-level factor determined by the levels of $A$ and $B$ together has 3 degrees of freedom, but the interaction effect $AB$ has only 1 d.f.; the other two d.f. are covered by the $A$ and $B$ main effects.
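The degrees-of-freedom bookkeeping is easy to verify numerically. Here is a small sketch (with simulated data of my own) showing that `A*B` and the combined 4-level factor span the same model space, with $1+1+1 = 3$ d.f.:

```r
set.seed(1)
d <- expand.grid(A = factor(c("a1","a2")), B = factor(c("b1","b2")),
                 rep = 1:10)
d$y  <- rnorm(nrow(d))               # fake response, for illustration
d$AB <- interaction(d$A, d$B)        # the combined 4-level factor

anova(lm(y ~ A * B, data = d))       # A: 1 df, B: 1 df, A:B: 1 df
anova(lm(y ~ AB,    data = d))       # AB: 3 df
## identical fitted values: the two parameterizations agree
all.equal(fitted(lm(y ~ A * B, data = d)),
          fitted(lm(y ~ AB, data = d)))
```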