None of those approaches will work properly. Approach 3 came close, but then you said you would prune out insignificant terms. That is problematic for two reasons: collinearities make it nearly impossible to decide which terms to remove, and pruning gives you the wrong degrees of freedom in subsequent hypothesis tests if you want to preserve type I error.
Depending on the effective sample size and the signal:noise ratio in your problem, I'd suggest fitting a model with all product and main-effect terms, and interpreting the model using plots and "chunk tests" (multiple-d.f. tests of related terms, e.g., a test for overall interaction, a test for nonlinear interaction, a test for the overall effect of a predictor including main effect + interaction). The R `rms` package makes this easy to do for standard univariate models, and for longitudinal models when $Y$ is multivariate normal. Example:
```r
# Fit a model with splines in x1 and x2 and a tensor-spline interaction
# surface for the two.  The model is additive and linear in x3.
# Note that splines typically fit better than ordinary polynomials.
f <- ols(y ~ rcs(x1, 4) * rcs(x2, 4) + x3)
anova(f)                    # all meaningful hypothesis tests that can be
                            # inferred from the model formula
bplot(Predict(f, x1, x2))   # show joint effects
plot(Predict(f, x1, x2=3))  # vary x1, holding x2 constant
```
In the `anova` table you'll see lines labeled `All Interactions`. For the whole model, this tests the combined influence of all interaction terms; for an individual predictor, the line is only helpful when that predictor interacts with more than one variable. There is an option in the `print` method for `anova.rms` to show, for each line in the table, exactly which parameters are being tested against zero. All of this works with mixtures of categorical and continuous predictors. If you want to use ordinary polynomials, use `pol` instead of `rcs`.
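Putting the pieces together, here is a minimal self-contained sketch. The simulated data are my own placeholder, and I'm assuming the usual `rms` setup: a `datadist` object must be registered before `Predict` will work, and (if I recall the `print.anova.rms` interface correctly) its `which` argument is what reveals the tested parameters:

```r
library(rms)

## simulated data purely for illustration
set.seed(1)
n  <- 200
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- sin(3 * x1) * cos(3 * x2) + 0.5 * x3 + rnorm(n, sd = 0.3)
d  <- data.frame(y, x1, x2, x3)

dd <- datadist(d); options(datadist = "dd")  # required before Predict()

f <- ols(y ~ rcs(x1, 4) * rcs(x2, 4) + x3, data = d)
a <- anova(f)
a                               # chunk tests, including "All Interactions"
print(a, which = "subscripts")  # show which coefficients each line tests
```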
Unfortunately I haven't implemented mixed effect models.
tl;dr This is possible, but very tedious (especially for factors with more than 2 levels), and you might not want to do it after all. If you can decide exactly what you mean to test by testing the lower-level interactions, you can probably do it.
This is a little bit tricky partly for technical R-specific reasons, and partly for statistical/inferential reasons.
The statistically tricky part is that testing lower-level interactions in a model that also contains higher-level interactions is (depending on who you ask) either (i) hard to do correctly or (ii) just plain silly (for the latter position, see part 5 of Bill Venables's "exegeses on linear models"). The rubric for this is the principle of marginality. At the very least, the meaning of the lower-order terms depends sensitively on how contrasts in the model are coded (e.g., treatment vs. midpoint/sum-to-zero). My default rule is that if you're not sure you understand exactly why this might be a problem, you shouldn't violate the principle of marginality.
The technically tricky part is that R's formula language doesn't have a simple way to drop lower-order terms in the presence of higher-order interactions (in part because this machinery was originally designed by researchers from the Nelder/Venables camp, who don't think this is a sensible thing to do). I have done it in the past, in cases where I knew it was sensible, by constructing the model matrix, dropping the columns I didn't want, and using the remaining columns as explicit variables in the model. In particular, I had terms `period` (before vs. after) and `ttt` (treatment: control vs. removal) and wanted to measure the `period:ttt` interaction while setting the before-period difference between treatments to zero: for the default ("treatment") contrasts in R, this corresponds to setting $\beta_{\textrm{ttt}}$ to zero, i.e., removing that column from the model matrix.
```r
dd <- expand.grid(period = c("before","after"),
                  ttt    = c("control","removal"))
## some trickery to get the factor levels in a sensible order
dd[] <- lapply(dd, function(x) factor(x, levels = unique(x)))
```
You might think that `~period*ttt-ttt` would be the way to specify the desired result, but it doesn't actually work:

```r
colnames(model.matrix(~period*ttt-ttt, dd))
## [1] "(Intercept)"             "periodafter"
## [3] "periodbefore:tttremoval" "periodafter:tttremoval"
```
(Neither does `~period+period:ttt`, nor any other sensible combination I could think of.) Instead:
```r
X <- model.matrix(~period*ttt, dd)
## drop the unwanted column; also drop the intercept,
## since it will be re-added when we fit the model
X <- X[, !colnames(X) %in% c("(Intercept)", "tttremoval")]
colnames(X) <- c("period", "period_ttt")  ## friendlier names
```
Now we can fit a model with `y ~ period + period_ttt`, supplying `as.data.frame(X)` (with the response added) as the `data` argument, and get what we want.
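Here is the whole trick end-to-end as a runnable sketch (the replicated rows and the simulated response are my additions, just to have something to fit):

```r
set.seed(1)
dd <- expand.grid(period = c("before","after"),
                  ttt    = c("control","removal"))
dd[] <- lapply(dd, function(x) factor(x, levels = unique(x)))
d <- dd[rep(1:4, each = 20), ]   # 20 replicates per cell
d$y <- rnorm(nrow(d))            # fake response, for illustration only

## build the model matrix, drop the unwanted ttt main-effect column
X <- model.matrix(~ period * ttt, d)
X <- X[, !colnames(X) %in% c("(Intercept)", "tttremoval")]
colnames(X) <- c("period", "period_ttt")

fit <- lm(y ~ period + period_ttt,
          data = cbind(d["y"], as.data.frame(X)))
coef(fit)  # intercept, period, period_ttt -- no ttt main effect
```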
How does this translate to your case?

> the full model is: `lmer(response ~ A*B + (1 + A*B|C), data.frame)` My question concerns getting p-values for the 2-way interactions and the main effect of C.
If you want a p-value for the `A:B` interaction, you should probably compare the model above to

```r
update(model, . ~ A + B + (1 + A + B | C))
```
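For concreteness, that comparison can be run as a likelihood-ratio test via `anova()` on the nested fits. The simulated data below are my own placeholder; note `REML = FALSE` (needed for LRTs on fixed effects), and note that this comparison drops both the fixed `A:B` term and its random slope:

```r
library(lme4)
set.seed(1)
## placeholder data: two-level factors A and B, grouping factor C
df <- expand.grid(A = factor(1:2), B = factor(1:2),
                  C = factor(1:12), rep = 1:6)
df$response <- rnorm(nrow(df))

full    <- lmer(response ~ A*B + (1 + A*B | C), data = df, REML = FALSE)
reduced <- update(full, . ~ A + B + (1 + A + B | C))
anova(reduced, full)   # likelihood-ratio test for the A:B terms
```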
I'm not sure what the "main effect of C" is -- is that the variation of the intercept among levels of `C`? If so, you probably have to use the trick above to exclude the intercept from `A*B`, e.g.

```r
update(model, . ~ A*B + (A2 + B2 + A2_B2 | C))
```

where `A2`, `B2`, and `A2_B2` are dummy variables as above.
But it's up to you to make sure this makes sense.
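As a sketch of what constructing those dummy variables might look like (hypothetical simulated data of my own; with two-level factors whose levels happen to be `1` and `2`, `model.matrix` produces columns named `A2`, `B2`, and `A2:B2`):

```r
library(lme4)
set.seed(2)
df <- expand.grid(A = factor(1:2), B = factor(1:2),
                  C = factor(1:12), rep = 1:6)
df$response <- rnorm(nrow(df))

## dummy columns matching the fixed-effect coding of A*B
X <- model.matrix(~ A * B, df)
df$A2    <- X[, "A2"]
df$B2    <- X[, "B2"]
df$A2_B2 <- X[, "A2:B2"]

model <- lmer(response ~ A*B + (1 + A*B | C), data = df)
m2    <- update(model, . ~ A*B + (A2 + B2 + A2_B2 | C))
## note: (A2 + B2 + A2_B2 | C) still carries an implicit random
## intercept; write (0 + A2 + B2 + A2_B2 | C) to drop it as well
```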
Best Answer
Except in very unusual cases, you should not remove an effect while leaving in any effects that contain it. So if you find that the $AB$ interaction is significant, don't throw out the main effects of $A$ or $B$. Similarly, don't keep $ABC$ while throwing out $AB$, or $B$, or any other main effect or two-way interaction involving those three factors.
So, I recommend reading the ANOVA table from the bottom up. You can throw out stuff at the bottom to simplify the model, and work upward, keeping this effect hierarchy in mind.
To see why, consider a simple illustration where $A$ and $B$ both have two levels. Let $\mu_{ij}$ denote the expected response at the $i$th level of $A$ and the $j$th level of $B$. If you think that $A$ and $B$ interact, you are saying that the $A$ effects depend on $B$ and vice versa; and this is essentially the same as saying that the combinations of $A$ and $B$ should be regarded as the levels of a single factor (having 4 levels in this example). As such, the effects of this 4-level factor are quantified as contrasts $c_{11}\mu_{11}+c_{12}\mu_{12} + c_{21}\mu_{21} + c_{22}\mu_{22}$, where the $c_{ij}$ sum to zero. But if you exclude $A$ from the model, you are saying that the $A$ effect, $\{(\mu_{11}+\mu_{12}) - (\mu_{21}+\mu_{22})\}/2$, is equal to zero---thus limiting the contrasts you are willing to consider. Put another way, the four-level factor determined by the levels of $A$ and $B$ together has 3 degrees of freedom, but the interaction effect $AB$ has only 1 d.f.; the other two d.f. are covered by the $A$ and $B$ main effects.
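The degrees-of-freedom bookkeeping is easy to verify numerically. Here is a small sketch (with simulated data of my own) showing that `A*B` and the combined 4-level factor span the same model space, with $1+1+1 = 3$ d.f.:

```r
set.seed(1)
d <- expand.grid(A = factor(c("a1","a2")), B = factor(c("b1","b2")),
                 rep = 1:10)
d$y  <- rnorm(nrow(d))               # fake response, for illustration
d$AB <- interaction(d$A, d$B)        # the combined 4-level factor

anova(lm(y ~ A * B, data = d))       # A: 1 df, B: 1 df, A:B: 1 df
anova(lm(y ~ AB,    data = d))       # AB: 3 df
## identical fitted values: the two parameterizations agree
all.equal(fitted(lm(y ~ A * B, data = d)),
          fitted(lm(y ~ AB, data = d)))
```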