R – Interaction Terms and Higher Order Polynomials

model selection, r, regression

If I wanted to fit a two-way interaction between an explanatory variable $a$ that enters the model linearly and another explanatory variable $b$ that has a quadratic relationship with the dependent variable $y$, would I have to include both the interaction with the quadratic component and the interaction with the linear component in the model? E.g.:
$$
y\sim a+b+b^2+ab+ab^2
$$
Building on my previous thread, "Curvature terms and model selection": if this were a model selection analysis using MuMIn in R with many explanatory variables, would an output model containing the interaction with the quadratic term, $a:b^2$, be valid only if the interaction with the linear component, $a:b$, were also present in that same model, along with $a$, $b$ and $b^2$ as main effects?
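For reference, here is a minimal sketch of how the full model in the question might be written in R's formula syntax. The data are simulated and the coefficients and sample size are made up purely for illustration. `I()` is needed so that `b^2` is treated as arithmetic rather than as formula algebra.

```r
# Simulated data; all numeric values here are illustrative assumptions.
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
y <- 1 + 0.5 * a + 0.8 * b - 0.6 * b^2 + 0.4 * a * b + 0.3 * a * b^2 + rnorm(n)
dat <- data.frame(y, a, b)

# Long form: every term written out explicitly.
fit_long <- lm(y ~ a + b + I(b^2) + a:b + a:I(b^2), data = dat)

# Short form: '*' expands to the main effects plus their interactions.
fit_short <- lm(y ~ a * (b + I(b^2)), data = dat)

# Both formulas expand to the same set of model terms.
labels_long  <- attr(terms(fit_long), "term.labels")
labels_short <- attr(terms(fit_short), "term.labels")
print(labels_long)
print(labels_short)
```

The short form is harder to get wrong, because the lower-order terms come along automatically.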

Best Answer

Yes, you should always include all of the terms in the interaction, from the highest order all the way down to the linear term. There are a couple of really great threads on Cross Validated (CV) that discuss related issues and that you might find helpful in thinking about this.

The short answer is that by leaving a term out of the model, you force its coefficient to be exactly zero. This imposes an inflexibility on your model that necessarily causes bias unless that parameter really is exactly zero; the situation is analogous to suppressing the intercept (which you can see discussed here).
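A small simulation can make the bias concrete. This is only a sketch under assumed values: the true coefficients, the sample size, and the choice of an uncentred `b` (which makes `b` and `b^2` strongly correlated) are all invented for illustration. The reduced model drops `a:b` but keeps `a:I(b^2)`, so the quadratic interaction has to absorb part of the omitted linear one.

```r
# Illustrative simulation; every numeric value is an assumption.
set.seed(42)
n <- 5000
a <- rnorm(n)
b <- runif(n, 0, 4)   # uncentred, so b and b^2 are strongly correlated
true_ab  <- 0.5       # true coefficient on a:b
true_ab2 <- 0.3       # true coefficient on a:b^2
y <- 1 + 0.5 * a + 0.8 * b - 0.6 * b^2 +
  true_ab * a * b + true_ab2 * a * b^2 + rnorm(n)
dat <- data.frame(y, a, b)

# Full hierarchical model versus a model that omits the a:b term.
fit_full    <- lm(y ~ a * (b + I(b^2)), data = dat)
fit_reduced <- lm(y ~ a + b + I(b^2) + a:I(b^2), data = dat)

# Compare the estimated a:b^2 coefficient with its true value.
est_full    <- coef(fit_full)[["a:I(b^2)"]]
est_reduced <- coef(fit_reduced)[["a:I(b^2)"]]
print(c(truth = true_ab2, full = est_full, reduced = est_reduced))
```

In this setup the estimate from the full model should sit close to the true value, while the estimate from the reduced model is pulled away from it, because forcing the `a:b` coefficient to zero pushes its effect onto the correlated `a:I(b^2)` term.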

You should also be aware that any automatic model selection routine is dangerous. (For the basic story, it may be helpful to read my answer here.) In addition to that, however, these algorithms don't 'think' in terms of the relationships between variables, so they don't necessarily keep lower-order terms in the model when power or interaction terms are included.
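One way to guard against this is to screen the candidate models yourself before comparing them. The sketch below uses only base R. The helper `is_hierarchical()` and the `requires` lookup table are hypothetical names introduced here, not part of MuMIn. MuMIn's `dredge()` does accept a `subset` argument for restricting the candidate set, but its syntax is not shown here, so check the package documentation before relying on it.

```r
# Terms of the global model from the question.
all_terms <- c("a", "b", "I(b^2)", "a:b", "a:I(b^2)")

# Hypothetical lookup: the lower-order terms each term depends on.
requires <- list(
  "I(b^2)"   = c("b"),
  "a:b"      = c("a", "b"),
  "a:I(b^2)" = c("a", "b", "I(b^2)", "a:b")
)

# TRUE if every term in the model has all of its required terms present.
is_hierarchical <- function(terms_in_model, requires) {
  for (term in terms_in_model) {
    needed <- requires[[term]]
    if (!is.null(needed) && !all(needed %in% terms_in_model)) {
      return(FALSE)
    }
  }
  TRUE
}

# Enumerate every subset of terms, then keep only the hierarchical ones.
include <- expand.grid(rep(list(c(FALSE, TRUE)), length(all_terms)))
candidates <- lapply(seq_len(nrow(include)),
                     function(i) all_terms[unlist(include[i, ])])
valid <- Filter(function(m) is_hierarchical(m, requires), candidates)

# Turn each surviving term set into a formula (intercept-only if empty).
to_formula <- function(m) {
  reformulate(if (length(m) > 0) m else "1", response = "y")
}
valid_formulas <- lapply(valid, to_formula)
for (f in valid_formulas) print(f)
```

Writing the dependencies out by hand is tedious with many variables, but it makes the constraint explicit: a model containing `a:I(b^2)` survives only if `a`, `b`, `I(b^2)` and `a:b` are all present as well.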
