Regression Modelling – Interaction Between Quadratic Term and Dummy Variables

categorical-encoding, interaction, linear, quadratic form, regression

Suppose I have a linear regression:

$Y=\beta_1+\beta_2X+\beta_3X^2+\beta_4D$

where $D$ is a dummy variable that takes the value 0 or 1.

If I want to examine whether the effect of $X$ on $Y$ is the same for $D=0$ and $D=1$, how should I include the interaction term in the equation? Do I interact only $X$ with $D$, or both $X$ and $X^2$ with $D$? I assumed I should interact both, and I tried:

$Y=\beta_1+\beta_2X+\beta_3X^2+\beta_4D+\beta_5(X\times D)+\beta_6(X^2 \times D)$

Is this correct? If not, could someone please explain what I should do? Thank you for your help.

Best Answer

In general, you don't want to throw away information. The way you set up the problem, you are evaluating interactions of D with both the linear and the quadratic terms involving X. If you have reason to believe that interaction with the quadratic term is important and you aren't at risk of overfitting the data with the extra interaction term, it's probably safest to keep it the way you wrote it.
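To see what the two interaction terms do, it can help to write the model from the question separately for each group, using the same symbols:

$D=0:\quad Y=\beta_1+\beta_2X+\beta_3X^2$

$D=1:\quad Y=(\beta_1+\beta_4)+(\beta_2+\beta_5)X+(\beta_3+\beta_6)X^2$

The slope with respect to $X$ is then

$\frac{\partial Y}{\partial X}=\beta_2+2\beta_3X+D\,(\beta_5+2\beta_6X)$

so the effect of $X$ is the same in both groups at every value of $X$ only if $\beta_5=\beta_6=0$. That is a joint hypothesis, so it is usually assessed with a single F-test or likelihood-ratio test on both coefficients rather than with two separate t-tests.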

Nevertheless, it's also acceptable to limit such interactions to the linear term. That's particularly the case with more complicated models having multiple polynomial terms. This is such a frequent occurrence in practice that Frank Harrell's rms package includes an operator %ia% that can be used to limit interactions to linear terms.
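As a rough illustration of the choice between the two specifications, here is a minimal Python sketch that compares three nested models by ordinary least squares: no interaction, interaction with the linear term only, and interaction with both terms. The data are simulated and the coefficient values are arbitrary assumptions, not anything from the question; the helper names `rss` and `f_stat` are hypothetical. It uses plain NumPy rather than the rms package.

```python
import numpy as np

# Simulated data (assumption: arbitrary coefficients chosen for illustration)
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-2, 2, n)
d = rng.integers(0, 2, n).astype(float)  # dummy variable, 0 or 1
y = (1.0 + 0.5 * x + 0.8 * x**2 + 0.3 * d
     + 0.6 * x * d - 0.5 * x**2 * d + rng.normal(0, 1, n))


def rss(columns):
    """Fit OLS on the given design columns; return (residual SS, n params)."""
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid), X.shape[1]


def f_stat(rss_small, p_small, rss_big, p_big):
    """F statistic for comparing a nested (small) model with a bigger one."""
    numerator = (rss_small - rss_big) / (p_big - p_small)
    denominator = rss_big / (n - p_big)
    return numerator / denominator


one = np.ones(n)
rss_none, p_none = rss([one, x, x**2, d])                   # no interaction
rss_lin, p_lin = rss([one, x, x**2, d, x * d])              # X:D only
rss_full, p_full = rss([one, x, x**2, d, x * d, x**2 * d])  # X:D and X^2:D

# Joint test of beta_5 = beta_6 = 0 (same effect of X in both groups)
f_joint = f_stat(rss_none, p_none, rss_full, p_full)
# Test of beta_6 = 0 alone (is the quadratic interaction needed?)
f_quad = f_stat(rss_lin, p_lin, rss_full, p_full)

print(f"F (both interactions vs none): {f_joint:.2f} on 2 and {n - p_full} df")
print(f"F (quadratic interaction only): {f_quad:.2f} on 1 and {n - p_full} df")
```

Each statistic would be compared against the corresponding F distribution (for example with `scipy.stats.f.sf`) to obtain a p-value. The first comparison addresses the question as asked; the second indicates whether dropping the quadratic interaction loses much.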

One suggestion: a fixed polynomial of a continuous predictor isn't always the best choice in regression modeling. Unless there's a theoretical basis for expecting a quadratic polynomial in your case, a flexible regression spline (e.g., the restricted cubic splines implemented by the rcs() function in the rms package) might be a better choice.
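For readers who want to see what a restricted cubic spline basis looks like, the following Python sketch builds one from the truncated-power form commonly given for such splines and interacts it with the dummy. It is not the rms implementation and may differ from `rcs()` in details such as scaling and default knot placement; the function name `rcs_basis`, the knot quantiles, and the example data are all assumptions made for illustration.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis: x itself plus k-2 nonlinear columns.

    Hypothetical helper. The nonlinear columns are cubic between the knots
    and constrained to be linear beyond the outermost knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    if k < 3:
        raise ValueError("need at least 3 knots")

    def pos3(u):
        # Truncated cube: (u)_+^3
        return np.maximum(u, 0.0) ** 3

    gap = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # optional scaling to keep columns comparable
    cols = [x]
    for j in range(k - 2):
        term = (pos3(x - t[j])
                - pos3(x - t[-2]) * (t[-1] - t[j]) / gap
                + pos3(x - t[-1]) * (t[-2] - t[j]) / gap)
        cols.append(term / scale)
    return np.column_stack(cols)


# Example: 4 knots at assumed quantiles give 3 basis columns for X
x = np.linspace(0, 10, 200)
knots = np.quantile(x, [0.05, 0.35, 0.65, 0.95])
B = rcs_basis(x, knots)

# Illustrative dummy; interact every spline column with it
d = (x > 5).astype(float)
design = np.column_stack([np.ones_like(x), B, d, B * d[:, None]])
print(design.shape)
```

The resulting `design` matrix plays the same role as the polynomial-plus-interaction design in the question, with the spline columns replacing $X$ and $X^2$; restricting the interaction to the first (linear) column would correspond to the more limited specification.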
