Suppose I am interested in a linear regression model, $$Y_i = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2$$, because I would like to see whether an interaction between the two covariates has an effect on $Y$.
A professor's course notes (I have no contact with the professor) state:
When including interaction terms, you should also include their second-degree terms, i.e. the regression should be $$Y_i = \beta_0 + \beta_1x_1 + \beta_2x_2 + \beta_3x_1x_2 +\beta_4x_1^2 + \beta_5x_2^2$$.
Why should one include second degree terms when we are only interested in the interactions?
Best Answer
It depends on the goal of inference. If you want to infer whether an interaction exists, for instance in a causal context (or, more generally, if you want to interpret the interaction coefficient), your professor's recommendation does make sense: misspecifying the functional form can lead to wrong inferences about the interaction.
Here is a simple example where there is no interaction term between $x_1$ and $x_2$ in the structural equation of $y$, yet, if you do not include the quadratic term of $x_1$, you would wrongly conclude that $x_1$ interacts with $x_2$ when in fact it doesn't.
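A minimal simulation sketch of such a case, not necessarily the data-generating process the answer had in mind. It assumes $x_1 \sim N(0,1)$, $x_2 = x_1 + \text{noise}$ and $y = x_1^2 + \text{noise}$; these choices, the seed, the sample size and the helper `ols` are all illustrative assumptions. Only the standard library is used, so the least-squares fit is done by solving the normal equations directly.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(k):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][k] / m[i][i] for i in range(k)]

rng = random.Random(0)   # fixed seed, chosen arbitrarily
n = 5000

# Assumed structural model: y depends on x1 only, through x1^2.
# x2 is correlated with x1 but has no effect on y, alone or via interaction.
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [a + rng.gauss(0, 1) for a in x1]
y = [a * a + rng.gauss(0, 1) for a in x1]

# Misspecified fit: intercept, x1, x2, x1*x2 (no squared term)
mis = ols([[1.0, a, b, a * b] for a, b in zip(x1, x2)], y)

# Fit that also includes x1^2
full = ols([[1.0, a, b, a * b, a * a] for a, b in zip(x1, x2)], y)

print("interaction coefficient without x1^2:", round(mis[3], 3))
print("interaction coefficient with x1^2:   ", round(full[3], 3))
print("coefficient on x1^2:                 ", round(full[4], 3))
```

Under these assumptions the population value of the interaction coefficient in the misspecified fit works out to $2/3$, even though $x_2$ plays no role in generating $y$; once $x_1^2$ is in the model, the interaction coefficient should be close to zero and the coefficient on $x_1^2$ close to one.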
This can be interpreted as simply a case of omitted variable bias, and here $x_1^2$ is the omitted variable. If you go back and include the squared term in your regression, the apparent interaction disappears.
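The omitted variable reading can be made explicit with the standard bias formula. As a sketch, assume the true model contains a quadratic term with coefficient $\gamma$ and no interaction (the symbols $\gamma$, $\delta_j$ and $\hat\beta_3$ are introduced here for illustration only):

$$y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \gamma x_1^2 + \varepsilon$$

Write the linear projection of the omitted term on the included regressors as

$$x_1^2 = \delta_0 + \delta_1x_1 + \delta_2x_2 + \delta_3x_1x_2 + u$$

Then the interaction coefficient from the regression that omits $x_1^2$ converges to

$$\hat\beta_3 \to 0 + \gamma\,\delta_3$$

so it is nonzero whenever $\gamma \neq 0$ and $x_1x_2$ is partially correlated with $x_1^2$ ($\delta_3 \neq 0$), which typically happens when $x_1$ and $x_2$ are correlated.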
Of course, this reasoning applies not only to quadratic terms but to misspecification of the functional form in general. The goal is to model the conditional expectation function well enough to assess the interaction. If you restrict yourself to linear regression, you will need to include these nonlinear terms manually. An alternative is to use a more flexible regression method, such as kernel ridge regression.