If you include an interaction term, then "the" effect no longer exists. Instead you have multiple effects: one for each level of the other variable with which you created the interaction. This is the very point of including interactions, so there is no way around it.
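To see this concretely, in the model $$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon, $$ the marginal effect of $x_1$ is $\partial E[y \mid x_1, x_2]/\partial x_1 = \beta_1 + \beta_3 x_2$, a different number for every value of $x_2$; no single coefficient summarizes "the" effect of $x_1$.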
It depends on the goal of the inference. If you want to infer whether an interaction exists, for instance in a causal context (or, more generally, if you want to interpret the interaction coefficient), then your professor's recommendation does make sense: misspecification of the functional form can lead to wrong inferences about the interaction.
Here is a simple example in which there is no interaction between $x_1$ and $x_2$ in the structural equation for $y$; yet, if you do not include the quadratic term for $x_1$, you will wrongly conclude that $x_1$ interacts with $x_2$ when in fact it does not.
set.seed(10)
n  <- 1e3
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)              # x2 is correlated with x1
y  <- x1 + x2 + x1^2 + rnorm(n)  # true model: quadratic in x1, NO interaction
summary(lm(y ~ x1 + x2 + x1:x2)) # misspecified: omits x1^2, adds x1:x2
Call:
lm(formula = y ~ x1 + x2 + x1:x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7781 -0.8326 -0.0806  0.7598  7.7929 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.30116    0.04813   6.257 5.81e-10 ***
x1           1.03142    0.05888  17.519  < 2e-16 ***
x2           1.01806    0.03971  25.638  < 2e-16 ***
x1:x2        0.63939    0.02390  26.757  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.308 on 996 degrees of freedom
Multiple R-squared:  0.7935,  Adjusted R-squared:  0.7929 
F-statistic:  1276 on 3 and 996 DF,  p-value: < 2.2e-16
This can be interpreted as simply a case of omitted variable bias, and here $x_1^2$ is the omitted variable. If you go back and include the squared term in your regression, the apparent interaction disappears.
summary(lm(y ~ x1 + x2 + x1:x2 + I(x1^2)))  # now correctly specified with x1^2
Call:
lm(formula = y ~ x1 + x2 + x1:x2 + I(x1^2))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4574 -0.7073  0.0228  0.6723  3.7135 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0419958  0.0398423  -1.054    0.292    
x1           1.0296642  0.0458586  22.453   <2e-16 ***
x2           1.0017625  0.0309367  32.381   <2e-16 ***
I(x1^2)      1.0196002  0.0400940  25.430   <2e-16 ***
x1:x2       -0.0006889  0.0313045  -0.022    0.982    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.019 on 995 degrees of freedom
Multiple R-squared:  0.8748,  Adjusted R-squared:  0.8743 
F-statistic:  1739 on 4 and 995 DF,  p-value: < 2.2e-16
Of course, this reasoning applies not only to quadratic terms but to misspecification of the functional form in general. The goal is to model the conditional expectation function appropriately in order to assess interaction. If you limit yourself to linear regression, you will need to include these nonlinear terms manually. An alternative is to use more flexible regression models, such as kernel ridge regression.
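As an illustrative sketch of the flexible route (using a generalized additive model from the mgcv package here rather than kernel ridge regression, purely as one concrete option), you can give each predictor a smooth main effect and add a tensor-product interaction term; fit to the simulated data above, the interaction term should come out non-significant:

library(mgcv)
# Smooth main effects s(x1), s(x2) absorb any nonlinearity (e.g. the x1^2 term);
# ti(x1, x2) is a tensor-product interaction, identified separately from them.
fit <- gam(y ~ s(x1) + s(x2) + ti(x1, x2))
summary(fit)  # check the approximate p-value of the ti(x1, x2) term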
Best Answer
Following the dictum quoted by Blackwell (George Box's "all models are wrong, but some are useful"), you have no obligation to include an interaction term, since:
OK, strictly mathematically speaking, you can (and perhaps ought to) include them; but, considering strictly the modeling issues, most of the time it will not make sense in a first approach to a phenomenon, although it can (and will) make more sense over the course of a series of improvements to an initial model: for instance, in the regression of the concentration of one product as a function of the concentrations of several reagents and catalysts in a very complex chemical reaction.
But please notice that all of this will make sense only if your experiments are extremely well controlled and if you have a very large number of observations, conditions more compatible with a long series of experiments in a well-consolidated line of research than with a very first attempt to understand, say, a social phenomenon.
And, last but not least, as asked, $$ y = \sum_{i_1=0}^{3} \sum_{i_2=0}^{3-i_1} \sum_{i_3=0}^{3-i_1-i_2} \sum_{i_4=0}^{3-i_1-i_2-i_3} \beta_{i_1i_2i_3i_4} x_1^{i_1}x_2^{i_2}x_3^{i_3}x_4^{i_4} +\varepsilon, $$ which, by a stars-and-bars count, has $\binom{4+3}{3} = 35$ terms: a hell of a long expression. =)
Something like $$ y = \beta_{0000} + \beta_{1000} x_1+ \beta_{0100} x_2+ \beta_{0010} x_3+ \beta_{0001} x_4+ \beta_{2000} x_1^2+ \beta_{0200} x_2^2+ \beta_{0020} x_3^2+ \beta_{0002} x_4^2+ \beta_{1100} x_1x_2+ \beta_{1010} x_1x_3+ \beta_{1001} x_1x_4+ \beta_{0110} x_2x_3+ \beta_{0101} x_2x_4+ \beta_{0011} x_3x_4+ \beta_{3000} x_1^3+ \beta_{0300} x_2^3+ \beta_{0030} x_3^3+ \beta_{0003} x_4^3+ \beta_{1110} x_1x_2x_3+ \beta_{1101} x_1x_2x_4+ \beta_{0111} x_2x_3x_4+ \ldots+ \varepsilon. $$
Edit: I took the time to write a lazy script in R to generate a LaTeX expression with all the terms of the sum above, which output this: $$ y = \beta_{0000} + \beta_{0001} x_4 + \beta_{0002} x_4^2 + \beta_{0003} x_4^3 + \beta_{0010} x_3 + \beta_{0011} x_3 x_4 + \beta_{0012} x_3 x_4^2 + \beta_{0020} x_3^2 + \beta_{0021} x_3^2 x_4 + \beta_{0030} x_3^3 + \beta_{0100} x_2 + \beta_{0101} x_2 x_4 + \beta_{0102} x_2 x_4^2 + \beta_{0110} x_2 x_3 + \beta_{0111} x_2 x_3 x_4 + \beta_{0120} x_2 x_3^2 + \beta_{0200} x_2^2 + \beta_{0201} x_2^2 x_4 + \beta_{0210} x_2^2 x_3 + \beta_{0300} x_2^3 + \beta_{1000} x_1 + \beta_{1001} x_1 x_4 + \beta_{1002} x_1 x_4^2 + \beta_{1010} x_1 x_3 + \beta_{1011} x_1 x_3 x_4 + \beta_{1020} x_1 x_3^2 + \beta_{1100} x_1 x_2 + \beta_{1101} x_1 x_2 x_4 + \beta_{1110} x_1 x_2 x_3 + \beta_{1200} x_1 x_2^2 + \beta_{2000} x_1^2 + \beta_{2001} x_1^2 x_4 + \beta_{2010} x_1^2 x_3 + \beta_{2100} x_1^2 x_2 + \beta_{3000} x_1^3 + \varepsilon $$ The script itself is essentially a loop over the admissible exponent combinations; a sketch is given below.
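A minimal sketch of such a generator (a reconstruction for illustration, not necessarily the original script) enumerates all exponent combinations with total degree at most $d$:

d <- 3  # maximum total degree; p = 4 variables
combos <- expand.grid(i1 = 0:d, i2 = 0:d, i3 = 0:d, i4 = 0:d)
combos <- combos[rowSums(combos) <= d, ]  # keep only total degree <= d
combos <- combos[order(combos$i1, combos$i2, combos$i3, combos$i4), ]
term <- function(e) {
  e <- as.integer(e)
  vars <- vapply(seq_along(e), function(j) {
    if (e[j] == 0) return("")                      # variable absent
    if (e[j] == 1) return(sprintf(" x_%d", j))     # linear factor
    sprintf(" x_%d^%d", j, e[j])                   # higher power
  }, character(1))
  paste0("\\beta_{", paste(e, collapse = ""), "}", paste(vars, collapse = ""))
}
cat("y =", paste(apply(combos, 1, term), collapse = " + "), "+ \\varepsilon\n")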
Finally, notice that the expansion contains no fourth-degree terms, such as $\beta_{1111} x_1 x_2 x_3 x_4$, since you asked for a third-degree polynomial ($d=3$) in four variables ($p=4$).