If you include an interaction term, then "the" effect no longer exists. Instead you have multiple effects: one for each level of the other variable with which you created the interaction. This is the very point of including interactions, so there is no way around it.
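To see this concretely, in the model $$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon, $$ the marginal effect of $x_1$ is $\partial E[y \mid x_1, x_2]/\partial x_1 = \beta_1 + \beta_3 x_2$, a different number for every value of $x_2$; no single coefficient summarizes "the" effect of $x_1$.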
It depends on the goal of the inference. If you want to infer whether an interaction exists, for instance in a causal context (or, more generally, if you want to interpret the interaction coefficient), then your professor's recommendation does make sense: misspecification of the functional form can lead to wrong inferences about the interaction.
Here is a simple example in which there is no interaction between $x_1$ and $x_2$ in the structural equation for $y$; yet, if you do not include the quadratic term for $x_1$, you will wrongly conclude that $x_1$ interacts with $x_2$ when in fact it does not.
set.seed(10)
n  <- 1e3
x1 <- rnorm(n)
x2 <- x1 + rnorm(n)              # x2 is correlated with x1
y  <- x1 + x2 + x1^2 + rnorm(n)  # true model: quadratic in x1, NO interaction
summary(lm(y ~ x1 + x2 + x1:x2)) # misspecified: omits x1^2, adds x1:x2
Call:
lm(formula = y ~ x1 + x2 + x1:x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.7781 -0.8326 -0.0806  0.7598  7.7929 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.30116    0.04813   6.257 5.81e-10 ***
x1           1.03142    0.05888  17.519  < 2e-16 ***
x2           1.01806    0.03971  25.638  < 2e-16 ***
x1:x2        0.63939    0.02390  26.757  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.308 on 996 degrees of freedom
Multiple R-squared:  0.7935,  Adjusted R-squared:  0.7929 
F-statistic:  1276 on 3 and 996 DF,  p-value: < 2.2e-16
This can be interpreted as simply a case of omitted variable bias, and here $x_1^2$ is the omitted variable. If you go back and include the squared term in your regression, the apparent interaction disappears.
summary(lm(y ~ x1 + x2 + x1:x2 + I(x1^2)))  # now correctly specified with x1^2
Call:
lm(formula = y ~ x1 + x2 + x1:x2 + I(x1^2))

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4574 -0.7073  0.0228  0.6723  3.7135 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.0419958  0.0398423  -1.054    0.292    
x1           1.0296642  0.0458586  22.453   <2e-16 ***
x2           1.0017625  0.0309367  32.381   <2e-16 ***
I(x1^2)      1.0196002  0.0400940  25.430   <2e-16 ***
x1:x2       -0.0006889  0.0313045  -0.022    0.982    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.019 on 995 degrees of freedom
Multiple R-squared:  0.8748,  Adjusted R-squared:  0.8743 
F-statistic:  1739 on 4 and 995 DF,  p-value: < 2.2e-16
Of course, this reasoning applies not only to quadratic terms but to misspecification of the functional form in general. The goal is to model the conditional expectation function appropriately in order to assess interaction. If you limit yourself to linear regression, you will need to include these nonlinear terms manually. An alternative is to use more flexible regression models, such as kernel ridge regression.
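As an illustrative sketch of the flexible route (using a generalized additive model from the mgcv package here rather than kernel ridge regression, purely as one concrete option), you can give each predictor a smooth main effect and add a tensor-product interaction term; fit to the simulated data above, the interaction term should come out non-significant:

library(mgcv)
# Smooth main effects s(x1), s(x2) absorb any nonlinearity (e.g. the x1^2 term);
# ti(x1, x2) is a tensor-product interaction, identified separately from them.
fit <- gam(y ~ s(x1) + s(x2) + ti(x1, x2))
summary(fit)  # check the approximate p-value of the ti(x1, x2) term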
Best Answer
Following the dictum quoted by Blackwell (George Box's "all models are wrong, but some are useful"), you have no obligation to include an interaction term, since:
OK, strictly mathematically speaking, you can (and perhaps ought to) include them; but, considering strictly the modeling issues, most of the time it will not make sense in a first approach to a phenomenon, although it can (and will) make more sense over the course of a series of improvements to an initial model: for instance, in the regression of the concentration of one product as a function of the concentrations of several reagents and catalysts in a very complex chemical reaction.
But please notice that all of this will make sense only if your experiments are extremely well controlled and if you have a very large number of observations, conditions more compatible with a long series of experiments in a well-consolidated line of research than with a very first attempt to understand, say, a social phenomenon.
And, last but not least, as asked, $$ y = \sum_{i_1=0}^{3} \sum_{i_2=0}^{3-i_1} \sum_{i_3=0}^{3-i_1-i_2} \sum_{i_4=0}^{3-i_1-i_2-i_3} \beta_{i_1i_2i_3i_4} x_1^{i_1}x_2^{i_2}x_3^{i_3}x_4^{i_4} +\varepsilon, $$ which, by a stars-and-bars count, has $\binom{4+3}{3} = 35$ terms: a hell of a long expression. =)
Something like $$ y = \beta_{0000} + \beta_{1000} x_1+ \beta_{0100} x_2+ \beta_{0010} x_3+ \beta_{0001} x_4+ \beta_{2000} x_1^2+ \beta_{0200} x_2^2+ \beta_{0020} x_3^2+ \beta_{0002} x_4^2+ \beta_{1100} x_1x_2+ \beta_{1010} x_1x_3+ \beta_{1001} x_1x_4+ \beta_{0110} x_2x_3+ \beta_{0101} x_2x_4+ \beta_{0011} x_3x_4+ \beta_{3000} x_1^3+ \beta_{0300} x_2^3+ \beta_{0030} x_3^3+ \beta_{0003} x_4^3+ \beta_{1110} x_1x_2x_3+ \beta_{1101} x_1x_2x_4+ \beta_{0111} x_2x_3x_4+ \ldots+ \varepsilon. $$
Edit: I took the time to write a lazy script in R to generate a LaTeX expression with all the terms of the sum above, which output this: $$ y = \beta_{0000} + \beta_{0001} x_4 + \beta_{0002} x_4^2 + \beta_{0003} x_4^3 + \beta_{0010} x_3 + \beta_{0011} x_3 x_4 + \beta_{0012} x_3 x_4^2 + \beta_{0020} x_3^2 + \beta_{0021} x_3^2 x_4 + \beta_{0030} x_3^3 + \beta_{0100} x_2 + \beta_{0101} x_2 x_4 + \beta_{0102} x_2 x_4^2 + \beta_{0110} x_2 x_3 + \beta_{0111} x_2 x_3 x_4 + \beta_{0120} x_2 x_3^2 + \beta_{0200} x_2^2 + \beta_{0201} x_2^2 x_4 + \beta_{0210} x_2^2 x_3 + \beta_{0300} x_2^3 + \beta_{1000} x_1 + \beta_{1001} x_1 x_4 + \beta_{1002} x_1 x_4^2 + \beta_{1010} x_1 x_3 + \beta_{1011} x_1 x_3 x_4 + \beta_{1020} x_1 x_3^2 + \beta_{1100} x_1 x_2 + \beta_{1101} x_1 x_2 x_4 + \beta_{1110} x_1 x_2 x_3 + \beta_{1200} x_1 x_2^2 + \beta_{2000} x_1^2 + \beta_{2001} x_1^2 x_4 + \beta_{2010} x_1^2 x_3 + \beta_{2100} x_1^2 x_2 + \beta_{3000} x_1^3 + \varepsilon $$ The script itself is essentially a loop over the admissible exponent combinations; a sketch is given below.
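A minimal sketch of such a generator (a reconstruction for illustration, not necessarily the original script) enumerates all exponent combinations with total degree at most $d$:

d <- 3  # maximum total degree; p = 4 variables
combos <- expand.grid(i1 = 0:d, i2 = 0:d, i3 = 0:d, i4 = 0:d)
combos <- combos[rowSums(combos) <= d, ]  # keep only total degree <= d
combos <- combos[order(combos$i1, combos$i2, combos$i3, combos$i4), ]
term <- function(e) {
  e <- as.integer(e)
  vars <- vapply(seq_along(e), function(j) {
    if (e[j] == 0) return("")                      # variable absent
    if (e[j] == 1) return(sprintf(" x_%d", j))     # linear factor
    sprintf(" x_%d^%d", j, e[j])                   # higher power
  }, character(1))
  paste0("\\beta_{", paste(e, collapse = ""), "}", paste(vars, collapse = ""))
}
cat("y =", paste(apply(combos, 1, term), collapse = " + "), "+ \\varepsilon\n")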
Finally, notice that the expansion contains no fourth-degree terms, such as $\beta_{1111} x_1 x_2 x_3 x_4$, since you asked for a third-degree polynomial ($d=3$) in four variables ($p=4$).