Regression – Writing Multiple Linear Regression Equation with Interaction Term

interaction, machine-learning, regression

I was reading an article on Towards Data Science about interaction terms. The author used two predictors, Investment1 and Investment2, and wrote the multiple linear regression equation with the interaction between Investment1 and Investment2 as follows:

$$
\text{ROI} = \hat{\beta}_0 + \hat{\beta}_1\,\text{Investment1} + \hat{\beta}_2\,\text{Investment2} + \hat{\beta}_3\,(\text{Investment1} \times \text{Investment2}) + \varepsilon
$$

Next, he reduced the equation to:

$$
\text{ROI} = \hat{\beta}_0 + (\hat{\beta}_1 + \hat{\beta}_3\,\text{Investment2})\,\text{Investment1} + \hat{\beta}_2\,\text{Investment2} + \varepsilon
$$

But the equation could also be reduced as follows, instead of the way the author did it:
$$
\text{ROI} = \hat{\beta}_0 + \hat{\beta}_1\,\text{Investment1} + (\hat{\beta}_2 + \hat{\beta}_3\,\text{Investment1})\,\text{Investment2} + \varepsilon
$$

I am confused about what the difference between the two is. If the author's way is right, why is it right? If there is a difference in interpretation, could someone please explain both in layman's terms?

My second question concerns interaction terms when there are more predictors.
Suppose that instead of two predictors there are a few more, and some of them interact as well.

For example, say we have the predictors GPA, IQ, and Gender, and the interacting pairs are GPA × IQ and GPA × Gender.

The multiple linear regression equation with the interaction terms included would then be:
$$
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1\,\text{GPA} + \hat{\beta}_2\,\text{IQ} + \hat{\beta}_3\,\text{Gender} + \hat{\beta}_4\,(\text{GPA} \times \text{IQ}) + \hat{\beta}_5\,(\text{GPA} \times \text{Gender})
$$

What confuses me is how to write this equation in a reduced form like the one above. Which term should I group with the interaction terms when rewriting the equation?

Best Answer

There's really no need to use any of the "reduced forms"; they are just different ways of grouping the same coefficients and predictors, and all of them are algebraically equivalent and correct. The "reduced forms" might help make it clearer that the association of Investment1 with ROI depends on the level of Investment2 (the author's "reduced form") and that the association of Investment2 with ROI also depends on the level of Investment1 (your "reduced form").
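A quick numerical check can make the equivalence concrete. The sketch below uses made-up coefficient values (hypothetical, not taken from the article) and hypothetical helper names. It evaluates the original form and both regroupings at a few points, and computes the slope of each investment as "its own coefficient plus the interaction coefficient times the other investment".

```python
import math

# Hypothetical fitted coefficients, made up for illustration (not from the article).
B0, B1, B2, B3 = 2.0, 0.5, 0.3, 0.1


def roi_full(inv1, inv2):
    """Original form: intercept, two main effects, and the product term."""
    return B0 + B1 * inv1 + B2 * inv2 + B3 * inv1 * inv2


def roi_grouped_by_inv1(inv1, inv2):
    """Author's regrouping: Investment1's coefficient depends on Investment2."""
    return B0 + (B1 + B3 * inv2) * inv1 + B2 * inv2


def roi_grouped_by_inv2(inv1, inv2):
    """Asker's regrouping: Investment2's coefficient depends on Investment1."""
    return B0 + B1 * inv1 + (B2 + B3 * inv1) * inv2


def slope_inv1(inv2):
    """Change in predicted ROI per one-unit increase in Investment1, at fixed Investment2."""
    return B1 + B3 * inv2


def slope_inv2(inv1):
    """Change in predicted ROI per one-unit increase in Investment2, at fixed Investment1."""
    return B2 + B3 * inv1


if __name__ == "__main__":
    for inv1, inv2 in [(0.0, 0.0), (10.0, 5.0), (3.5, 20.0)]:
        full = roi_full(inv1, inv2)
        # All three forms are the same polynomial, so they agree up to rounding.
        assert math.isclose(full, roi_grouped_by_inv1(inv1, inv2))
        assert math.isclose(full, roi_grouped_by_inv2(inv1, inv2))
        print(f"Investment1={inv1}, Investment2={inv2}: ROI={full:.3f}")

    # The effect of Investment1 is not a single number: it changes with Investment2.
    for inv2 in (0.0, 10.0, 20.0):
        print(f"Investment2={inv2}: slope of Investment1 = {slope_inv1(inv2):.2f}")
```

With these made-up numbers, the slope of Investment1 is 0.5 when Investment2 is 0 and 1.5 when Investment2 is 10. `slope_inv2` tells the mirror-image story, which is exactly what your regrouping displays.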

The usual model matrix for regression works with the original, ungrouped form: a column of 1s for the intercept, one column for each predictor's values, and, for each interaction in the model, a column holding the product of the interacting predictors' values. After all, an interaction term is just a product of individual predictors.
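To illustrate, here is a minimal pure-Python sketch that builds that model matrix for the two-investment example and fits it by ordinary least squares. The "true" coefficients and simulated data are hypothetical, and solving the normal equations by hand is only for self-containment; in practice a library routine (for example `numpy.linalg.lstsq`, or a formula interface where `ROI ~ Investment1 * Investment2` expands to both main effects plus the product) would be the usual, more numerically robust choice.

```python
import random


def model_matrix(inv1, inv2):
    """One row per observation: [1, Investment1, Investment2, Investment1 * Investment2]."""
    return [[1.0, a, b, a * b] for a, b in zip(inv1, inv2)]


def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X^T X) beta = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


if __name__ == "__main__":
    random.seed(0)
    true_beta = [2.0, 0.5, 0.3, 0.1]  # hypothetical "true" coefficients
    inv1 = [random.uniform(0, 10) for _ in range(200)]
    inv2 = [random.uniform(0, 10) for _ in range(200)]
    X = model_matrix(inv1, inv2)
    # Simulated ROI: the linear predictor plus Gaussian noise.
    y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 0.5) for row in X]
    beta_hat = ols(X, y)
    names = ["intercept", "Investment1", "Investment2", "Investment1:Investment2"]
    for name, est in zip(names, beta_hat):
        print(f"{name:>24}: {est:.3f}")
```

The fitted coefficients should land close to the hypothetical true values. Note that the fit never needs either "reduced form"; the regroupings only matter when you interpret the estimates afterwards.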

So I'd suggest not worrying about writing any "reduced form" unless it helps your understanding.
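If a regrouped form does help, the GPA example works the same way: pick one predictor and collect every term that contains it. Because GPA appears in both interactions, grouping around GPA is one natural choice, though any other grouping is equally valid:

$$
\hat{Y}_i = \hat{\beta}_0 + (\hat{\beta}_1 + \hat{\beta}_4\,\text{IQ} + \hat{\beta}_5\,\text{Gender})\,\text{GPA} + \hat{\beta}_2\,\text{IQ} + \hat{\beta}_3\,\text{Gender}
$$

Grouping around IQ instead leaves the GPA × Gender product as a separate term:

$$
\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1\,\text{GPA} + (\hat{\beta}_2 + \hat{\beta}_4\,\text{GPA})\,\text{IQ} + \hat{\beta}_3\,\text{Gender} + \hat{\beta}_5\,(\text{GPA} \times \text{Gender})
$$

The first form says the slope of GPA depends on both IQ and Gender. The second says the slope of IQ depends only on GPA. Expanding either one gives back the original equation term by term.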