Solved – Prediction on Interaction Terms in Multiple Linear Model

interaction, multiple regression, predictive-models

I have created an MLR model whose predictor variables are both continuous and categorical. I am interested in the interactions between the categorical variables.

Let's say I have the response variable $y$ and three predictor variables: $x_1$, which is continuous, and $x_2$ and $x_3$, which are binary (1 if the observation is in the category, 0 if it is not).

Before I create the interaction term, I subtract the mean from each binary variable to avoid linear dependency. So
$$\hat{x_2} = x_2-\text{mean}(x_2)$$
$$\hat{x_3} = x_3 - \text{mean}(x_3)$$

So my linear model is now:
$$y = ax_1 + bx_2 + cx_3 +d(\hat{x_2}*\hat{x_3}).$$

My question is about creating a test set of data with the same variables. When creating the test set do I make the interaction term as:
$$ x_2 * x_3 $$
without the mean subtraction?

Or do I also use mean subtraction when creating the test data interaction term, as:
$$ \hat{x_2}*\hat{x_3} $$

Best Answer

Even in the example you cite in your comment, which involves continuous rather than categorical variables, it's not clear that the centering gains anything. In R, for your case, you would simply write something like lm(y ~ x1 + x2*x3) to get both main effects and the interaction for x2 and x3. The "*" here isn't an arithmetic multiplication, but rather an instruction to estimate both the main effects and the interaction effect. In the default treatment contrasts used by R, the main effect for x2 will be the influence on y of changing x2 from 0 to 1 when the value of x3 is 0, and the interaction term will be the additional influence of x2 (positive or negative) when x3 is 1 instead.
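
To make the test-set question concrete, here is a minimal sketch in Python (rather than R, and not part of the original answer). It assumes the model includes an intercept, as lm does by default, and it uses made-up data and made-up coefficients purely for illustration. Every name in it (train_rows, test_rows, predict_centered, and so on) is hypothetical. It is meant to show two things: if you do center, the test-set interaction should be built with the means computed on the training data, and the centered model is then only a reparameterization of the uncentered one, so both give the same predictions.

```python
# Hypothetical data: each row is (x1, x2, x3); x2 and x3 are 0/1 indicators.
train_rows = [(0.5, 1, 0), (1.2, 0, 1), (2.0, 1, 1), (0.3, 0, 0), (1.7, 1, 1)]
test_rows = [(0.9, 1, 1), (1.1, 0, 1), (0.4, 1, 0)]

# Centering constants are computed from the training set only
# and then reused unchanged for the test set.
m2 = sum(row[1] for row in train_rows) / len(train_rows)
m3 = sum(row[2] for row in train_rows) / len(train_rows)

# Made-up coefficients (not fitted) for the centered model, with an
# assumed intercept k:
#   y = k + a*x1 + b*x2 + c*x3 + d*(x2 - m2)*(x3 - m3)
k, a, b, c, d = 1.0, 2.0, -0.5, 0.8, 1.5


def predict_centered(x1, x2, x3):
    """Predict using the interaction centered at the TRAINING means."""
    return k + a * x1 + b * x2 + c * x3 + d * (x2 - m2) * (x3 - m3)


# Expanding d*(x2 - m2)*(x3 - m3) = d*x2*x3 - d*m3*x2 - d*m2*x3 + d*m2*m3
# gives the coefficients of an equivalent model with the raw product.
k_u = k + d * m2 * m3
b_u = b - d * m3
c_u = c - d * m2


def predict_uncentered(x1, x2, x3):
    """Same model, rewritten in terms of the raw product x2*x3."""
    return k_u + a * x1 + b_u * x2 + c_u * x3 + d * x2 * x3


def predict_mismatched(x1, x2, x3):
    """Mistake to avoid: centered-model coefficients with the raw product."""
    return k + a * x1 + b * x2 + c * x3 + d * x2 * x3


for row in test_rows:
    print(row, predict_centered(*row), predict_uncentered(*row),
          predict_mismatched(*row))
```

The first two predictions agree on every test row, while the third does not in general. In other words, the interaction term in the test set has to be constructed the same way as in the training set, with the same centering constants; mixing coefficients estimated with the centered term and a raw product in the test data changes the predictions.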