Regression – Interactions Between Categorical and Continuous Variables

categorical datacontinuous datainteractionregression

I have a dependent variable that is continuous
and
I have two independent variables: one continuous and one categorical (with 2 categories)

The interaction between the independent variables is significant.
Which statistical analysis should I use (in R) to proceed with the analysis and document the interaction?

(Should I simply analyze each of the two categories separately using simple linear regression?)

Best Answer

In the scenario you describe least squares regression will allow you to tell a very straightforward story:

First of all, imagine that you have no dichotomous independent variable. So:

(1) $y_{i} = \beta_{0} + \beta_{1}x_{1i} + \varepsilon_{i}$

Your regression describes the relationship between your dependent variable $y$ and your continuous independent variable $x_{1}$ as a straight line, with intercept $\beta_{0}$ and slope $\beta_{1}$. Cool? Cool.

Now add both the dichotomous independent variable $x_{2}$ and the interaction between $x_{1}$ and $x_{2}$ to the model:

(2) $y_{i} = \beta_{0} + \beta_{1}x_{1i} + \beta_{2}x_{2i} + \beta_{3}x_{1i}x_{2i} + \varepsilon_{i}$

So now what is your model telling you? Well, (assuming $x_{2}$ is coded 0/1) when $x_{2} = 0$, then the model reduces to equation (1) because $\beta_{2} \times 0 = 0$ and $\beta_{3} \times x_{1} \times 0 = 0$. So that is easy-peasy puddin' pie.

What about when $x_{2} =1$? Well now the $y$-intercept is $\beta_{0} + \beta_{2}$ (Right? Because $\beta_{2} \times 1 = \beta_{2}$).

And the slope of the line relating $y$ to $x_{1}$ is now $\beta_{1} + \beta_{3}$ (Right? Because $\beta_{1}\times x_{1} + \beta_{3} \times x_{1} \times 1 = \beta_{1}\times x_{1} + \beta_{3} \times x_{1} = (\beta_{1} + \beta_{3})\times x_{1}$).

So when $x_{2}=1$ you simply have a second regression line relating $y$ to $x_{1}$, with a different intercept (if $\beta_{2} \ne 0$) and a different slope (if $\beta_{3} \ne 0$ which will be true if you tested a significant interaction term in, say, ANOVA).

How do you communicate this? A single graph with two regression lines overlaying your data (possibly with different colored/shaped/sized markers when $x_{2}=1$), and a label indicating which line corresponds to $x_{2}=0$ and $x_{2}=1$. Also providing your audience with the values of the $\beta$s and their standard errors and/or confidence intervals is good (like, in a table of multiple regression results).

Cool? Cool.

Finally, while all the above tells you about trend relationships between $y$ and $x_{1}$ given $x_{2}$, least squares regression also tells you about strength of association. If you had a single independent variable, you'd probably want to use something like $R^{2}$ to describe this strength of association, but when you add variables $R^{2}$ doesn't quite mean what it did before. So you might use generalized $R^{2}$, or Pseudo-$R^{2}$ or some such.