Regression – Interactions Between Categorical and Continuous Variables

categorical data, continuous data, interaction, regression

I have a continuous dependent variable and two independent variables: one continuous and one categorical (with two categories).

The interaction between the independent variables is significant.
Which statistical analysis should I use (in R) to proceed with the analysis and document the interaction?

(Should I simply analyze each of the two categories separately using simple linear regression?)

Best Answer

In the scenario you describe, least squares regression will allow you to tell a very straightforward story:

First of all, imagine that you have no dichotomous independent variable. So:

(1) $y_{i} = \beta_{0} + \beta_{1}x_{1i} + \varepsilon_{i}$

Your regression describes the relationship between your dependent variable $y$ and your continuous independent variable $x_{1}$ as a straight line, with intercept $\beta_{0}$ and slope $\beta_{1}$. Cool? Cool.

Now add both the dichotomous independent variable $x_{2}$ and the interaction between $x_{1}$ and $x_{2}$ to the model:

(2) $y_{i} = \beta_{0} + \beta_{1}x_{1i} + \beta_{2}x_{2i} + \beta_{3}x_{1i}x_{2i} + \varepsilon_{i}$

So now what is your model telling you? Well, (assuming $x_{2}$ is coded 0/1) when $x_{2} = 0$, then the model reduces to equation (1) because $\beta_{2} \times 0 = 0$ and $\beta_{3} \times x_{1} \times 0 = 0$. So that is easy-peasy puddin' pie.

What about when $x_{2} =1$? Well now the $y$-intercept is $\beta_{0} + \beta_{2}$ (Right? Because $\beta_{2} \times 1 = \beta_{2}$).

And the slope of the line relating $y$ to $x_{1}$ is now $\beta_{1} + \beta_{3}$ (Right? Because $\beta_{1}\times x_{1} + \beta_{3} \times x_{1} \times 1 = \beta_{1}\times x_{1} + \beta_{3} \times x_{1} = (\beta_{1} + \beta_{3})\times x_{1}$).

So when $x_{2}=1$ you simply have a second regression line relating $y$ to $x_{1}$, with a different intercept (if $\beta_{2} \ne 0$) and a different slope (if $\beta_{3} \ne 0$, which is what a significant interaction term, whether tested in the regression itself or in, say, ANOVA, gives you evidence for).
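To make the two-line reading of equation (2) concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the data are made up and noise-free, and the helper names `solve` and `ols` are mine, not part of any library. In practice you would use a statistics package rather than hand-rolled normal equations.

```python
# Sketch: fit y = b0 + b1*x1 + b2*x2 + b3*x1*x2 by ordinary least squares
# on made-up, noise-free data, then read off the two group-specific lines.

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: group 0 follows y = 1 + 2*x1, group 1 follows y = 4 + 0.5*x1.
x1 = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
x2 = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1.0, 3.0, 5.0, 7.0, 4.0, 4.5, 5.0, 5.5]

# Design matrix columns: intercept, x1, x2, x1*x2.
design = [[1.0, a, float(g), a * g] for a, g in zip(x1, x2)]
b0, b1, b2, b3 = ols(design, y)

line_x2_0 = (b0, b1)            # intercept and slope when x2 = 0
line_x2_1 = (b0 + b2, b1 + b3)  # intercept and slope when x2 = 1
```

With a two-level $x_{2}$, these are the same two lines you would get by fitting each category separately; the single model additionally gives you $\beta_{2}$ and $\beta_{3}$, the differences in intercept and slope, as parameters you can test directly.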

How do you communicate this? A single graph with two regression lines overlaying your data (possibly with different colored/shaped/sized markers when $x_{2}=1$), and labels indicating which line corresponds to $x_{2}=0$ and which to $x_{2}=1$. Providing your audience with the values of the $\beta$s and their standard errors and/or confidence intervals is also good (like, in a table of multiple regression results).

Cool? Cool.

Finally, while all the above tells you about trend relationships between $y$ and $x_{1}$ given $x_{2}$, least squares regression also tells you about strength of association. If you had a single independent variable, you'd probably want to use something like $R^{2}$ to describe this strength of association, but when you add variables $R^{2}$ doesn't quite mean what it did before: it is no longer the squared correlation between $y$ and a single predictor, and it can only go up as terms are added. So you might use adjusted $R^{2}$, generalized $R^{2}$, or pseudo-$R^{2}$ or some such.
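As a rough illustration of the point about $R^{2}$, the sketch below computes plain $R^{2}$ and adjusted $R^{2}$, one common correction for the number of predictors. The observed and fitted values are made up, and the function names are hypothetical, chosen only for this example.

```python
def r_squared(y, fitted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R^2 for the number of predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical observed values and fitted values from a model with
# three predictors (x1, x2, and their product), as in equation (2).
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fitted = [2.5, 3.5, 6.5, 7.5, 10.5, 11.5]

r2 = r_squared(y, fitted)
adj_r2 = adjusted_r_squared(r2, n_obs=len(y), n_predictors=3)
```

Adjusted $R^{2}$ is never larger than plain $R^{2}$, and the gap widens as predictors are added relative to the sample size.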

Related Solutions

Solved – Continuous and categorical variables in SPSS GLM

ANOVA and multiple regression are two “flavors” of the general linear model (historically, they developed separately before being integrated into the same general framework, and they are still taught differently depending on the discipline, but this is really all the same). Specifically, the analysis you carried out could also be described as an ANCOVA. Because of the links between multiple regression, ANOVA and GLM, it can be performed in SPSS using three different procedures:

REGRESSION
   /DEPENDENT ResponseTime
   /METHOD ENTER Factor Covariate.

UNIANOVA ResponseTime BY Factor WITH Covariate.

GLM ResponseTime BY Factor WITH Covariate.

The differences lie in how the information is presented in the graphical user interface, in some default settings and options of each procedure, and in some details of the output and its interpretation, but all three basically fit similar models and present equivalent tests for the main effects of each variable. Also, using one rather than another does not free you from any assumption.

That said, there are a few puzzling things in your question. First, I only see a single outcome (response time) and therefore do not understand your instructor's advice to use a multivariate model (are you sure she didn't say “multiple regression”?).

Second, the binary nature of the outcome is surprising: response times are usually more or less continuous. Did you dichotomize this variable? Also, if it really is dichotomous, then none of these (GLM, ANOVA, ordinary regression) may in fact be the best way to analyze these data. Instead, you should probably look at generalized (not general) linear models, such as logistic regression.

I just realized that your main question seems to be about the interaction between covariate and factor (sorry for not focusing on that first). In SPSS, such a model can indeed be fitted with the GLM procedure. You will find more on this in “ANCOVA and its disturbing assumptions” and “How to specify ANCOVA interactions in SPSS?” (in particular see this link from @JeromyAnglim's answer).

In your case the syntax would probably be something like

GLM ResponseTime BY Factor WITH Covariate
   /DESIGN Factor Covariate Factor*Covariate.

You can also fit the same model with UNIANOVA or through the graphical interface (by clicking the “model” button in the “General linear model/Univariate” dialog box and defining a custom model). The resulting syntax would be:

UNIANOVA ResponseTime BY Factor WITH Covariate
   /DESIGN Factor Covariate Covariate*Factor.

(In both cases, the key element is the Covariate*Factor part of the /DESIGN statement; this is what adds the interaction to the model.)
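Written out as an equation, and assuming purely for illustration that Factor has two levels coded 0/1 (the question does not say how it is coded), a model with both main effects and the interaction has the familiar regression form:

$\text{ResponseTime}_{i} = \beta_{0} + \beta_{1}\,\text{Factor}_{i} + \beta_{2}\,\text{Covariate}_{i} + \beta_{3}\,\text{Factor}_{i} \times \text{Covariate}_{i} + \varepsilon_{i}$

Under that assumed coding, $\beta_{3}$ is the difference in the covariate's slope between the two factor levels. With more than two levels there would be one such product term per dummy, and the software's own reference coding may differ from the 0/1 coding assumed here.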

Solved – dummy variables, interaction with continuous variable, and variable selection

If by dummy variables you're referring to multiple binary variables that make up one categorical predictor, each of them needs to be in the model for each other dummy to be meaningful. In stepwise regression either they are all in or all out, but not piecemeal. Are you doing this by hand or something? All stats packages I'm familiar with treat multilevel categoricals properly in this respect, and shouldn't consider dummy variables independently for model specification.

Again, you can't include interactions with some dummy variables of a single categorical predictor but not others. All in or all out. The test of whether the interaction needs to be included is a comparison between a model without interactions with all dummies and a model with interactions with all dummies. If the interaction is significant, you should keep it in any case. Just be aware that the interpretation of the "main effects" changes drastically when interactions are included in models.

If doing backwards stepwise regression, include the interaction terms.
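The "all in or all out" rule can be sketched in Python as follows. This is only an illustration under assumptions: the three-level factor, the data, and the helper names `dummy_code` and `design_rows` are all hypothetical. It builds the two design matrices that the interaction test compares: one with every dummy but no interactions, and one with every dummy plus every dummy-by-covariate product.

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1.0 if v == lev else 0.0 for lev in levels] for v in values]
    return levels, rows

def design_rows(x, group, reference, with_interaction):
    """Build design rows: intercept, x, all dummies, and optionally all products."""
    levels, dummies = dummy_code(group, reference)
    rows = []
    for xi, d in zip(x, dummies):
        row = [1.0, xi] + d
        if with_interaction:
            row += [xi * dj for dj in d]  # one product term per dummy, never a subset
        rows.append(row)
    return levels, rows

# Hypothetical data: a continuous covariate and a three-level factor.
x = [1.0, 2.0, 3.0, 4.0]
group = ["a", "b", "c", "a"]

levels, main_only = design_rows(x, group, reference="a", with_interaction=False)
_, full = design_rows(x, group, reference="a", with_interaction=True)
```

The full design has exactly one extra column per dummy, so the comparison between the two models tests all interaction terms of the categorical predictor jointly rather than one dummy at a time.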
