Solved – How to visualise a three way interaction between two continuous variables and one categorical variable

data visualization, ggplot2, interaction, r, regression

I have a model in R that includes a significant three-way interaction between two continuous independent variables (IVContinuousA and IVContinuousB) and one categorical variable, IVCategorical (with two levels: Control and Treatment). The dependent variable is continuous (DV).

model<-lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical)

You can find the data here

I am trying to find out a way to visualise this in R to ease my interpretation of it (perhaps in ggplot2?).

I thought that I could dichotomise IVContinuousB into high and low values (so it would be a two-level factor itself: IVContinuousBHigh = mean of IVContinuousB + sd of IVContinuousB; IVContinuousBLow = mean of IVContinuousB – sd of IVContinuousB).

I then planned to plot the relationship between DV and IVContinuousA and fit lines representing the slopes of this relationship for different combinations of IVCategorical and my new dichotomised IVContinuousB:

IVCategoricalControl and IVContinuousBHigh
IVCategoricalControl and IVContinuousBLow
IVCategoricalTreatment and IVContinuousBHigh
IVCategoricalTreatment and IVContinuousBLow

My first question is – does this sound like a viable solution to producing an interpretable plot of this three-way-interaction? I want to avoid 3D plots if possible as I don't find them intuitive.

Secondly – does anyone have any idea on how to code for this in R via ggplot2? I appreciate this latter question may be more apt for Stack Overflow but given the problem I have is primarily a methodological one I thought I'd post it on Cross Validated as a primary choice – hope this is ok.

Thank you very much in advance for your thoughts.

Note that there are NAs (left as blanks) in the DV column and the design is unbalanced – with slightly different numbers of datapoints in the Control vs Treatment groups of the variable IVCategorical.

Best Answer

One can make a variety of charts that include all the variables, but it's another task to get meaningful information out, especially for a static view. Whatever you use, it's a good idea to try it on some simulated data that has interactions of various sizes so you can get a sense of how the interactions will show up in the views.
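As a rough illustration of that advice, here is a minimal R sketch that simulates data with a chosen interaction size. The helper `simulate_interaction()`, its arguments, and all coefficient values are hypothetical and only for illustration; the column names follow the question.

```r
# Hypothetical helper: simulate data with a chosen two-way (b_ab) and
# three-way (b_abc) interaction size. All effect sizes are made up.
simulate_interaction <- function(n = 300, b_ab = 0, b_abc = 0, seed = 42) {
  set.seed(seed)
  a <- rnorm(n)
  b <- rnorm(n)
  g <- factor(sample(c("Control", "Treatment"), n, replace = TRUE))
  treat <- as.numeric(g == "Treatment")
  dv <- 1 + 0.5 * a + 0.5 * b + 0.5 * treat +
    b_ab * a * b + b_abc * a * b * treat + rnorm(n)
  data.frame(DV = dv, IVContinuousA = a, IVContinuousB = b, IVCategorical = g)
}

sim_none  <- simulate_interaction()                      # main effects only
sim_two   <- simulate_interaction(b_ab = 1)              # adds A x B
sim_three <- simulate_interaction(b_ab = 1, b_abc = 1)   # adds A x B x group

# Fit the full model to the three-way data and look at the three-way term
fit <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = sim_three)
coef(fit)["IVContinuousA:IVContinuousB:IVCategoricalTreatment"]
```

Running the same plots on `sim_none`, `sim_two`, and `sim_three` gives a sense of what each kind of interaction looks like in a given view.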

It's difficult to visualize the continuous interactions at the same time as the main effects because the latter are often more dominant. It can be useful to look at the interaction against the residuals after applying the main effects -- let's test it out.

With your data (thanks!), we can get a sense of the main effects by looking at one independent variable at a time. Further, we can color by the categorical input to see how that variable interacts with the others.

All the variables seem to make a difference (no zero slopes), but there are no obvious interactions with the categorical variable since the different-colored smoothers are roughly parallel. That is, the categorical variable moves the curve up or down, but doesn't obviously change its shape.

We can also see from these plots that the distribution of the dependent variable is skewed, which probably needs to be taken into account. With knowledge of the domain, perhaps there is a transformation that makes sense, or perhaps a more robust modeling technique is needed.

Skipping over that important detail for the sake of discussing interaction visualization, let's assume we accept what look like linear effects for these variables. Then we can fit a main-effects model and compute the residuals. Now we can look at the residuals against the products of the continuous variables (or their deltas) to get a sense of interaction.
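In R, the residual approach might look roughly like the sketch below. It uses simulated stand-in data (not the asker's), and it reads "deltas" as deviations from the mean, which is an assumption on my part.

```r
library(ggplot2)

# Simulated stand-in data; column names follow the question, values are made up
set.seed(2)
n <- 300
dat <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
dat$DV <- with(dat, 1 + 0.5 * IVContinuousA + 0.5 * IVContinuousB +
  0.8 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
  rnorm(n))

# Main effects only; na.exclude keeps residuals aligned with rows if DV has NAs
main_fit <- lm(DV ~ IVContinuousA + IVContinuousB + IVCategorical,
               data = dat, na.action = na.exclude)
dat$resid <- residuals(main_fit)

# Product of the centred continuous variables (assumed meaning of "deltas")
dat$AxB <- with(dat, (IVContinuousA - mean(IVContinuousA)) *
                     (IVContinuousB - mean(IVContinuousB)))

# Residuals against the product, one fitted line per level of the factor
p <- ggplot(dat, aes(AxB, resid, colour = IVCategorical)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
print(p)
```

Flat, overlapping lines would suggest no interaction; parallel sloped lines a two-way interaction; lines with different slopes a three-way interaction.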

Here is what it looks like for your data. Overlapping curves with near-zero slopes suggest no interactions.

If I modify the data to add an interaction between the two continuous variables, the result is overlapping and close-to-parallel lines with non-zero slopes. That is, the interaction of these two variables has an effect, but it's not very different for each level of the categorical variable.

If I add a 3-way interaction into the data, we see non-parallel and non-zero slopes.

For the sake of comparison, here is another way of visualizing the interaction: two paneled, smoothed contour plots of the residuals, one for the original data and one for the first augmentation. There is a difference (the interaction has a saddle shape in 2-space), but it's harder to distinguish from random variation. The diverging color scale encodes the positive and negative residuals.

Original data:

Augmented with an interaction:

In all the plots you have to keep in mind that the fit is less trustworthy where the data is sparse, and I think it's harder to do that with the contour plots.

These techniques are independent of the statistical tool, which is mostly what this forum is about. I used JMP for these examples, but I'm sure the techniques can be done with R.
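As for the plot proposed in the question (lines for IVContinuousA at the mean of IVContinuousB plus or minus one sd, split by IVCategorical), a minimal ggplot2 sketch could look like this. The data are simulated stand-ins, and rather than dichotomising the observations it plots model predictions at the two chosen values of IVContinuousB, which is one possible reading of the plan.

```r
library(ggplot2)

# Simulated stand-in data; replace with the real data frame
set.seed(1)
n <- 200
dat <- data.frame(
  IVContinuousA = rnorm(n),
  IVContinuousB = rnorm(n),
  IVCategorical = factor(sample(c("Control", "Treatment"), n, replace = TRUE))
)
dat$DV <- with(dat, 1 + 0.5 * IVContinuousA + 0.3 * IVContinuousB +
  0.4 * IVContinuousA * IVContinuousB * (IVCategorical == "Treatment") +
  rnorm(n))

model <- lm(DV ~ IVContinuousA * IVContinuousB * IVCategorical, data = dat)

# Prediction grid: A over its observed range, B at mean - sd and mean + sd
b_mean <- mean(dat$IVContinuousB, na.rm = TRUE)
b_sd   <- sd(dat$IVContinuousB, na.rm = TRUE)
grid <- expand.grid(
  IVContinuousA = seq(min(dat$IVContinuousA), max(dat$IVContinuousA),
                      length.out = 50),
  IVContinuousB = c(b_mean - b_sd, b_mean + b_sd),
  IVCategorical = levels(dat$IVCategorical)
)
grid$BLevel <- factor(ifelse(grid$IVContinuousB < b_mean,
                             "Low (mean - sd)", "High (mean + sd)"))
grid$DV <- predict(model, newdata = grid)

# One panel per group, one predicted line per level of B
p <- ggplot(grid, aes(IVContinuousA, DV, colour = BLevel)) +
  geom_line() +
  facet_wrap(~ IVCategorical)
print(p)
```

Plotting predictions keeps IVContinuousB continuous in the model itself, so no information is thrown away by splitting the data.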

Related Solutions

Solved – three-way-interaction between two continuous and one binary variable using OLS stata

Suppose you have two continuous regressors (weight and miles per gallon) and one binary regressor (foreign manufacturer). The outcome is car price and you are interested in the effect of weight on price. You can get a sense of the interactions like this:

sysuse auto
reg price c.mpg##c.weight##i.foreign
margins, dydx(weight) at(foreign = (0 1) mpg =(10(10)40) weight = (2000(1000)3000))
marginsplot, bydimension(foreign weight)

Note that you can calculate interactions on the fly using factor variable notation. This ensures that Stata understands how all the variables are related when calculating the derivative.

The expected price is

$$E[p \vert w,m,f]=\beta_0 + \beta_1 w + \beta_2 m + \beta_3 f +\beta_4 w\cdot m+\beta_5 w \cdot f +\beta_6m\cdot f + \beta_7 w\cdot m \cdot f$$

The derivative with respect to weight (aka the conditional marginal effect of weight) is

$$\frac{\partial E[p \vert w,m,f]}{\partial w}= \beta_1 + \beta_4 m+\beta_5 \cdot f + \beta_7 m \cdot f,$$

which is linear in mpg, with an intercept and slope that depend on foreign.
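For readers following along in R rather than Stata, the same derivative can be computed by hand from the fitted coefficients. The sketch below uses simulated data with made-up coefficients (it is not the Stata auto dataset), and `dydx_weight()` is a hypothetical helper name.

```r
# Simulated stand-in for the auto data; all numbers are made up
set.seed(3)
n <- 500
cars <- data.frame(
  weight  = runif(n, 2000, 4000),
  mpg     = runif(n, 10, 40),
  foreign = rbinom(n, 1, 0.3)
)
cars$price <- with(cars, 3000 + 2 * weight - 20 * mpg + 500 * foreign -
  0.03 * weight * mpg + 0.5 * weight * foreign + rnorm(n, sd = 500))

fit <- lm(price ~ weight * mpg * foreign, data = cars)
b <- coef(fit)

# d E[price] / d weight = b1 + b4 * mpg + b5 * foreign + b7 * mpg * foreign
dydx_weight <- function(mpg, foreign) {
  unname(b["weight"] + b["weight:mpg"] * mpg +
    b["weight:foreign"] * foreign + b["weight:mpg:foreign"] * mpg * foreign)
}

# Evaluate the slope of weight at mpg = 10, 20, 30, 40 for both origins
at <- expand.grid(mpg = seq(10, 40, by = 10), foreign = c(0, 1))
at$dydx <- dydx_weight(at$mpg, at$foreign)
at
```

Because the model is linear in weight, the derivative does not depend on the value of weight itself, only on mpg and foreign.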

The margins command calculates this derivative evaluated at various combinations of foreign and mpg that I have selected, and marginsplot produces a graph of the derivative of the expected price with respect to weight for light and heavy domestic and foreign cars at various values of mileage. You could also have other regressors in the model, in which case the marginal effect would average over their effects:

[Figure: marginsplot output showing the marginal effect of weight on price against mpg, paneled by foreign and weight]

For instance, let's look at the first panel. For a very efficient light domestic car, an additional pound has negligible effect on the price. For an inefficient one, it adds $6.50 to the price tag.

If you are interested in formally testing that these derivatives are different at various values of mpg, you can calculate contrasts of margins like this:

margins r.foreign, dydx(weight) at(mpg = (10(10)40))
marginsplot

This tests the null that the derivative of expected price with respect to weight is the same for foreign and domestic cars at various values of mpg. It also gives you an overall test that all four differences are jointly zero. It probably makes sense to make some adjustment for multiple comparisons, though I have not done that here.

A really nice introduction to these commands is Michael N. Mitchell's Interpreting and Visualizing Regression Models Using Stata. Chapter 13 deals with continuous by continuous by categorical interactions.

Solved – How to interpret interaction between categorical and continuous variable

Don't be fooled by the fact that summary() gives you p-values. You can't say that "an interaction is significant." You need to calculate marginal effects. So, you can say something like, "this continuous variable was significant for some category" or "this category was significant over some range of this continuous variable."

Let's work a simple example. We have the model: $$ outcome = \alpha +b_1age+b_2gender + b_3age*gender + \epsilon $$

The marginal effect of $age$ (continuous) on $outcome$ (continuous) is the partial derivative of $outcome$ with respect to $age$:

$$ \frac{\partial outcome}{\partial age} = b_1 + b_3*gender $$

So, the effect of $age$ depends on the category of $gender$, which for a variable with only two categories, as in your example, is not too complicated (you're multiplying $b_3$ by either 0 or 1).
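To make this concrete, here is a small R sketch that computes the marginal effect of age for each gender, with a standard error taken from the coefficient covariance matrix. The data are simulated and the coefficient values are arbitrary; gender is coded 0/1 as in the formula above.

```r
# Simulated example data; all numbers are made up
set.seed(4)
n <- 200
d <- data.frame(age = runif(n, 20, 60), gender = rbinom(n, 1, 0.5))
d$outcome <- with(d, 2 + 0.1 * age + 1 * gender + 0.05 * age * gender +
  rnorm(n))

mod <- lm(outcome ~ age * gender, data = d)
b <- coef(mod)
V <- vcov(mod)

# Marginal effect of age: b1 + b3 * gender, for gender = 0 and gender = 1
gender <- c(0, 1)
effect <- unname(b["age"] + b["age:gender"] * gender)

# Var(b1 + g * b3) = Var(b1) + g^2 * Var(b3) + 2 * g * Cov(b1, b3)
se <- sqrt(V["age", "age"] + gender^2 * V["age:gender", "age:gender"] +
           2 * gender * V["age", "age:gender"])

data.frame(gender, effect, se)
```

Dividing each effect by its standard error gives the t statistic for "age matters in this category", which is the kind of statement the marginal-effects view supports.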

In R, try the 'effects' package if you have not already:

library(effects)
allEffects(mod)
plot(allEffects(mod))

Also -- with so many interactions, make sure that your model is not over-specified. I don't know how large your sample is, but I'd imagine that it would need to be quite large.
