Multiple Regression – How to Visualize a Fitted Multiple Regression Model?

data visualization, multiple regression, regression, reporting

I am currently writing a paper with several multiple regression analyses. While visualizing univariate linear regression is easy via scatter plots, I was wondering whether there is any good way to visualize multiple linear regressions?

I am currently just plotting scatter plots like dependent variable vs. 1st independent variable, then vs. 2nd independent variable, etc. I would really appreciate any suggestions.

Best Answer

There is nothing wrong with your current strategy. If you have a multiple regression model with only two explanatory variables, you could try a 3D-ish plot that displays the predicted regression plane, but most software doesn't make this easy to do. Another possibility is a coplot (see also: coplot in R or this pdf), which can represent three or even four variables, but many people don't know how to read them.

Essentially, however, if you don't have any interactions, then the predicted marginal relationship between $x_j$ and $y$ will be the same as the predicted conditional relationship (plus or minus some vertical shift) at any specific level of your other $x$ variables. Thus, you can simply set all other $x$ variables at their means, find the predicted line $\hat y = \hat\beta_0 + \hat\beta_1 \bar x_1 + \cdots + \hat\beta_j x_j + \cdots + \hat\beta_p \bar x_p$, and plot that line on a scatterplot of $(x_j, y)$ pairs. You will end up with $p$ such plots, although you might leave some out if you think they are not important. (For example, it is common to have a multiple regression model with a single variable of interest and some control variables, and to present only the first such plot.)
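
As a rough illustration of the "hold the other variables at their means" idea, here is a minimal Python/NumPy sketch. The function name `partial_prediction_line`, the simulated data, and the choice of Python are all assumptions made for this example; they are not taken from the question.

```python
import numpy as np

def partial_prediction_line(X, y, j, n_points=50):
    """Fit OLS of y on the columns of X (plus an intercept) and return the
    predicted line for column j with every other column held at its mean.
    Hypothetical helper, written only for this illustration."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])      # prepend intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)   # least-squares coefficients
    grid = np.linspace(X[:, j].min(), X[:, j].max(), n_points)
    X_ref = np.tile(X.mean(axis=0), (n_points, 1))      # all predictors at their means
    X_ref[:, j] = grid                                  # let only x_j vary
    y_hat = beta[0] + X_ref @ beta[1:]
    return grid, y_hat, beta

# Simulated data (assumption): three predictors, no interactions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

grid, y_hat, beta = partial_prediction_line(X, y, j=0)
```

The returned `grid` and `y_hat` can be drawn as a line over a scatterplot of `X[:, 0]` against `y` with whatever plotting library you prefer. Repeating the call for each `j` gives the $p$ plots described above.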

On the other hand, if you do have interactions, then you should figure out which of the interacting variables you are most interested in and plot the predicted relationship between that variable and the response variable, but with several lines on the same plot. The other interacting variable is set to a different level for each of those lines; typical values are the mean and the mean $\pm$ 1 SD of the interacting variable. To make this clearer, imagine you have only two variables, $x_1$ and $x_2$, with an interaction between them, and that $x_1$ is the focus of your study. Then you might make a single plot with these three lines:
\begin{align}
\hat y &= \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 (\bar x_2 - s_{x_2}) + \hat\beta_3 x_1(\bar x_2 - s_{x_2}) \\
\hat y &= \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 \bar x_2 \quad\quad\quad\ + \hat\beta_3 x_1\bar x_2 \\
\hat y &= \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 (\bar x_2 + s_{x_2}) + \hat\beta_3 x_1(\bar x_2 + s_{x_2})
\end{align}
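
The three lines above could be computed along the following lines. This is only a sketch in Python/NumPy under assumed names: `interaction_lines` is a hypothetical helper, the data are simulated, and $s_{x_2}$ is taken to be the sample standard deviation.

```python
import numpy as np

def interaction_lines(x1, x2, y, n_points=50):
    """Fit y ~ x1 + x2 + x1:x2 by OLS and return predicted lines over x1
    with x2 fixed at mean - 1 SD, mean, and mean + 1 SD.
    Hypothetical helper, written only for this illustration."""
    x1, x2, y = (np.asarray(a, dtype=float) for a in (x1, x2, y))
    design = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    b0, b1, b2, b3 = coefs
    grid = np.linspace(x1.min(), x1.max(), n_points)
    centre, sd = x2.mean(), x2.std(ddof=1)   # sample SD, matching s_{x_2}
    levels = {
        "mean - 1 SD": centre - sd,
        "mean": centre,
        "mean + 1 SD": centre + sd,
    }
    # One predicted line per level of the moderator x2.
    lines = {
        label: b0 + b1 * grid + b2 * level + b3 * grid * level
        for label, level in levels.items()
    }
    return grid, lines, coefs

# Simulated data (assumption): x1 is the focal variable, x2 the moderator.
rng = np.random.default_rng(1)
x1 = rng.normal(size=300)
x2 = rng.normal(loc=2.0, scale=1.5, size=300)
y = 0.5 + 1.0 * x1 + 0.8 * x2 + 0.6 * x1 * x2 + rng.normal(scale=0.5, size=300)

grid, lines, coefs = interaction_lines(x1, x2, y)
```

Each entry of `lines` can then be drawn against `grid` on one scatterplot of `x1` versus `y`, with the dictionary keys serving as legend labels. The slopes of adjacent lines differ by $\hat\beta_3 s_{x_2}$, which is what makes the interaction visible.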

An example plot that's similar (albeit with a binary moderator) can be seen in my answer to Plot regression with interaction in R.