I need to make a residual plot, and I was wondering whether, in multiple linear regression, I should make the plots against one independent variable at a time (as in simple linear regression) or against all ten independent variables at once (as in multiple linear regression). They obviously produce different results for me.
Solved – Making a residual plot in multiple linear regression
data-visualization, mlr, regression, residuals
Related Solutions
A plot of residuals versus predicted response is essentially used to spot possible heteroskedasticity (non-constant variance across the range of the predicted values), as well as influential observations (possible outliers). Usually, we expect such a plot to exhibit no particular pattern (a funnel-like plot would indicate that the variance increases with the mean). Plotting residuals against one predictor can be used to check the linearity assumption. Again, we do not expect any systematic structure in this plot; if present, it would suggest some transformation (of the response variable or the predictor) or the addition of higher-order (e.g., quadratic) terms to the initial model.
More information can be found in any textbook on regression or on-line, e.g. Graphical Residual Analysis or Using Plots to Check Model Assumptions.
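As a concrete sketch of the residuals-versus-fitted check described above (in Python with simulated data, rather than R; all variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 10 predictors, as in the question.
n, p = 200, 10
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = 1.0 + X @ beta + rng.normal(scale=0.5, size=n)

# Fit the multiple regression by least squares (intercept column added).
X1 = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

fitted = X1 @ coef
resid = y - fitted

# A residuals-vs-fitted plot (e.g. plt.scatter(fitted, resid)) should show
# no pattern; with an intercept in the model the residuals average to zero.
print(round(resid.mean(), 6))
```

The scatter of `resid` against `fitted` is the single diagnostic plot for the whole model, regardless of how many predictors it has.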
As for the case where you have to deal with multiple predictors, you can use partial residual plots, available in R in the car (crPlot) and faraway (prplot) packages. However, if you are willing to spend some time reading on-line documentation, I highly recommend installing the rms package and its ecosystem of goodies for regression modeling.
There is nothing wrong with your current strategy. If you have a multiple regression model with only two explanatory variables, then you could try to make a 3D-ish plot that displays the predicted regression plane, but most software doesn't make this easy to do. Another possibility is to use a coplot (see also: coplot in R or this pdf), which can represent three or even four variables, but many people don't know how to read them. Essentially, however, if you don't have any interactions, then the predicted marginal relationship between $x_j$ and $y$ will be the same as the predicted conditional relationship (plus or minus some vertical shift) at any specific level of your other $x$ variables. Thus, you can simply set all other $x$ variables at their means, find the predicted line $\hat y = \hat\beta_0 + \cdots + \hat\beta_j x_j + \cdots + \hat\beta_p \bar x_p$, and plot that line on a scatterplot of $(x_j, y)$ pairs. Moreover, you will end up with $p$ such plots, although you might not include some of them if you think they are not important. (For example, it is common to have a multiple regression model with a single variable of interest and some control variables, and to present only the first such plot.)
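The hold-others-at-their-means recipe can be sketched as follows (Python with simulated two-predictor data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

# Two-predictor example; x1 is the variable we want to display.
n = 150
x1 = rng.uniform(0, 10, n)
x2 = rng.normal(5, 2, n)
y = 3.0 + 0.8 * x1 + 1.2 * x2 + rng.normal(size=n)

X1 = np.column_stack([np.ones(n), x1, x2])
b0, b1, b2 = np.linalg.lstsq(X1, y, rcond=None)[0]

# Hold x2 at its mean and trace the predicted line over a grid of x1
# values; overlay this line on a scatterplot of (x1, y).
grid = np.linspace(x1.min(), x1.max(), 50)
line = b0 + b1 * grid + b2 * x2.mean()

# With no interaction, the line's slope in x1 is just b1; holding x2 at
# its mean only shifts the line vertically by b2 * mean(x2).
print(round((line[-1] - line[0]) / (grid[-1] - grid[0]), 3))
```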
On the other hand, if you do have interactions, then you should figure out which of the interacting variables you are most interested in and plot the predicted relationship between that variable and the response variable, but with several lines on the same plot, the other interacting variable being set to a different level for each line. Typical levels would be the mean and $\pm$ 1 SD of the interacting variable. To make this clearer, imagine you have only two variables, $x_1$ and $x_2$, with an interaction between them, and that $x_1$ is the focus of your study; then you might make a single plot with these three lines:
\begin{align}
\hat y &= \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 (\bar x_2 - s_{x_2}) + \hat\beta_3 x_1(\bar x_2 - s_{x_2}) \\
\hat y &= \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 \bar x_2 \quad\quad\quad\ + \hat\beta_3 x_1\bar x_2 \\
\hat y &= \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 (\bar x_2 + s_{x_2}) + \hat\beta_3 x_1(\bar x_2 + s_{x_2})
\end{align}
An example plot that's similar (albeit with a binary moderator) can be seen in my answer to Plot regression with interaction in R.
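The three lines above can be computed directly. A hedged sketch (Python with simulated data, rather than the R used elsewhere in this thread): fit the interaction model, then evaluate predictions at $\bar x_2 - s_{x_2}$, $\bar x_2$, and $\bar x_2 + s_{x_2}$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Model with an x1*x2 interaction; x1 is the focal variable.
n = 400
x1 = rng.uniform(-2, 2, n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.3 * x2 + 0.9 * x1 * x2 + rng.normal(scale=0.5, size=n)

X1 = np.column_stack([np.ones(n), x1, x2, x1 * x2])
b0, b1, b2, b3 = np.linalg.lstsq(X1, y, rcond=None)[0]

m, s = x2.mean(), x2.std(ddof=1)
grid = np.linspace(x1.min(), x1.max(), 50)

# One predicted line per level of x2: mean - SD, mean, mean + SD.
lines = {lvl: b0 + b1 * grid + b2 * lvl + b3 * grid * lvl
         for lvl in (m - s, m, m + s)}

# With an interaction, each line has a different slope: b1 + b3 * level.
for lvl, vals in lines.items():
    print(round((vals[-1] - vals[0]) / (grid[-1] - grid[0]), 3))
```

Plotting each of the three lines over a scatterplot of $(x_1, y)$ gives the interaction display described above.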
Best Answer
To check for overall heteroscedasticity:
Note that John Fox in Regression Diagnostics finds that, typically, only when the variance of the residuals varies by a factor of three or more is it a serious problem for regression estimation.
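One rough way to put a number on that rule of thumb is to compare residual variance in the low and high ends of the fitted values. This is an illustrative sketch with deliberately heteroscedastic simulated data, not a formal test (a Breusch-Pagan or similar test would be the formal route):

```python
import numpy as np

rng = np.random.default_rng(4)

# Heteroscedastic data: the error SD grows with x.
n = 500
x = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=0.2 * x)  # array scale = per-obs SD

X1 = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
fitted = X1 @ coef
resid = y - fitted

# Rough check: residual variance in the lowest vs highest third of
# fitted values; compare the ratio against Fox's factor of three.
order = np.argsort(fitted)
lo = resid[order[: n // 3]].var(ddof=1)
hi = resid[order[-(n // 3):]].var(ddof=1)
ratio = hi / lo
print(round(ratio, 2))  # here the variance ratio is well above three
```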
To check for overall linearity:
Then you might create a linear fit line and one using a lowess and/or a quadratic or even a cubic fit, to compare against the linear one.
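The linear-versus-curved comparison can be sketched like this (Python with simulated data; `np.polyfit` stands in for the quadratic fit, and a lowess smoother would serve the same purpose):

```python
import numpy as np

rng = np.random.default_rng(5)

# Data with mild curvature that a straight line will miss.
n = 200
x = rng.uniform(-3, 3, n)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(size=n)

# Linear fit vs quadratic fit.
lin = np.polyfit(x, y, 1)
quad = np.polyfit(x, y, 2)

# Compare residual sums of squares; a markedly smaller RSS for the
# quadratic fit suggests the linear model misses curvature.
rss_lin = ((y - np.polyval(lin, x)) ** 2).sum()
rss_quad = ((y - np.polyval(quad, x)) ** 2).sum()
print(rss_quad < rss_lin)
```

Overlaying both fitted curves on the scatterplot, as the answer suggests, makes the same comparison visually.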
To check for heteroscedasticity, linearity, and influential points with respect to each X-Y relationship: