Solved – Parallel Straight Lines on Residuals vs. Predicted Values Plot

heteroscedasticity, multiple regression, residuals

In a data set I have, the points on the residuals vs. predicted values plot all fall along parallel straight lines, as pictured below. Based on my training, I interpret this as a violation of homoscedasticity, which would make running a multiple regression untenable.

[Figure: residuals vs. predicted values plot, with the points falling along parallel straight lines]

Here is some context about the data sets, type of analysis used, the design of the study, etc.

  1. The analysis I conducted was multiple regression, run in SPSS via Analyze > Regression > Linear.

  2. The independent (predictor) and dependent (outcome) variables are all ordinal (though I'm treating them as "scale" in the analysis). Both the independent and dependent variables are satisfaction scores on a 7-point scale, collected from the same participants at the same point in time (one survey). There are 4 independent variables representing participants' satisfaction with more granular dimensions or aspects of a service (e.g., easy to use, effective in meeting needs, etc.). The dependent variable is overall satisfaction with the service.

  3. More about the other results from testing the multiple regression assumptions: (1) the data set passed the "independence of observations" check, with a Durbin-Watson statistic of approximately 2; (2) the independent and dependent variables are approximately linearly related, based on an inspection of the partial plots produced by the regression analysis.

  4. There are around 20 highly influential data points out of approximately 450 participants, as flagged by studentized deleted residuals larger than ±3 in absolute value (one way to reproduce these checks outside SPSS is sketched after this list).
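For readers who want to reproduce these diagnostics outside SPSS, here is a minimal sketch using Python and statsmodels. It assumes the data are already in a pandas DataFrame `df`; the column names `x1`–`x4` (the four predictor satisfaction scores) and `y` (overall satisfaction) are hypothetical placeholders, not the original variable names.

```python
import statsmodels.api as sm
import matplotlib.pyplot as plt
from statsmodels.stats.stattools import durbin_watson

# df is assumed to be an existing pandas DataFrame; x1-x4 and y are placeholders.
X = sm.add_constant(df[["x1", "x2", "x3", "x4"]])  # four ordinal predictors treated as scale
model = sm.OLS(df["y"], X).fit()

# Residuals vs. predicted values -- the plot that shows the parallel lines
plt.scatter(model.fittedvalues, model.resid, alpha=0.4)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()

# Independence of observations: Durbin-Watson near 2 suggests no autocorrelation
print("Durbin-Watson:", durbin_watson(model.resid))

# Influential points: studentized deleted residuals beyond +/- 3
influence = model.get_influence()
flagged = abs(influence.resid_studentized_external) > 3
print("Observations flagged:", flagged.sum())
```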

I've tried a log transformation of the data, and it did not help the apparent heteroscedasticity problem at all: plotting the residuals from the log-transformed model against the predicted values yielded the same general pattern.
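Continuing the sketch above, re-fitting after a log transform looks like this (the post does not say which variables were logged, so this assumes the transform was applied to the outcome):

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

X = sm.add_constant(df[["x1", "x2", "x3", "x4"]])
log_model = sm.OLS(np.log(df["y"]), X).fit()  # y is 1..7, so log(y) is defined

plt.scatter(log_model.fittedvalues, log_model.resid, alpha=0.4)
plt.axhline(0, linestyle="--")
plt.xlabel("Predicted values (log-scale outcome)")
plt.ylabel("Residuals")
plt.show()
# The bands persist: log() maps the 7 distinct outcome values to 7 other
# distinct values, so the residuals still stack along parallel lines.
```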

The other technique I tried is outlined in this paper, http://afhayes.com/public/BRM2007.pdf, which describes estimating the regression model with heteroscedasticity-consistent standard errors. The beta coefficients for each predictor, as well as the overall R^2 for the model, were essentially the same as in the uncorrected model.
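Hayes and Cai's macro re-estimates the model with heteroscedasticity-consistent (robust) standard errors. A rough equivalent in statsmodels, under the same hypothetical `df`, `x1`–`x4`, and `y` as above, is sketched here; HC3 is one of the estimator variants discussed in that paper. Robust standard errors leave the coefficients and R^2 untouched, which is consistent with what was observed.

```python
import statsmodels.api as sm

X = sm.add_constant(df[["x1", "x2", "x3", "x4"]])
ols_fit = sm.OLS(df["y"], X).fit()                   # conventional OLS standard errors
robust_fit = sm.OLS(df["y"], X).fit(cov_type="HC3")  # heteroscedasticity-consistent errors

print(ols_fit.params)     # coefficients are identical in both fits
print(robust_fit.params)
print(ols_fit.bse)        # only the standard errors (and t/p values) differ
print(robust_fit.bse)
```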

Some of the predictor variables are fairly highly correlated with one another (around .6 to .8), so there may be a mild multicollinearity problem; however, the tolerance and VIF values are fine. More intuitively, many participants gave similar satisfaction ratings across all of the predictor variables and the dependent variable. Quite a few answered things like "7 7 7" for the predictors and "7" for the outcome, or "1 1 1" and "1", though this was not always the case.
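For completeness, the tolerance/VIF check mentioned above can also be reproduced with statsmodels, again using the same hypothetical column names:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

X = sm.add_constant(df[["x1", "x2", "x3", "x4"]])
vifs = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=X.columns[1:],
    name="VIF",
)
print(vifs)      # a common rule of thumb flags VIF > 10 (some use > 5)
print(1 / vifs)  # tolerance is the reciprocal of VIF; SPSS reports both
```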

Looking for some guidance around potential reasons for this, alternative ways to analyze the model, etc. Thanks in advance.

Best Answer

The parallel lines are a logical consequence of the fact that your dependent variable has only a few possible values. Try plotting your dependent variable against one of your independent variables and overlaying the regression line. You will see a few horizontal lines of points and a sloping regression line. Now look at a particular value of the independent variable and see what the residuals (the deviations from the regression line) are. Then look at a value of the independent variable further to the right and see what happens to those residuals.
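A quick simulation with made-up data (not the original data set) makes the geometry concrete: the residual equals the observed value minus the predicted value, so all observations sharing one of the seven outcome values lie on a single line with slope -1 in the predicted value, and the seven outcome values give seven parallel bands.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 450
x = rng.normal(size=(n, 4))                               # four simulated predictors
latent = x @ np.array([0.8, 0.5, 0.4, 0.3]) + rng.normal(scale=1.0, size=n)
y = np.clip(np.round(3.5 + latent), 1, 7)                 # squash into a 1-7 rating

fit = sm.OLS(y, sm.add_constant(x)).fit()

# residual = y - fitted, so for each fixed value of y the points fall on a
# straight line with slope -1 in the fitted values: one band per rating.
plt.scatter(fit.fittedvalues, fit.resid, alpha=0.4)
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.show()
```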

Transforming your variable cannot change the fact that you have only a few possible values, so no transformation can change this pattern.
