Solved – Interpretation of residuals vs fitted plot


I am checking whether my multiple regression meets its assumptions, using R's built-in diagnostic plots. From my online research, I think the model violates the assumption of homoscedasticity (please see the residuals vs fitted plot below).
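For reference, here is a minimal sketch of how the built-in diagnostics are usually produced for a model like this one. The data are simulated, and the names `selection`, `dummy1`, `dummy2` and `x` are hypothetical stand-ins for the question's variables. It draws the two `plot.lm` panels most relevant to homoscedasticity: residuals vs fitted (`which = 1`) and scale-location (`which = 3`).

```r
# Hypothetical data standing in for the question's model (all names assumed):
# a DV `selection`, two dummy-coded predictors and one continuous predictor.
set.seed(1)
n <- 200
dat <- data.frame(
  dummy1 = rbinom(n, 1, 0.5),
  dummy2 = rbinom(n, 1, 0.3),
  x      = rnorm(n, mean = 50, sd = 10)
)
dat$selection <- 10 + 3 * dat$dummy1 - 2 * dat$dummy2 + 0.5 * dat$x +
  rnorm(n, sd = 5)

model <- lm(selection ~ dummy1 + dummy2 + x, data = dat)

# plot.lm panels most relevant to homoscedasticity:
#   which = 1: residuals vs fitted (look for a funnel or fan shape)
#   which = 3: scale-location, sqrt(|standardized residuals|) vs fitted
#              (a clearly trending smoother suggests non-constant variance)
op <- par(mfrow = c(1, 2))
plot(model, which = c(1, 3))
par(op)
```

The scale-location panel is often easier to read than residuals vs fitted for this purpose, because it folds positive and negative residuals together so that only their size is compared across the fitted range.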

I tried log-transforming the DV (log10), but this didn't seem to improve the residuals vs fitted plot. My model contains 2 dummy-coded variables and 1 continuous variable. It explains only 23% of the variance in selection (the DV), so could the lack of homoscedasticity be because one or more variables are missing? Any advice on where to go from here would be greatly appreciated.
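A sketch of the log10 refit described above, again on simulated data with hypothetical variable names. The errors here are assumed to be multiplicative (log-normal), which is the situation a log transform is designed to fix; the code refits on the log10 scale, compares the two residuals vs fitted panels, and prints each model's R², the quantity behind the "23% of the variance" figure.

```r
# Hypothetical positive DV `selection` (names assumed), two dummy-coded
# predictors and one continuous predictor.
set.seed(2)
n <- 200
dat <- data.frame(
  dummy1 = rbinom(n, 1, 0.5),
  dummy2 = rbinom(n, 1, 0.3),
  x      = runif(n, 0, 10)
)
# Assumed multiplicative (log-normal) errors: the spread grows with the mean
# on the raw scale but is constant on the log scale.
dat$selection <- exp(1 + 0.3 * dat$dummy1 - 0.2 * dat$dummy2 + 0.15 * dat$x +
                     rnorm(n, sd = 0.4))

stopifnot(all(dat$selection > 0))  # log10() needs strictly positive values

raw_fit <- lm(selection ~ dummy1 + dummy2 + x, data = dat)
log_fit <- lm(log10(selection) ~ dummy1 + dummy2 + x, data = dat)

# Residuals vs fitted for both fits, side by side.
op <- par(mfrow = c(1, 2))
plot(raw_fit, which = 1, main = "Raw DV")
plot(log_fit, which = 1, main = "log10(DV)")
par(op)

# Proportion of variance explained by each fit.
summary(raw_fit)$r.squared
summary(log_fit)$r.squared
```

The two R² values are on different response scales, so they are not directly comparable. If, as in the question, the log-scale plot looks no better, the variance pattern may simply not be of the multiplicative kind that a log transform corrects.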

Best Answer

It's difficult to judge the structure of the error terms just by looking at residuals. The code below produces a plot similar to yours, but from simulated data in which we know the errors are homoskedastic. Does it look "bad"?

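library(mixtools)  # provides rnormmix() for drawing from a normal mixture

set.seed(235711)
n <- 300
# Heavy-tailed but homoskedastic errors: i.i.d. t(5), scaled, independent of x
df <- data.frame(epsilon=sqrt(40) * rt(n, df=5))
# Unevenly spread predictor: a five-component normal mixture (weights sum to 1)
df$x <- rnormmix(n, lambda=c(0.02, 0.30, 0.03, 0.60, 0.05),
                 mu=c(8, 16, 30, 36, 52), sigma=c(2, 3, 2, 3, 6))
df$y <- 2 + df$x + df$epsilon
model <- lm(y ~ x, data=df)
plot(model)                             # R's built-in diagnostic plots
plot(df$y ~ fitted(model))              # observed vs fitted
plot(residuals(model) ~ fitted(model))  # residuals vs fitted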
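One way to calibrate your eye is a "lineup": draw several residual plots from data that are homoskedastic by construction and see how much they vary. The sketch below does this in base R only; a simple two-component mixture for x stands in for `mixtools::rnormmix`, which is an assumption of this example rather than the setup above. For each replicate it also computes Koenker's studentized Breusch-Pagan statistic (n times the R² from regressing squared residuals on x), which is approximately chi-squared with 1 df when the errors have constant variance.

```r
# Lineup of residual plots from homoskedastic simulations (base R only).
set.seed(3)
n <- 300
n_panels <- 6
p_values <- numeric(n_panels)

op <- par(mfrow = c(2, 3))
for (i in seq_len(n_panels)) {
  # Unevenly spread x: most points in one cluster, about 10% far out.
  # Sparse regions can make the residual spread look narrower there.
  in_tail <- runif(n) < 0.1
  x <- ifelse(in_tail, rnorm(n, 50, 6), rnorm(n, 30, 5))
  y <- 2 + x + sqrt(40) * rt(n, df = 5)  # i.i.d. errors: homoskedastic
  fit <- lm(y ~ x)
  plot(fitted(fit), residuals(fit), main = paste("Replicate", i),
       xlab = "Fitted", ylab = "Residuals")
  abline(h = 0, lty = 2)

  # Koenker's studentized Breusch-Pagan statistic: n * R^2 from regressing
  # squared residuals on x, referred to a chi-squared(1) distribution.
  e2 <- residuals(fit)^2
  bp <- n * summary(lm(e2 ~ x))$r.squared
  p_values[i] <- pchisq(bp, df = 1, lower.tail = FALSE)
}
par(op)
round(p_values, 3)
```

Because every replicate satisfies the assumption, any panel that "looks heteroskedastic" is a false alarm, and the p-values should be spread roughly uniformly between 0 and 1. The same test is available as `bptest()` in the lmtest package.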
