Solved – Interpretation of residuals vs fitted plot

assumptions, data visualization, residuals

I am checking whether my multiple regression meets its assumptions, using the built-in diagnostic plots in R. Based on my online research, I think the residuals violate the assumption of homoscedasticity (please see the residuals vs fitted plot below).
[Image: residuals vs fitted plot]

I tried log-transforming the DV (log10), but this didn't seem to improve the residuals vs fitted plot. My model contains two dummy-coded variables and one continuous variable. It explains only 23% of the variance in selection (the DV), so could the lack of homoscedasticity be due to missing variables? Any advice on where to go from here would be greatly appreciated.

Best Answer

It's difficult to judge the structure of the error terms just by looking at residuals. Here's a plot similar to yours, but generated from simulated data where we know the errors are homoskedastic. Does it look "bad"?

[Image: residuals plot from the simulated data]

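library(mixtools)  # provides rnormmix()

set.seed(235711)
n <- 300
# Errors: scaled t-distributed noise, identically distributed for every row
df <- data.frame(epsilon=sqrt(40) * rt(n, df=5))
# Predictor: mixture of five normals, so x is spread very unevenly
df$x <- rnormmix(n, lambda=c(0.02, 0.30, 0.03, 0.60, 0.05),
                 mu=c(8, 16, 30, 36, 52), sigma=c(2, 3, 2, 3, 6))
df$y <- 2 + df$x + df$epsilon
model <- lm(y ~ x, data=df)
# Standard lm diagnostic plots, then observed and residuals vs fitted
plot(model)
plot(df$y ~ fitted(model))
plot(residuals(model) ~ fitted(model))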
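
Since the plot alone is hard to judge, a numeric check can complement it. The following is a minimal sketch, not part of the original answer: it re-creates a similar simulation in base R only (drawing the mixture by hand instead of calling mixtools, which is my substitution) and then computes a studentized Breusch-Pagan statistic (Koenker's version) by hand. The names `comp`, `res2`, `aux`, `bp_stat` and `bp_pval` are illustrative. Because the errors here are homoskedastic by construction, a large p-value is what one would expect most of the time, although any single simulated sample can reject by chance.

```r
# Sketch under stated assumptions: homoskedastic errors, unevenly spread x.
set.seed(235711)
n <- 300

# Mixture of normals for x, drawn with base R (stand-in for rnormmix)
lambda <- c(0.02, 0.30, 0.03, 0.60, 0.05)
mu     <- c(8, 16, 30, 36, 52)
sigma  <- c(2, 3, 2, 3, 6)
comp   <- sample(seq_along(lambda), n, replace = TRUE, prob = lambda)
x      <- rnorm(n, mean = mu[comp], sd = sigma[comp])

# Errors share one distribution across all observations
epsilon <- sqrt(40) * rt(n, df = 5)
y <- 2 + x + epsilon
model <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor
res2 <- residuals(model)^2
aux  <- lm(res2 ~ x)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression,
# compared against a chi-squared distribution with 1 df (one regressor)
bp_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
cat("BP statistic:", bp_stat, " p-value:", bp_pval, "\n")
```

For a fitted model like the one in the question, the same two auxiliary steps apply with all predictors on the right-hand side and the degrees of freedom set to the number of predictors. If the lmtest package is available, its `bptest()` function offers a packaged version of this test.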