I have a plot of the residuals of a linear model against the fitted values, and the heteroscedasticity is very clear. However, I'm not sure how I should proceed now because, as far as I understand, this heteroscedasticity makes my linear model invalid. (Is that right?)
1. Use robust linear fitting with the `rlm()` function of the `MASS` package, because it's apparently robust to heteroscedasticity.
2. Since the standard errors of my coefficients are wrong because of the heteroscedasticity, can I just adjust the standard errors to be robust to it, using the method posted on Stack Overflow here: Regression with Heteroskedasticity Corrected Standard Errors?
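For concreteness, the two options could be sketched roughly as follows. The data are simulated (an assumption, not the data from this question), and the robust standard errors are computed by hand with White's HC0 estimator to avoid depending on extra packages; `vcovHC()` in the `sandwich` package provides this and related estimators.

```r
library(MASS)  # provides rlm(); MASS ships with standard R installations

set.seed(1)
# Simulated data (an assumption): the noise standard deviation grows with x
d <- data.frame(x = runif(200, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(200, sd = 0.5 * d$x)

# Option 1: robust (M-estimation) fit
fit_rlm <- rlm(y ~ x, data = d)

# Option 2: ordinary least squares, then heteroscedasticity-consistent
# (White, HC0) standard errors computed from the sandwich formula
fit_ols <- lm(y ~ x, data = d)
X <- model.matrix(fit_ols)
e <- residuals(fit_ols)
bread <- solve(crossprod(X))   # (X'X)^-1
meat  <- crossprod(X * e)      # X' diag(e^2) X
vcov_hc0  <- bread %*% meat %*% bread
se_robust <- sqrt(diag(vcov_hc0))

# Coefficients are the same in both columns of standard errors;
# only the uncertainty estimate changes
cbind(estimate  = coef(fit_ols),
      se_ols    = sqrt(diag(vcov(fit_ols))),
      se_robust = se_robust)
```

Note that option 2 leaves the coefficient estimates of `lm()` untouched, so the point predictions are identical to those of the ordinary fit; only the standard errors differ.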
Which would be the best method to deal with my problem? If I use solution 2, is the predictive capability of my model completely useless?
The Breusch-Pagan test confirmed that the variance is not constant.
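As a rough sketch of what that test does, the studentized (Koenker) form of the Breusch-Pagan statistic can be computed by hand in base R: regress the squared residuals on the model's predictors and compare n times the R-squared of that auxiliary regression to a chi-squared distribution. The data below are simulated for illustration (an assumption); `bptest()` in the `lmtest` package reports this studentized version by default.

```r
set.seed(1)
# Simulated data (an assumption): the noise standard deviation grows with x
d <- data.frame(x = runif(200, 1, 10))
d$y <- 2 + 3 * d$x + rnorm(200, sd = 0.5 * d$x)

fit <- lm(y ~ x, data = d)

# Auxiliary regression: squared residuals on the same predictor(s)
d$res2 <- residuals(fit)^2
aux <- lm(res2 ~ x, data = d)

n       <- nrow(d)
bp_stat <- n * summary(aux)$r.squared   # Lagrange multiplier statistic
bp_df   <- length(coef(aux)) - 1        # number of predictors in aux
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(statistic = bp_stat, df = bp_df, p.value = bp_p)
```

A small p-value rejects the hypothesis of constant variance, but it does not say whether the cause is genuine heteroscedasticity or a misspecified mean function.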
My residuals plotted against the fitted values look like this:
Best Answer
It's a good question, but I think it's the wrong question. Your figure makes it clear that you have a more fundamental problem than heteroscedasticity: your model has a nonlinearity that you haven't accounted for. Many of the potential problems that a model can have (nonlinearity, interactions, outliers, heteroscedasticity, non-Normality) can masquerade as each other. I don't think there's a hard and fast rule, but in general I would suggest dealing with problems in the order outliers, then nonlinearity, then heteroscedasticity, then non-Normality (e.g., don't worry about nonlinearity before checking whether there are weird observations that are skewing the fit; don't worry about normality before you worry about heteroscedasticity).
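One way to walk through such checks in base R, assuming the order outliers, nonlinearity, heteroscedasticity, normality, is sketched below. The data are simulated placeholders (an assumption), and the choice of Cook's distance as the outlier screen is just one common option.

```r
set.seed(1)
# Placeholder data (an assumption): a well-behaved linear relationship
d <- data.frame(x = runif(100, 0, 10))
d$y <- 1 + 2 * d$x + rnorm(100)
fit <- lm(y ~ x, data = d)

# 1. Outliers / influential observations: largest Cook's distances
cd <- cooks.distance(fit)
head(sort(cd, decreasing = TRUE), 3)

op <- par(mfrow = c(1, 3))
# 2. Nonlinearity: curvature in residuals vs fitted values
plot(fit, which = 1)
# 3. Heteroscedasticity: trend in the scale-location plot
plot(fit, which = 3)
# 4. Normality: departures from the line in the normal Q-Q plot
plot(fit, which = 2)
par(op)
```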
In this particular case, I would fit a quadratic model `y ~ poly(x,2)` (or `poly(x,2,raw=TRUE)` or `y ~ x + I(x^2)`) and see if it makes the problem go away.
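A minimal sketch of that comparison, using simulated data with a built-in quadratic term (an assumption; substitute your own `x` and `y`):

```r
set.seed(1)
# Simulated data (an assumption): the true relationship is quadratic
d <- data.frame(x = runif(200, 0, 10))
d$y <- 1 + 0.5 * d$x + 0.3 * d$x^2 + rnorm(200)

fit_lin  <- lm(y ~ x, data = d)            # original linear model
fit_quad <- lm(y ~ poly(x, 2), data = d)   # orthogonal polynomial basis
fit_raw  <- lm(y ~ x + I(x^2), data = d)   # raw polynomial terms

# The two quadratic parameterizations give the same fitted values;
# only the meaning of the coefficients differs
all.equal(fitted(fit_quad), fitted(fit_raw))

# F-test: does adding the quadratic term improve the fit?
cmp <- anova(fit_lin, fit_quad)
cmp

# Re-check the residuals vs fitted plot for the quadratic model
plot(fitted(fit_quad), residuals(fit_quad),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

If the curved pattern in the residuals disappears after adding the quadratic term, any remaining spread can then be judged for heteroscedasticity on its own.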