Solved – Interpreting linearity in residual vs. fitted plot

Tags: data visualization, linearity, regression, residuals, scatterplot

I am working on a linear regression model and I am not sure how to interpret the following residual vs fitted values plot.

[Figure: residuals vs. fitted values, daily data]

As far as I know, residuals are supposed to fluctuate randomly around 0, which I would say is more or less what my data does (at least there is no apparent trend in one direction or the other).

However, might it be a problem that so many of my data points are clustered around the centre, or that I have too many outliers?

I know that I have quite a large number of data points (3,632), which adds to the clustering, but shouldn't the points be more spread out?

I use daily returns for my regression, but I could also use monthly data, which would look like this:
[Figure: residuals vs. fitted values, monthly data]

Best Answer

The plots you have reproduced are generally used to detect model misspecifications, usually curvature and heteroscedasticity.

What people do, and what R does automatically, is superimpose a non-parametric lowess curve on this figure. This is a local regression technique that offers, among other things, an informal assessment of whether higher-order terms need to be included in the model. A curved local regression line suggests that they do, in which case you might want to examine whether your model improves when you introduce them.
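To make the idea concrete, here is a minimal sketch in plain Python. It is not R's lowess: it is a tricube-weighted local linear fit without the robustness iterations, and the function names (`ols_fit`, `local_linear_smooth`), the `frac` default and the toy data are all assumptions made up for illustration. A straight line is fitted to data that are really quadratic, and the smoothed residuals come out curved.

```python
import math

def ols_fit(x, y):
    """Simple least-squares line: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def local_linear_smooth(x, y, frac=0.5):
    """Tricube-weighted local linear fit at each x (no robustness iterations)."""
    n = len(x)
    k = max(2, math.ceil(frac * n))  # number of neighbours per local fit
    smoothed = []
    for x0 in x:
        # Bandwidth = distance to the k-th nearest point (x0 itself included)
        h = sorted(abs(xi - x0) for xi in x)[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(my + slope * (x0 - mx))
    return smoothed

# Hypothetical data: the true relation is quadratic, but a line is fitted.
x = [i / 10 for i in range(21)]
y = [xi ** 2 for xi in x]
a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Smooth the residuals against the fitted values, as in the plot.
smooth = local_linear_smooth(fitted, resid)
print(round(smooth[0], 3), round(smooth[10], 3), round(smooth[-1], 3))
```

The smoothed curve is positive at both ends and negative in the middle, which is the U shape you would read as a missing squared term. For a well-specified model it should stay close to zero throughout.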

Heteroscedasticity means unequal variances and as such is a violation of the OLS assumptions. Although the estimator is still consistent and unbiased, it is no longer minimum variance (BLUE), so that's something you need to guard against. The way to judge whether your errors are heteroscedastic is by observing the scatter in your figure. Vertical scatter around the horizontal zero line that widens or narrows as the fitted values change can be indicative of different variances, and it might be worth investigating further. There are also formal tests for heteroscedasticity, such as Breusch-Pagan or White, depending on how serious you are.
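As one example of such a test, the sketch below computes Koenker's studentized version of the Breusch-Pagan statistic for a regression with a single predictor: regress the squared residuals on x and take n times the R-squared, which is compared with a chi-square distribution with 1 degree of freedom. The function names and the toy data are assumptions for illustration, and the p-value shortcut via `math.erfc` only holds for 1 degree of freedom.

```python
import math

def ols_fit(x, y):
    """Simple least-squares line: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def breusch_pagan_simple(x, y):
    """Studentized (Koenker) Breusch-Pagan test for one regressor.

    Returns (LM, p_value), with LM referred to a chi-square with 1 df.
    """
    n = len(x)
    a, b = ols_fit(x, y)
    e2 = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    # Auxiliary regression: squared residuals on x
    c, d = ols_fit(x, e2)
    mean_e2 = sum(e2) / n
    ss_tot = sum((v - mean_e2) ** 2 for v in e2)
    ss_res = sum((v - c - d * xi) ** 2 for xi, v in zip(x, e2))
    r2 = max(0.0, 1 - ss_res / ss_tot)
    lm = n * r2
    # Upper tail of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2))
    p_value = math.erfc(math.sqrt(lm / 2))
    return lm, p_value

# Hypothetical data: the error magnitude grows in proportion to x.
x = list(range(1, 41))
y = [1 + 2 * xi + (0.5 * xi if xi % 2 == 0 else -0.5 * xi) for xi in x]

lm, p_value = breusch_pagan_simple(x, y)
print(lm, p_value)
```

A small p-value is evidence against constant error variance. With several predictors the auxiliary regression uses all of them and the degrees of freedom change accordingly, so in practice a library implementation is the better choice.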

When it comes to outliers, these plots do not tell us much. Sure, there seem to be some points further away from the others, but what we need to remember is that raw residuals do not have equal variances, even when the errors do, because each residual's variance depends on the leverage of its observation. Thus, a better way of detecting outliers is to plot standardized residuals against fitted values, where values above three or below minus three would suggest the presence of an outlier.
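A minimal sketch of that calculation for a single-predictor regression follows. Each residual is divided by s * sqrt(1 - h_i), where s is the residual standard error and h_i = 1/n + (x_i - mean(x))^2 / Sxx is the leverage of observation i. The function name and the toy data, with one point deliberately shifted upwards, are assumptions for illustration.

```python
import math

def standardized_residuals(x, y):
    """Internally studentized residuals for a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage of this observation
        out.append(e / (s * math.sqrt(1 - h)))
    return out

# Hypothetical data: a clean line with small noise and one shifted point.
x = list(range(30))
y = [2 + 0.5 * xi + (0.2 if xi % 2 == 0 else -0.2) for xi in x]
y[15] += 5.0

std_resid = standardized_residuals(x, y)
flagged = [i for i, r in enumerate(std_resid) if abs(r) > 3]
print(flagged)
```

Only the shifted observation crosses the threshold of three here. With several thousand observations, as in the question, a handful of values beyond three is expected by chance alone, so the cut-off is a screening device rather than a verdict.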

Conclusions from plots can be quite subjective, though, and they cannot take the place of formal tests. You can find a lot more on the internet, so take a look.
