Residuals increase with fitted values: what kind of model misspecification might it indicate?

Tags: econometrics, misspecification, modeling, predictive-models, residuals

As we know, plotting residuals against fitted values may indicate whether the model is misspecified or the variance is not constant, or both. Let's focus on model misspecification only and set aside heteroskedasticity for the moment.

Residuals vs fitted values

For instance, example (b) shows residuals that increase with the fitted values. What kind of model misspecification does example (b) imply? Might it indicate that some important predictor is not included on the right-hand side of the equation? Or might it indicate that some of the predictors need a higher-order transformation, for example squaring a predictor instead of using it on its original scale?

Also, what might example (c) say about model misspecification?

These examples seem to be pretty common in textbooks and online tutorials, but it appears to me that few have tried to work out which underlying model misspecification produces each of these types of residuals vs fitted values plots.

Best Answer

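Both (b) and (c) most likely just mean that your model is missing a predictor, or else that some of its coefficients deviate from their optimal values for the data. E.g. the scenario in (b) could happen if the data truly lie around a line, but the line you fitted has a slope that is too small. This could be the result of an iterative optimization algorithm that was terminated too soon (before it found the optimum). Note that a least-squares fit with an intercept that has actually converged leaves the residuals uncorrelated with the fitted values, so a purely linear trend like the one in (b) points to coefficients that are not at their least-squares optimum.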
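To make the (b) scenario concrete, here is a minimal simulation sketch using only the Python standard library. The true line (intercept 2, slope 3), the noise level, the seed, and the "half of the optimal slope" stand-in for an optimizer stopped too early are all assumptions chosen for illustration, not anything taken from the plots in the question.

```python
import random

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5

def resid_fitted_corr(x, y, intercept, slope):
    """Correlation between residuals and fitted values for a given line."""
    fitted = [intercept + slope * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return corr(resid, fitted)

rng = random.Random(0)  # fixed seed so the run is repeatable
x = [rng.uniform(0, 10) for _ in range(500)]
y = [2.0 + 3.0 * xi + rng.gauss(0, 1) for xi in x]  # assumed true line plus noise

# Closed-form least-squares slope and intercept.
mx, my = mean(x), mean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

# Residuals from the converged least-squares line.
corr_ols = resid_fitted_corr(x, y, intercept, slope)
# Stand-in for an optimizer stopped early: slope only halfway to its optimum.
corr_short = resid_fitted_corr(x, y, intercept, 0.5 * slope)

print(f"least-squares line: corr(resid, fitted) = {corr_ols:.3f}")
print(f"slope too small:    corr(resid, fitted) = {corr_short:.3f}")
```

With the least-squares coefficients the correlation is zero up to floating-point error, whereas the too-shallow line leaves residuals that rise with the fitted values, which is the pattern in (b).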

The scenario in (c) is more likely to reflect a missing predictor. Let's say you again fitted a line, but the true function that generated the data was quadratic, with a negative coefficient on the quadratic term. You subtracted out the linear part, so you're left with the (concave) parabola of the quadratic part that you left out.
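The (c) scenario can be sketched the same way. The quadratic below (coefficients 1, 2 and -0.5), the range of $x$, and the seed are assumptions picked for the example. Instead of plotting, the residuals are averaged within the lower, middle and upper thirds of the fitted values, which is enough to see the curvature.

```python
import random

def mean(v):
    return sum(v) / len(v)

rng = random.Random(1)  # fixed seed so the run is repeatable
x = [rng.uniform(0, 10) for _ in range(600)]
# Assumed true model: quadratic with a negative coefficient on x^2.
y = [1.0 + 2.0 * xi - 0.5 * xi ** 2 + rng.gauss(0, 1) for xi in x]

# Misspecified model: a straight line fitted by closed-form least squares.
mx, my = mean(x), mean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Order residuals by fitted value and average them within thirds.
ordered = [r for _, r in sorted(zip(fitted, resid))]
k = len(ordered) // 3
low = mean(ordered[:k])
mid = mean(ordered[k:2 * k])
high = mean(ordered[2 * k:])

print(f"mean residual, low fitted values:    {low:.2f}")
print(f"mean residual, middle fitted values: {mid:.2f}")
print(f"mean residual, high fitted values:   {high:.2f}")
```

The middle third has a positive mean residual and both outer thirds have negative ones, i.e. the concave arch that is left over once the linear part has been subtracted.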

These kinds of biases only appear if the model terms you left out are somehow linked with those that you did include (e.g. $x$ is correlated with $x^2$). If you left out some relevant effect that is completely independent of the things you did model, it won't show up in the residuals as a bias (it will just be part of the apparently random noise).
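As a rough check of that last point, the sketch below leaves out a predictor `z` that is drawn independently of `x`. The data-generating model, the coefficient 3 on `z`, and the seed are hypothetical choices for illustration. The residuals are again averaged within thirds of the fitted values, and their spread is compared with the noise standard deviation of 1 used in the simulation.

```python
import random

def mean(v):
    return sum(v) / len(v)

rng = random.Random(2)  # fixed seed so the run is repeatable
n = 3000
x = [rng.uniform(0, 10) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]  # relevant predictor, independent of x
# Assumed true model: depends on both x and z, noise sd = 1.
y = [1.0 + 2.0 * xi + 3.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

# Fit y on x only (z is omitted), by closed-form least squares.
mx, my = mean(x), mean(y)
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

fitted = [intercept + slope * xi for xi in x]
resid = [yi - fi for yi, fi in zip(y, fitted)]

# Mean residual within the lower, middle and upper thirds of fitted values.
ordered = [r for _, r in sorted(zip(fitted, resid))]
k = n // 3
bin_means = [mean(ordered[:k]), mean(ordered[k:2 * k]), mean(ordered[2 * k:])]

# Residuals average to zero with an intercept, so the RMS is their sd.
resid_sd = (sum(r ** 2 for r in resid) / n) ** 0.5

print("mean residual by third of fitted values:",
      [round(m, 2) for m in bin_means])
print(f"residual sd = {resid_sd:.2f} (noise sd in the simulation is 1)")
```

The binned means stay close to zero, so there is no systematic pattern against the fitted values, while the residual spread is well above the noise level: the omitted effect has simply been absorbed into what looks like random error.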
