Solved – Norm of residuals to measure goodness of fit


I used a least squares method on my data set, using the lsqr Matlab function. I know that the norm of residuals is a measure of the goodness of fit, but how can I assess whether the value of the norm of residuals is "good"? What is the range of this measure?

Best Answer

Residuals are "acceptable" when they have, at least approximately, the following characteristics:

  • They are not associated with the fitted values (there's no evident trend or relationship between them).

  • They are centered around zero.

  • Their distribution is symmetric.

  • They contain no, or extremely few, unusually large or small values ("outliers").

  • They are not correlated with other variables you have in the data set.

These are not criteria; they're guidelines. For instance, correlation with other variables is sometimes acceptable: it merely suggests that the residuals could be reduced further by including those variables in the model. But the first three points are as close to criteria as we can get in general, in the sense that a strong violation of any of them is a clear indication the model is wrong.
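The first few guidelines are straightforward to check numerically. Here is a minimal sketch in Python/NumPy (the question uses MATLAB, but the same checks apply there); the data and the straight-line model are purely illustrative:

```python
import numpy as np

# Hypothetical data: fit a straight line by least squares and
# inspect the residuals against the guidelines above.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, x.size)

coeffs = np.polyfit(x, y, 1)          # least-squares line
fitted = np.polyval(coeffs, x)
resid = y - fitted

# Centered around zero?
print("mean residual:", resid.mean())
# Roughly symmetric? (mean and median should be close)
print("median residual:", np.median(resid))
# Associated with the fitted values? (correlation should be ~0)
print("corr(resid, fitted):", np.corrcoef(resid, fitted)[0, 1])
# Any outliers? Compare the extremes with a robust spread estimate.
iqr = np.subtract(*np.percentile(resid, [75, 25]))
print("min, max, IQR:", resid.min(), resid.max(), iqr)
```

Note that for ordinary least squares with an intercept, the residuals sum to zero and are uncorrelated with the fitted values by construction, so those two checks mainly catch models fitted without an intercept or by other methods; the symmetry and outlier checks carry real information either way. Plots of residuals against fitted values and against other variables remain the most revealing diagnostics.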

There are plenty of examples of evaluating residuals and goodness of fit in postings on this site; one recent one where the residuals look acceptable appears at Looking for estimates for my data using cumulative beta distribution. An example of clearly unacceptable residuals appears (inter alia) in the question at Testing homoscedasticity with Breusch-Pagan test. There, the distribution of residuals is asymmetric (there is a long tail in the negative range), the residuals vary in important ways with another variable (the "index"), and they exhibit a v-shaped association with the fitted values.

A set of acceptable residuals is "good" when their typical size is small enough to alleviate any worries that your conclusions might be incorrect. "Small enough" depends on how the model will be used, but the main point is that you want to pay attention to how large the residuals typically are, because that measures the typical deviation between the dependent variable and the fit. When your data are representative of a process or population, that typical deviation estimates how closely the model will predict the unsampled members of the population.

For example, a model of survivorship from a medical procedure might express its residuals as a percentage of survival time. If a typical size is 100%, the model may be almost worthless ("you might die any time between tomorrow and 20 years from now"), but if it's 10%, the model is probably excellent for anybody ("people with your condition usually live between 9 and 11 years"). A 1 km residual in a spatial location model would be great for sending a satellite to Mars but could shipwreck boats in a harbor. Context matters when evaluating the goodness of fit.

Several measures of residual size are in use, again depending on the purpose of the analysis. The commonest is an (adjusted) root mean square, which is almost always reported by least squares software. It would be naive and foolhardy to rely on this number without checking the guidelines for acceptability. Once you have confirmed the residuals are acceptable, though, this number is an effective way to evaluate and report the goodness of fit.
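The adjusted root mean square divides the residual sum of squares by the residual degrees of freedom, $n - p$, rather than by $n$. A short sketch (in Python/NumPy; the function name is illustrative, not from any library):

```python
import numpy as np

# Adjusted root-mean-square residual (often called the residual
# standard error): divide by the residual degrees of freedom n - p,
# where p is the number of fitted parameters, rather than by n.
def rms_residual(resid, n_params):
    resid = np.asarray(resid, dtype=float)
    dof = resid.size - n_params
    return np.sqrt(np.sum(resid**2) / dof)
```

For a straight-line fit, `n_params` would be 2 (slope and intercept).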

Among the alternative measures of residual size, an excellent one is the "H-spread" of the residuals. (Split the set of residuals into an upper half and lower half. The H-spread is the difference between the median of the upper half and the median of the lower half. It is practically the same as the interquartile range of the residuals). This measure is not as sensitive to the most extreme residuals as the root mean square. Nevertheless, as indicated in the guideline about outliers, it's also a good idea to look at the sizes of the most positive and most negative residuals.
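The half-splitting rule above can be written down directly. A minimal sketch in Python/NumPy, following Tukey's convention that when the count is odd the overall median belongs to both halves (the function name is illustrative):

```python
import numpy as np

# H-spread: split the sorted residuals into a lower and an upper half
# and take the difference of the two half-medians (Tukey's hinges).
def h_spread(resid):
    r = np.sort(np.asarray(resid, dtype=float))
    n = r.size
    half = (n + 1) // 2   # odd n: the overall median joins both halves
    return np.median(r[-half:]) - np.median(r[:half])
```

For example, `h_spread([1, 2, 3, 4, 5])` splits into halves `[1, 2, 3]` and `[3, 4, 5]` and returns `4 - 2 = 2`, matching the interquartile range of those values.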

Because it is mentioned elsewhere in this thread, let's look at $R^2$. This figure is related to the size of residuals, but the relation is indirect. As its defining formula, $R^2 = 1 - SS_{\text{res}}/SS_{\text{tot}}$, makes clear, it depends on the total variation in the dependent variable, which in turn depends on how much the independent variables vary in the dataset. This makes it much less useful than directly examining the root mean square residual. $R^2$ values can also be misleading: they can become extremely high due to the presence of even a single "high leverage" value in the data, giving a false impression of a good fit. Any decent measure of the typical residual size does not have this problem.
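The leverage effect is easy to demonstrate. Below is a small illustration (in Python/NumPy; the data are synthetic and not from the original answer): fifty unrelated points give an $R^2$ near zero, yet appending a single extreme point drives $R^2$ close to 1 even though nothing about the bulk of the data has changed:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(0, 1, 50)
y = rng.normal(0, 1, 50)   # no real relationship, so R^2 should be ~0

def r_squared(x, y):
    # R^2 = 1 - SS_res / SS_tot for a straight-line least-squares fit.
    fitted = np.polyval(np.polyfit(x, y, 1), x)
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

print("R^2 without leverage point:", r_squared(x, y))

# Append one extreme point; it dominates SS_tot and pulls the line
# through itself, so R^2 jumps even though the bulk fit is no better.
x2 = np.append(x, 100.0)
y2 = np.append(y, 100.0)
print("R^2 with one leverage point:", r_squared(x2, y2))
```

A robust summary of residual size, such as the H-spread above, would barely move under the same manipulation.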
