Solved – Difference between Norms of Residuals, and what is a "good" Norm of Residuals

regression, residuals

I am doing some basic fitting of data and exploring different fits. I understand that a residual is the difference between a sample value and the value predicted by the fitted function. The norm of the residuals is a measure of the overall deviation between the fitted curve and the data; a lower norm signifies a better fit.

Suppose a cubic fit has a norm of residuals of 0.85655 and a linear fit has a norm of residuals of 0.89182. What if the norms are 0.17113 and 0.24916? Is the difference between these two significant? Is a norm of residuals less than 1 considered good? If not, what is generally regarded as an "acceptable" norm of residuals?
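
For reference, here is how I compute the norm of residuals in R (a minimal sketch on simulated data, not my actual data; it is the square root of the residual sum of squares, which is what common fitting tools report):

# Minimal sketch: "norm of residuals" = sqrt(RSS)
set.seed(42)
x <- seq(0, 1, length=50)
y <- x^3 + rnorm(50, 0, 0.1)        # simulated data with a cubic trend
fit.lin <- lm(y ~ x)                # linear fit
fit.cub <- lm(y ~ poly(x, 3))       # cubic fit
sqrt(sum(resid(fit.lin)^2))         # norm of residuals, linear fit
sqrt(sum(resid(fit.cub)^2))         # norm of residuals, cubic fit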

Best Answer

So, I would recommend using a standard method for comparing nested models. In your case, you are considering two alternative models, the cubic fit being the more "complex" one. An F- or $\chi^2$-test tells you whether the residual sum of squares or the deviance decreases significantly when you add further terms. It is much like comparing a model that includes only the intercept (in which case you have residual variance only) with one that includes a meaningful predictor: does the added predictor account for a sufficient part of the variance in the response? In your case, it amounts to asking whether modeling a cubic relationship between X and Y decreases the unexplained variance (equivalently, increases the $R^2$) and thus provides a better fit to the data than a linear fit.
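
To make this concrete, here is a minimal sketch on simulated data (the names and numbers are purely illustrative) of the intercept-only vs. one-predictor comparison:

set.seed(1)
x <- rnorm(100)
y <- 1 + 2*x + rnorm(100)   # simulated response with one real predictor
m0 <- lm(y ~ 1)             # intercept only: residual variance only
m1 <- lm(y ~ x)             # adds one meaningful predictor
anova(m0, m1)               # F-test on the reduction in RSS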

It is often used as a test of linearity between the response variable and a predictor, and this is the reason why Frank Harrell advocates the use of restricted cubic splines instead of assuming a strictly linear relationship between Y and the continuous Xs (e.g., age).
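
As a quick sketch of that approach (simulated data; when a model contains an rcs() term, anova() from the rms package reports a dedicated test of nonlinearity):

library(rms)
set.seed(2)
age <- runif(200, 20, 80)
out <- 0.05*age + 0.002*(age - 50)^2 + rnorm(200)  # mildly non-linear truth
fit <- ols(out ~ rcs(age, 3))   # restricted cubic spline, 3 knots
anova(fit)                      # the "Nonlinear" row tests linearity in age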

The following example comes from a book I was reading some months ago (High-dimensional data analysis in cancer research, Chap. 3, p. 45), but it may well serve as an illustration. The idea is simply to fit different kinds of models to a simulated data set that clearly exhibits a non-linear relationship between the response variable and the predictor. The true generative model is shown in black. The other colors are for the different models (restricted cubic spline, a B-spline close to yours, and a cross-validated smoothing spline).

library(rms)
library(splines)
set.seed(101)
f <- function(x) sin(sqrt(2*pi*x))
n <- 1000
x <- runif(n, 0, 2*pi)
eps <- rnorm(n, 0, 0.25)   # Gaussian noise with sd = 0.25
y <- f(x) + eps
plot(x, y, cex=.4)
curve(f, 0, 6, lty=2, add=TRUE)
# linear fit
lm00 <- lm(y~x)
# restricted cubic spline, 3 knots (2 Df)
lm0 <- lm(y~rcs(x,3))
lines(seq(0,6,length=1000), 
      predict(lm0,data.frame(x=seq(0,6,length=1000))),
      col="red")
# use B-spline and a single knot at x=1.13 (4 Df)
lm1 <- lm(y~bs(x, knots=1.13))
lines(seq(0,6,length=1000),
      predict(lm1,data.frame(x=seq(0,6,length=1000))), 
      col="green")
# cross-validated smoothed spline (approx. 20 Df)
xy.spl <- smooth.spline(x, y, cv=TRUE)
lines(xy.spl, col="blue")
legend("bottomleft", c("f(x)","RCS {rms}","BS {splines}","SS {stats}"), 
       col=1:4, lty=c(2,rep(1,3)),bty="n", cex=.6)

[Figure: simulated data with the true function f(x) (black, dashed) and the fitted curves: RCS (red), B-spline (green), smoothing spline (blue)]

Now, suppose you want to compare the linear fit (lm00) and the model relying on the B-spline (lm1): you just have to do an F-test to see that the latter provides a better fit:

> anova(lm00, lm1)
Analysis of Variance Table

Model 1: y ~ x
Model 2: y ~ bs(x, knots = 1.13)
  Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
1    998 309.248                                  
2    995  63.926  3    245.32 1272.8 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Likewise, it is quite common to compare a GLM with a GAM based on the results of a $\chi^2$-test.
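
For instance, a minimal sketch with the mgcv package on simulated binomial data (fitting both models with gam() so that anova() can compare them):

library(mgcv)
set.seed(3)
x <- runif(500)
yb <- rbinom(500, 1, plogis(sin(2*pi*x)))  # non-linear effect on logit scale
g0 <- gam(yb ~ x, family=binomial)         # GLM: strictly linear in x
g1 <- gam(yb ~ s(x), family=binomial)      # GAM: smooth function of x
anova(g0, g1, test="Chisq")                # deviance-based chi-squared test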