Solved – Smoother matrix from smooth.spline

rsmoothingsplines

I'm trying to obtain the smoother matrix out of my smooth.spline fit.
I use the bone mineral density data from http://statweb.stanford.edu/~tibs/ElemStatLearn/.

bone <-read.table("bone.data", header=TRUE)
bmd_age <- smooth.spline(bone$age, bone$spnbmd, all.knots=TRUE, cv=TRUE)
bmd_fit <- predict(bmd_age, sort(bone$age))
df <- bmd_age$df

To obtain a column of the smoother matrix, I can replace the response vector (bone$spnbmd) by a vector with a single 1 and the rest filled with 0's. It is both what the professor recommended and what I found online https://stat.ethz.ch/pipermail/r-help/2006-June/108471.html.

So I use

smooth.matrix = function(x){
  n = length(x);
  sm = matrix(0, n, n);  
  for(i in 1:n){
    y = rep(0, n); y[i]=1;
    sm_i = predict(smooth.spline(x, y, df=df),x)$y;
    sm[,i]= sm_i;
  }
  return(sm)
}

sm <- smooth.matrix(bone$age)

If the smoother matrix is correct, the following two quantities should be the same (both fitted values from the smoothing spline model).

fromsm <- sm%*%(bone$spnbmd[order(bone$age)])
fromfit <- bmd_fit$y 

However, they are not. I think the problem is in the definition of smooth.matrix function, where

sm_i = predict(smooth.spline(x, y, df=df),x)$y;

is not using the same smoothing fit as in bmd_age. I've tried fixing the degree of freedom, spar, lambda, cv=FALSE, etc. but no luck so far. How to fix it?

Best Answer

After many hours of exploration, here is what I found:

Because smooth.spline algorithm chooses spar instead of lambda, it is only possible to (sort of) fix spar. However, lambda is a function of spar and another variable matrix. So fixing spar does not fix lambda necessarily. I have not found an easy way to extract the smoother matrix out of smooth.spline. However, for the purpose of computing variance, the algorithm provided in https://stat.ethz.ch/pipermail/r-help/2006-June/108471.html (fix spar instead of df) is a close estimate of the true smoother matrix. The variance computed from $SyS^T$, where $S$ is the estimated smoother matrix, is pretty close to the one computed from the correct smoother matrix.

Another R package "assist" has a function "ssr()" that also does smooth spline regression. It is not as powerful as smooth.spline. But the built-in function "hat.ssr()" gives the true smoother matrix of the model obtained from "ssr()".

Related Question