Solved – Getting the prediction standard error from a natural spline fit

multiple regressionrsplines

I'm fitting a natural spline fit to some data points. I'd like to estimate the prediction error for the predicted value. In linear regression (I agree that natural spline is also a linear regression with a specific type of design matrix), we know:

$\hat{\beta} = (X^T X)^{-1}X^TY \rightarrow \text{ assuming var(Y) = } \sigma^2 I \text{ then : }var(\hat{\beta}) = (X^TX)^{-1} \sigma^2 $

Now consider $\hat{Y} = {X_i}^T \hat{\beta} + \epsilon_i$. We can then write:

$Var(\hat{Y}) = {X_i}^T ((X^TX)^{-1} \sigma^2) (X_i) + \sigma^2$

This is easy to calculate for linear regression. How should I do it with natural spline? I can get the design matrix for natural spline. I can get $(X^TX)^{-1} \sigma^2$ but how can I get the rest of it:

Here is an example in R:

set.seed(12345)
x <- c(1:100)
y <- sin(pi*x/50)
epsilon <- rnorm(100, 0, 3)
knots <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
myFit <- lm(y ~ ns(x, knots = knots))

Now consider x = 32.5 . How can I get the variance for the $\hat{Y}$ corresponding to x = 32.5 ? I know we can use the predict function. however, what I do really want is to get calculate it similar to linear regression by getting the design matrix and multiplying them together.

I really appreciate your help.

Best Answer

You can get the design matrix for a linear model in R using model.matrix():

X <- model.matrix(myFit)
sigma <- summary(myFit)$sigma
var.Yhat <- (diag(X %*% solve(t(X) %*% X) %*% t(X)) + 1) * sigma^2

Or, if you want to get the prediction variance for new values of $X$, use ns() to transform into the natural spline basis first:

X.new <- cbind(1, ns(x.new, knots=knots))
var.Yhat <- (diag(X.new %*% solve(t(X) %*% X) %*% t(X.new)) + 1) * sigma^2
Related Question