I'm trying to match the prediction interval of the predict.lm() function in R using the formula found in this discussion :
Obtaining a formula for prediction limits in a linear model
I'm using a student's quantile in my interval but in the end it's far larger from the one given by predict().
Is there any specific calculation in the predict function, I tried to look at the code but couldn't find any answer.
The formula looks ok as I found exactly the same from others source.
My R code :
airquality_clean <- na.omit(airquality)
attach(airquality_clean)
#Model estimation
model_1 <- lm(Ozone ~., data = airquality_clean)
#Unbias variance of the residuals
sigma_2 <- sum(model_1$residuals**2)/(dim(airquality_clean)[1]-dim(airquality_clean)[2])
#New observation
new <- data.frame(Solar.R=200,Wind=10,Temp=70,Day=1,Month=3)
#Calculated prediction interval
sigma <- sqrt(sigma_2*(1 + as.matrix(new)%*%solve(as.matrix(t(airquality_clean[,-1]))%*%as.matrix(airquality_clean[,-1]))%*%as.matrix(t(new))))
qt <- qt(0.995, df = dim(airquality_clean)[1]-dim(airquality_clean)[2])
int_pred_t <- cbind(predict(model_1, new)-(qt*sigma),predict(model_1, new)+(qt*sigma))
int_pred_t
[,1] [,2]
[1,] -22.59931 95.82563
#R prediction interval
predict(model_1, new, interval="predict", level=0.99)}
fit lwr upr
1 36.61316 -21.12916 94.35548
I'm not too far but it's not the same results. If I use a p value from a normal distribution and not a student I'm even closer.
Thank you.
Best Answer
You didn't construct your
new
object correctly. You need to take into account the effect of the intercept term by adding a "1" (if you are fitting with an intercept) to your linear function if you are finding the C.I. manually yourself. Check out how I created vectora
in the following code. But if you use the predict function to directly find the C.I., then thenewdat
argument does not need to have any "1" for the intercept. R will take care of that! Check out how I useda.dat
below and and found identical results: