Solved – R predict with “prediction” option

I'm trying to match the prediction interval of the predict.lm() function in R using the formula found in this discussion :

Obtaining a formula for prediction limits in a linear model

I'm using a student's quantile in my interval but in the end it's far larger from the one given by predict().

Is there any specific calculation in the predict function, I tried to look at the code but couldn't find any answer.
The formula looks ok as I found exactly the same from others source.

My R code :

airquality_clean <- na.omit(airquality)
attach(airquality_clean)

#Model estimation
model_1 <- lm(Ozone ~., data = airquality_clean)

#Unbias variance of the residuals
sigma_2 <- sum(model_1$residuals**2)/(dim(airquality_clean)[1]-dim(airquality_clean)[2])

#New observation
new <- data.frame(Solar.R=200,Wind=10,Temp=70,Day=1,Month=3)

#Calculated prediction interval
sigma <- sqrt(sigma_2*(1 + as.matrix(new)%*%solve(as.matrix(t(airquality_clean[,-1]))%*%as.matrix(airquality_clean[,-1]))%*%as.matrix(t(new))))
qt <- qt(0.995, df = dim(airquality_clean)[1]-dim(airquality_clean)[2])
int_pred_t <- cbind(predict(model_1, new)-(qt*sigma),predict(model_1, new)+(qt*sigma))
int_pred_t
          [,1]     [,2]
[1,] -22.59931 95.82563

#R prediction interval
predict(model_1, new, interval="predict", level=0.99)}
       fit       lwr      upr
1 36.61316 -21.12916 94.35548

I'm not too far but it's not the same results. If I use a p value from a normal distribution and not a student I'm even closer.

Thank you.

> #Model estimation > lm.1<- lm(Ozone ~., data = airquality_clean) > #Design Matrix > x=model.matrix(lm.1) > > #Defining linear function > a=c(1,200,10,70,1,3) > > #Defining new data.frame > a.dat=data.frame(Solar.R=a[2],Wind=a[3],Temp=a[4],Month=a[5],Day=a[6]) > > #Finding the upper prediction interval > predict(lm.1,newdat=a.dat)+qt(.995,summary(lm.1)$df[2])*summary(lm.1)$sigma*sqrt(1+t(a)%*%solve(t(x)%*%x)%*%a) [,1] [1,] 103.2434 > > #Finding the lower prediction interval > predict(lm.1,newdat=a.dat)-qt(.995,summary(lm.1)$df[2])*summary(lm.1)$sigma*sqrt(1+t(a)%*%solve(t(x)%*%x)%*%a) [,1] [1,] -16.76174 > > > #Using predict function > predict(model_1, a.dat, interval="predict", level=0.99) fit lwr upr 1 43.24083 -16.76174 103.2434 >

Best Answer

You didn't construct your new object correctly. You need to take into account the effect of the intercept term by adding a "1" (if you are fitting with an intercept) to your linear function if you are finding the C.I. manually yourself. Check out how I created vector a in the following code. But if you use the predict function to directly find the C.I., then the newdat argument does not need to have any "1" for the intercept. R will take care of that! Check out how I used a.dat below and and found identical results:

Best Answer

Related Solutions

Solved – Bayesian approach to modelling the prediction interval from a parametric model

Solved – How to get prediction from a neural network since its output is standardized

Related Question