Solved – Confidence Interval for non-smooth term in gam (mgcv)

confidence intervalgeneralized-additive-modellinearmgcvr

I fitted a gam model in mgcv package and now want to get the confidence intervals for the non-smooth term.

require(datasets)
require(mgcv)
b = gam(Temp ~ s(Ozone) + Solar.R, data=airquality, family=gaussian)

So, when I wanted to print the confidence intervals, I ran:

summary(b)

and it prints only

Parametric coefficients:
          Estimate Std. Error t value Pr(>|t|)    
(Intercept) 77.8580645  1.4091014  55.254   <2e-16 ***
Solar.R     -0.0003532  0.0069707  -0.051     0.96 

So, I am interested in the confidence intervals for the non-smooth term Solar.R. How can I do this?

Thank you.

Best Answer

You need the standard error of the term, which is given in the output from summary(), but you can grab the standard errors from the diagonals of the variance-covariance matrix, which is extracted using vcov().

Fit your example model:

require(datasets)
require(mgcv)
b <- gam(Temp ~ s(Ozone) + Solar.R, data=airquality, family=gaussian)

Grab the coefficients:

beta <- coef(b)

...and the standard errors, which are the square roots of the diagonals of the variance-covariance matrix $\mathrm{V}$, noting that here I use the variance-covariance matrix that is adjusted to account for the selection of the smoothness parameter for the smooth in the model, $\mathrm{V_b}$

Vb <- vcov(b, unconditional = TRUE)
se <- sqrt(diag(Vb))

Next, identify the $\hat{\beta}$ etc for the term you want

i <- which(names(beta) == "Solar.R")

Then an approximate 95% confidence interval is

beta[i] + (c(-1,1) * (2 * se[i]))

This gives:

> beta[i] + (c(-1,1) * (2 * se[i]))
[1] -0.01429463  0.01358823

The 2 above comes from the 0.975th probability quantile of the Gaussian distribution.

> qnorm(0.975)
[1] 1.959964

Often we take into account the additional uncertainty of having estimated the variance of the residuals by using the t distribution, for which we need the residual degrees of freedom

rdf <- df.residual(b)
qt(0.975, df = rdf)

which gives

> qt(0.975, df = rdf)
[1] 1.982583

Which is close to 2 also. This distinction only really makes any difference when you have a small data set, so you'll generally see 95% intervals given as plus/minus 2 * standard error in mgcv.