Solved – Confidence Interval for non-smooth term in gam (mgcv)

Tags: confidence-interval, generalized-additive-model, linear, mgcv, r

I fitted a GAM using the mgcv package and now want to get a confidence interval for the non-smooth (parametric) term.

require(datasets)
require(mgcv)
b = gam(Temp ~ s(Ozone) + Solar.R, data=airquality, family=gaussian)

So, when I wanted to print the confidence intervals, I ran:

summary(b)

and it prints only

Parametric coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 77.8580645  1.4091014  55.254   <2e-16 ***
Solar.R     -0.0003532  0.0069707  -0.051     0.96

So, I am interested in the confidence intervals for the non-smooth term Solar.R. How can I do this?

Thank you.

Best Answer

You need the standard error of the term. It is shown in the output from summary(), but you can also compute the standard errors yourself as the square roots of the diagonal of the variance-covariance matrix, which is extracted using vcov().

Fit your example model:

require(datasets)
require(mgcv)
b <- gam(Temp ~ s(Ozone) + Solar.R, data=airquality, family=gaussian)

Grab the coefficients:

beta <- coef(b)

...and the standard errors, which are the square roots of the diagonal elements of the variance-covariance matrix $\mathrm{V}$. Note that here I use the version of the matrix that is adjusted to account for the smoothness parameter of the smooth having been selected, $\mathrm{V_b}$:

Vb <- vcov(b, unconditional = TRUE)
se <- sqrt(diag(Vb))
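
As a side check, the adjusted and unadjusted standard errors can be compared directly. The sketch below is an illustration, not part of the original answer: it assumes the model is refitted with method = "REML", because mgcv documents the smoothness-corrected matrix as available only under REML or ML smoothness selection. The names b_reml and se_both are made up for this example.

```r
library(mgcv)  # airquality comes from the datasets package shipped with R

# Assumption: refit with REML so the smoothness-corrected matrix is available
b_reml <- gam(Temp ~ s(Ozone) + Solar.R, data = airquality,
              family = gaussian, method = "REML")

# Position of the parametric term among the coefficients
i <- which(names(coef(b_reml)) == "Solar.R")

# Standard errors conditional on the smoothing parameter vs. corrected for it
se_both <- c(
  conditional   = unname(sqrt(diag(vcov(b_reml)))[i]),
  unconditional = unname(sqrt(diag(vcov(b_reml, unconditional = TRUE)))[i])
)
se_both
```

For a purely parametric term such as Solar.R the two values are usually very close; the correction tends to matter more for the coefficients of the smooth itself.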

Next, find the position of the term you want, so that you can pick out its $\hat{\beta}$ and standard error:

i <- which(names(beta) == "Solar.R")

Then an approximate 95% confidence interval is

beta[i] + (c(-1,1) * (2 * se[i]))

This gives:

> beta[i] + (c(-1,1) * (2 * se[i]))
[1] -0.01429463  0.01358823

The 2 above is a rounded version of the 0.975 probability quantile of the standard Gaussian distribution:

> qnorm(0.975)
[1] 1.959964

Often we take into account the additional uncertainty of having estimated the variance of the residuals by using the t distribution, for which we need the residual degrees of freedom

rdf <- df.residual(b)
qt(0.975, df = rdf)

which gives

> qt(0.975, df = rdf)
[1] 1.982583

This is also close to 2. The distinction only really makes a difference when you have a small data set, so you'll generally see 95% intervals given as plus/minus 2 * standard error in mgcv.
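
The steps above can be collected into one small function. This is a minimal sketch rather than anything provided by mgcv: param_ci is a hypothetical helper name, it uses the t quantile with the residual degrees of freedom, and it uses the default (conditional) covariance matrix from vcov().

```r
library(mgcv)  # airquality comes from the datasets package shipped with R

b <- gam(Temp ~ s(Ozone) + Solar.R, data = airquality, family = gaussian)

# Hypothetical helper (not an mgcv function): t-based confidence interval
# for a single parametric term of a fitted gam
param_ci <- function(model, term, level = 0.95) {
  beta <- coef(model)
  i    <- match(term, names(beta))          # position of the term
  if (is.na(i)) stop("term not found in model coefficients: ", term)
  se   <- sqrt(diag(vcov(model)))[i]        # its standard error
  crit <- qt(1 - (1 - level) / 2, df = df.residual(model))
  c(lower    = unname(beta[i] - crit * se),
    estimate = unname(beta[i]),
    upper    = unname(beta[i] + crit * se))
}

ci <- param_ci(b, "Solar.R")
ci
```

Swapping qt(...) for qnorm(1 - (1 - level) / 2) gives the Gaussian version; with this many observations the two intervals differ only slightly.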

Related Solutions

Solved – Centering constraints in MGCV GAM

What that means is that each smooth created using a by term will be centred at zero. In effect, the model treats every level of by as having the same mean (i.e. the intercept or constant term in the model). What Simon Wood means is that if the groups have different mean values (say Site A has higher values than Site B, and by = Site), then because of the centring constraints this difference in the group means will not be taken into account in the model.

Using an example from ?gam.models I will demonstrate the difference:

## Factor `by' variable example (with a spurious covariate x0)
## simulate data...

dat <- gamSim(4)

## fit model...
b <- gam(y ~ fac + s(x2, by=fac) + s(x0), data=dat)
bb <- gam(y ~ s(x2, by=fac) + s(x0), data=dat) ## without fac
summary(b)
summary(bb)

Note how in model b we include the fac variable as a separate parametric term in the model formula as well as in the s( ) part. The intercept in this model will then represent the mean for the first group (level) of fac, and the coefficients for fac in the model (and in the summary output) will represent the differences between the means of the other groups and the mean of the first group.

Look closely at the output from summary() and notice how the deviance explained by b is much higher than that of bb (without the parametric fac term); consequently b has a lower GCV score (lower is better) and a lower scale parameter estimate. The higher scale parameter in bb is due to there being a large amount of unexplained variance (the between-group effect of fac) left over in the simpler model.

R> summary(b)
....
R-sq.(adj) =  0.644   Deviance explained = 65.6%
GCV score = 3.9013  Scale est. = 3.7517    n = 400

R> summary(bb)
....
R-sq.(adj) =  0.451   Deviance explained = 47.3%
GCV score = 6.0322  Scale est. = 5.7758    n = 400
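
The same comparison can be pulled out of the fitted objects programmatically. This is a sketch under a few assumptions: the seed is arbitrary and the data are re-simulated, so the numbers will not match the output above exactly, and the names m_fac and m_nofac are made up for this example.

```r
library(mgcv)

set.seed(1)                        # arbitrary seed, only for repeatability
dat <- gamSim(4, verbose = FALSE)  # factor `by' example data

m_fac   <- gam(y ~ fac + s(x2, by = fac) + s(x0), data = dat)  # with fac
m_nofac <- gam(y ~ s(x2, by = fac) + s(x0), data = dat)        # without fac

# Collect deviance explained, GCV score and scale estimate side by side
cmp <- sapply(list(with_fac = m_fac, without_fac = m_nofac), function(m) {
  s <- summary(m)
  c(dev_expl = s$dev.expl, gcv = unname(m$gcv.ubre), scale = s$scale)
})
round(cmp, 3)
```

The parametric fac coefficients themselves, i.e. the estimated differences in group means, are in summary(m_fac)$p.table.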

Solved – Selecting GAM model link function and autocorrelation (mgcv)

Q1: In the $\log(y)$ model with Gaussian errors you are modelling the mean of log-transformed $y$, not the mean of $y$, and when back-transformed to the original scale the two won't coincide. That's often why GLM-like models are favoured as you can model on the scale you actually want to model on, not some transformed scale. With the $\log(y)$ model there's also nothing forcing model outputs or summaries to be strictly positive, whereas with the Gamma there would be.
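
The gap between the two means can be seen in a small simulation. This is only a sketch on made-up data: the Gamma shape and rate are arbitrary assumptions, and intercept-only models stand in for the real ones.

```r
set.seed(1)
y <- rgamma(1e4, shape = 2, rate = 0.5)  # simulated positive response

m_log   <- lm(log(y) ~ 1)                            # Gaussian model for log(y)
m_gamma <- glm(y ~ 1, family = Gamma(link = "log"))  # Gamma GLM with log link

means <- c(
  sample_mean    = mean(y),
  backtransf_log = unname(exp(coef(m_log))),    # exp of the mean of log(y)
  gamma_glm      = unname(exp(coef(m_gamma)))   # fitted mean on the y scale
)
means
```

The back-transformed value from the log model is the geometric mean of y, which sits below the arithmetic mean, whereas the Gamma GLM targets the mean of y itself.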

Q2: Right; in the Gaussian model we can think of the data as correlated Gaussian random variables and represent the correlations via a correlation matrix. There isn't an easy way to do the same thing in the GLM context.

There is nothing wrong with fitting AR models to log-transformed data or to the residuals of a regression model fitted to such data.

A single AIC is not very informative; you need to compare two or more AIC values.

If A and B are not time variables then you can easily undersmooth (overfit) data that is autocorrelated; when you say the smooths are "significantly different" for M3, I might hazard a guess that they are much smoother/less wiggly than either of the other two models?

If you suspect residual autocorrelation, or you aren't explicitly modelling time in your linear predictor, I'd suggest that fitting with the AR(1) is appropriate.

You can use the gamm() function to fit the M1 type model but estimating $\rho$ rather than fixing it a priori. You'll need to change from the spline-like ranefs to using the random argument to specify the ranefs in the model and use correlation = corAR1(form = ~ time) or correlation = corAR1(form = ~ time | id) to get an AR(1) or an AR(1) nested within each subject (id) respectively.
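
As a rough illustration of the gamm() call, here is a sketch on simulated data. Everything about the data is an assumption made for the example (a single series, a covariate x that is not time, AR(1) errors with coefficient 0.6); it is not the asker's M1 model.

```r
library(mgcv)  # gamm()
library(nlme)  # corAR1(); gamm() hands the fitting to lme

set.seed(42)
n <- 200
d <- data.frame(time = seq_len(n), x = runif(n))

# Smooth signal in x plus AR(1) errors ordered by time (simulated)
d$y <- sin(2 * pi * d$x) +
  as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.5))

# rho is estimated from the data rather than fixed in advance
m <- gamm(y ~ s(x), data = d, correlation = corAR1(form = ~ time))

# Estimated AR(1) parameter on its natural (-1, 1) scale
rho_hat <- coef(m$lme$modelStruct$corStruct, unconstrained = FALSE)
rho_hat
```

With several subjects, the analogous call would add something like random = list(id = ~ 1) and use corAR1(form = ~ time | id); the smooth part of the fit is then in m$gam and the mixed-model part in m$lme.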
