In the usual way:
p <- predict(mod, newdata, type = "link", se.fit = TRUE)
Then note that p contains a component $se.fit with the standard errors of the predictions for the observations in newdata. You can then form a confidence interval by multiplying the SE by a value appropriate to your desired coverage level. E.g. an approximate 95% confidence interval is formed as:
upr <- p$fit + (2 * p$se.fit)
lwr <- p$fit - (2 * p$se.fit)
You substitute in an appropriate value from a $t$ or Gaussian distribution for the interval you need.
Note that I use type = "link" as you don't say if you have a GAM or just an AM. If you have a GAM, you need to form the confidence interval on the scale of the linear predictor and then transform it to the scale of the response by applying the inverse of the link function:
upr <- mod$family$linkinv(upr)
lwr <- mod$family$linkinv(lwr)
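Putting the pieces above together, here is a minimal self-contained sketch (it uses simulated binomial data from gamSim() purely for illustration; substitute your own fitted model and newdata):

```r
library(mgcv)

set.seed(1)
dat <- gamSim(1, n = 400, dist = "binary", scale = 0.33)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3),
           family = binomial, data = dat, method = "REML")
newdata <- dat[1:20, ]

## predictions and standard errors on the link (linear predictor) scale
p <- predict(mod, newdata, type = "link", se.fit = TRUE)

## approximate 95% interval on the link scale
upr <- p$fit + (2 * p$se.fit)
lwr <- p$fit - (2 * p$se.fit)

## back-transform to the response (probability) scale
upr <- mod$family$linkinv(upr)
lwr <- mod$family$linkinv(lwr)
```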
Now note that these are very approximate intervals. In addition, these intervals are point-wise (they apply to each predicted value individually, not to the whole function at once), and they don't take into account that smoothness selection was performed.
A simultaneous confidence interval can be computed via simulation from the posterior distribution of the parameters. I have an example of that on my blog.
If you want a confidence interval that is not conditional upon the smoothing parameters (i.e. one that takes into account that we do not know, but instead estimate, the values of the smoothness parameters), then add unconditional = TRUE to the predict() call.
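For example (a sketch; the smoothness-uncertainty correction assumes the model was fitted with method = "REML" or "ML"):

```r
## standard errors corrected for smoothness-parameter uncertainty
p <- predict(mod, newdata, type = "link", se.fit = TRUE,
             unconditional = TRUE)
```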
Also, if you don't want to do this yourself, note that newer versions of mgcv have a plot.gam() function that returns an object with all the data used to create the plots of the smooths and their confidence intervals. You can just save the output from plot.gam() in an object
obj <- plot(model, ....)
and then inspect obj, which is a list with one component per smooth. Add seWithMean = TRUE to the plot() call to get confidence intervals that include the uncertainty about the overall mean, and unconditional = TRUE for intervals that are not conditional upon the smoothness parameters.
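A quick sketch of what inspecting that object looks like (the component names x, fit, and se reflect what plot.gam() returns in recent mgcv versions):

```r
## plot() invisibly returns a list with one component per smooth
obj <- plot(mod, seWithMean = TRUE, pages = 1)

## evaluation points, fitted smooth, and standard errors for the first smooth
str(obj[[1]], max.level = 1)
```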
So, if I understand you correctly, there's not really any similarity between the smooth s(x1, x2) and the random effect s(x1, x2, fac, bs = "re"), correct?
Correct. The function name "s" does not mean "smooth function" when s() is used to construct a random effect. Broadly speaking, s() is just a model-term constructor routine that constructs a design matrix and a penalty matrix.
What I was envisioning was something smooth in 2 dimensions, like the former, but with some deviations from the average by factor level. You can get separate smooths per factor level using s(x1, x2, by = fac), but that completely separates the data for each factor level rather than doing any partial pooling.
s(x1, x2, by = fac) gives you something pretty close to what you want, except that, as you said, data from different factor levels are treated independently. Technically, "close" means that s(x1, x2, by = fac) gives you the correct design matrix but not the correct penalty matrix. In this regard, you are probably aiming at te(x1, x2, fac, d = c(2, 1), bs = c("tp", "re")). I have never seen such a model term before, but its construction is definitely possible in mgcv:
library(mgcv)
x1 <- runif(1000)
x2 <- runif(1000)
f <- gl(5, 200)
## "smooth.spec" object
smooth_spec <- te(x1, x2, f, d = c(2, 1), bs = c("tp", "re"))
## "smooth" object
sm <- smooth.construct(smooth_spec,
data = list(x1 = x1, x2 = x2, f = f),
knots = NULL)
You can check that this smooth term has 2 smoothing parameters as expected: one for the s(x1, x2, bs = 'tp') margin, the other for the s(f, bs = 're') margin.
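Continuing the example, we can verify this directly on the constructed smooth object; sm$S holds one penalty matrix per smoothing parameter:

```r
## one penalty matrix per smoothing parameter;
## we expect 2: one for the "tp" margin, one for the "re" margin
length(sm$S)
```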
Specification of k turns out to be subtle. You need to explicitly pass nlevels(f) to the random-effect margin. For example, if you want a rank-10 thin-plate regression spline for the (x1, x2) margin:
## my example factor `f` has 5 levels
smooth_spec <- te(x1, x2, f, d = c(2, 1), bs = c("tp", "re"), k = c(10, 5))
sapply(smooth_spec$margin, "[[", "bs.dim")
# [1] 10 5
At first I was thinking that perhaps we could simply pass NA for the random-effect margin, but it turns out we can't!
smooth_spec <- te(x1, x2, f, d = c(2, 1), bs = c("tp", "re"), k = c(10, NA))
sapply(smooth_spec$margin, "[[", "bs.dim")
# [1] 25 5 ## ?? why is it 25? something has gone wrong!
This might imply that there is a tiny bug; I will have a check when I get a chance.
Best Answer
You don't give an example so I'm not sure what you mean by an "interval" (how big?, etc), but you run the risk of having a biased estimate of the derivative of the smooth if your interval is long relative to the wiggliness of the estimated smooth.
It is quite easy to compute a finite difference-based estimate of the derivative of an estimated smooth using existing tools in mgcv. I'll summarise the approach below but this is cribbed from some blog posts I wrote that show how to do this and consider simultaneous intervals, not just the across-the-function intervals I'll show below.
I'll also use some functions I wrote to do this analysis that are in my gratia package.
The basic idea is to use the type = "lpmatrix" option of the predict() method for gam objects. This returns the linear predictor matrix for a set of locations at which we evaluate the spline; multiplying it by the coefficients of the model yields predicted values for the smooth. We compute two of these matrices: one for a fine grid of points over the range of the covariate for the smooth, and a second at the same locations shifted by a tiny amount. Differencing these two matrices gives a linear predictor matrix for the derivative at the grid of points. The variance-covariance matrix of the model coefficients is then used to create a confidence interval. As the model is additive, we can ignore the other covariates and smooths and do this for each smooth in turn.

All of this is done by the fderiv() function in gratia:
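For example (a sketch on a simulated model; fderiv() is the interface in earlier gratia releases — newer versions provide derivatives() instead):

```r
library(mgcv)
library(gratia)

set.seed(42)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")

fd  <- fderiv(mod)               # finite-difference derivatives of all smooths
fd2 <- fderiv(mod, term = "x1")  # derivative of a single named smooth
```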
fd and fd2 contain the finite-difference estimates of the derivative of all, or just one, smooth from the fitted model. The blog posts linked below go into a lot more detail about what is going on under the hood.

We can generate a confidence interval for the derivative using the confint() method:
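For example (the output column names est, lower, upper, and term follow the confint method for fderiv objects in gratia; treat them as an assumption if your version differs):

```r
## across-the-function 95% confidence interval for the derivatives
ci <- confint(fd, type = "confidence")
head(ci)
```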
This has info for all the smooths, and by default is a 95% interval. First, add on a column of the covariate values at which the derivatives were evaluated:
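A sketch of that step; note that the eval component holding the evaluation grid is an assumption about the fderiv object's internals, so check names(fd) in your version:

```r
## stack the evaluation locations alongside the stacked intervals,
## one column of x values matching the rows of `ci`
ci <- cbind(ci, x = unlist(fd$eval, use.names = FALSE))
```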
Then we can plot:
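For example with ggplot2 (assuming ci now carries the evaluation locations in a column x, as described above):

```r
library(ggplot2)

## one panel per smooth; ribbon shows the 95% confidence band
ggplot(ci, aes(x = x, y = est)) +
  geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.2) +
  geom_line() +
  facet_wrap(~ term, scales = "free_x") +
  labs(x = NULL, y = "First derivative of the smooth")
```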
Giving a plot of the estimated derivative of each smooth with its point-wise 95% confidence band.
Blog posts that contain more detail: