Solved – Extracting significance of gam smoothing terms

generalized-additive-modelmgcvsplines

I have a time course data to which I'd like to fit a gam and have easy interpretability, and by that I mean obtaining coefficient estimates for each of the splines.

These are my data:

df <- data.frame(y = c(0.15,0.17,0.07,0.17,0.01,0.15,0.18,0.04,-0.06,-0.08,0,0.03,-0.27,-0.93,0.04,0.12,0.08,0.15,0.04,0.15,0.03,0.09,0.11,0.13,-0.11,-0.32,-0.7,-0.78,0.07,0.04,0.06,0.12,-0.15,0.05,-0.08,0.14,-0.02,-0.14,-0.24,-0.32,-0.78,-0.81,-0.04,-0.25,-0.09,0.02,-0.13,-0.2,-0.04,0,0.02,-0.05,-0.19,-0.37,-0.57,-0.81),
                 log2.time = rep(c(-1, 0, 1, 1.58,2, 2.58, 3, 3.32, 3.58, 4.17, 4.58, 5.58, 6.17, 7.39),4))

I thought I'd use the mgcv R package, using this model:

fit <- mgcv::gam(y ~ s(log2.time), data = df, method = "REML")

Using summary(fit), I get:

> summary(fit)

Family: gaussian 
Link function: identity 

Formula:
y ~ s(log2.time)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.10893    0.01681  -6.479 3.65e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
               edf Ref.df     F p-value    
s(log2.time) 4.101  5.036 46.36  <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.809   Deviance explained = 82.3%
-REML = -27.673  Scale est. = 0.015829  n = 56

So, overall the log2.time smoothing term is significantly different from zero. But as stated above I would like to obtain a coefficient for each of the splines, along with their standard errors and a measure of whether they are significantly different from zero.

Should I be using cubic splines for this purpose, with this model:

fit <- mgcv::gam(y ~ s(log2.time,bs='cr'), data = df, method = "REML")

And can the spline coefficient estimates be obtained as follows:

spline.coefs <- coef(fit)[-1]
spline.knots <- fit$smooth[[1]]$xp

for(s in 2:length(spline.knots)){
  spline.coefs[s-1] <- spline.coefs[s-1]/(spline.knots[s]-spline.knots[s-1])
}

And then would the coefficient standard errors be:

spline.coefs.se <- sqrt(diag(vcov(fit, unconditional  = TRUE)))[-1]

And finally, can p-values for the spline coefficient estimates be obtained?

Best Answer

The diagonal of the variance matrix of the model parameters contains the variances of all the coefficients. If you take the square root of the diagonal elements you'll get the standard errors of the intercept plus the spline coefficients:

> fit <- mgcv::gam(y ~ s(log2.time,k=3), data = df, method = "REML")
> vcov(fit)
                 (Intercept) s(log2.time).1 s(log2.time).2
(Intercept)     3.122736e-04  -1.059763e-18  -3.707237e-20
s(log2.time).1 -1.059763e-18   3.968441e-03   3.479426e-08
s(log2.time).2 -3.707237e-20   3.479426e-08   3.122736e-04
> sqrt(diag(vcov(fit)))
   (Intercept) s(log2.time).1 s(log2.time).2 
    0.01767127     0.06299556     0.01767127

If you use vcov(fit, unconditional = TRUE) you'll get a (Bayesian) covariance matrix adjusted for having estimated the smoothness parameter, $\lambda$. Without this, the uncertainty on the spline coefficients is too small as it assumes $\lambda$ is known.

The values shown by coef(fit) are the parameters for the two thin plate spline basis functions used in the model. They are not the slopes of the fitted spline itself however; the slope or first derivative of the spline is varying throughout the range of $x$. You can estimate the slope of the spline at any point using finite differences if you want that, and compute a confidence interval on that slope.

There is only one spline here, built up from two basis functions (well three depending on how you look at it, but one of those is the intercept). Your use of "spline" and "slope" are a little confusing — if you care to clarify I will update this answer should elements of it be irrelevant or lacking.

Note that your model was actually fitted using k = 3 and then the constant basis function was dropped, hence a maximum of 2 degrees of freedom for the smoother. However even this has been penalized a little; if you don't want penalization (smoothness selection, or estimation of $\lambda$), use fx = TRUE.

Update:

If you want piecewise-linear fits then you can achieve this with the b-spline basis bs = 'bs':

fit <- mgcv::gam(y ~ s(log2.time, bs='bs', k = 3, m = 1), data = df, 
                 method = "REML")

where m = 1 now refers to the order of the spline, not the penalty. In this model the hinge point of the fit is in the middle of the data. Removing the k = 3 results in the default fit with k = 10.

The coefficients are still not the slopes of the linear sections however. You should be able to extract the derivative of the spline from the b-spline basis but I haven't worked out how to coax mgcv into yielding that info just yet. However, you can get this with finite differences.

library('mgcv')
devtools::install_github('gavinsimpson/gratia')
library('gratia') # for finite differences-based derivatives
library('ggplot2')
library('cowplot')
theme_set(theme_bw())

## fit piecewise-linear model
fit <- gam(y ~ s(log2.time, bs='bs', m = 1), data = df, 
           method = "REML")
## new locations to evaluate everything at
newdf <- with(df, data.frame(log2.time = seq(min(log2.time), 
                                             max(log2.time),
                                             length = 100)))
## extract model coefs
b <- coef(fit)
## get the Xp matrix, the prediction matrix for the spline
Xp <- predict(fit, newdf, type = "lpmatrix")
## evaluate 1st derivative at new locations
d <- fderiv(fit, newdata = newdf)

## stick everything together
pltdf <- data.frame(log2.time  = newdf$log2.time,
                    spline     = Xp[,-1] %*% b[-1],
                    derivative = d[[1]][['s(log2.time)']][['deriv']])

## individual plots
p1 <- ggplot(pltdf, aes(x = log2.time, y = spline)) +
    geom_line()
p2 <- ggplot(pltdf, aes(x = log2.time, y = derivative)) +
    geom_step()
## combine
plot_grid(p1, p2, align = 'hv', axis = 'lrtb', ncol = 1)

enter image description here

Related Question