Generalized Additive Model – Understanding GAM Parameter Estimates and Shrinkage

generalized-additive-modelmgcv

I run the following GAM model in R using the mgcv pacakge:

my_fit <- mgcv::gam(mpg ~ s(disp, bs = "ts", k = 15), gaussian(), mtcars, method = "REML")

As far as I understand the help page of mgcv::smooth.construct.ts.smooth.spec, setting bs = "ts" shrinks the coefficients of the fit towards zero. Looking at summary(my_fit) returns

Approximate significance of smooth terms:
        edf    Ref.df F     p-value    
s(disp) 4.251  14     14.23 <2e-16 *** 

so the effective degree of freedoms are 4.251. Do I understand this correctly, out of the k = 15 basis functions, about 9 are shrunk to zero ($15 – 1 – 9 = 5 \approx 4.251$)? However, my_fit$coefficients returns 15 coefficients for s(disp), and most of them are quite far from zero. I would have expected either only about 5 coefficients or 15 coefficients and about 10 are close to zero. Where do I see the shrinkage? Or is there no shrinkage going on?

Best Answer

There are 14 basis functions here, not 15; one is removed when the identifiability (sum-to-zero) constraint is applied to the basis.

All of the weights (coefficients) for the basis functions are shrunk to some extent if the smooth is penalized. As there are 14 basis functions there will always be 14 coefficients associated with this smooth regardless of the amount of shrinkage. If the smoothing parameter, $\lambda$, is sufficiently high, those coefficients will be shrunk to be effectively zero.

However, there is no reason to presume that 9 or 10 of the coefficients (in your case) will be all shrunk to effective zero. The penalty is controlling the wiggliness of the estimated smooth and that can counterintuitively require non-zero values of all the coefficients to achieve a fit that uses 4 to 5 effective degrees of freedom (EDF).

The point that the help page is making is that the estimated function (the page uses the word "term") can be shrunk towards a constant function at large $\lambda$ values rather than towards a linear function.

So, there is certainly shrinkage going on; the smooth in your model uses ~4 EDF when it could have used 14 EDF:

library("mgcv")
library("gratia")

m_pen <- gam(mpg ~ s(disp, bs = "ts", k = 15),
             data = mtcars,
             family = gaussian(),
             method = "REML")

m_unpen <- gam(mpg ~ s(disp, bs = "ts", k = 15, fx = TRUE),
               data = mtcars,
               family = gaussian(),
               method = "REML")

cs <- compare_smooths(m_pen, m_unpen)
draw(cs)

enter image description here

Clearly the estimated smooth has been shrunk away from the unpenalised fit.

However, because of the parameterisation being used for the smooth and the wigglines penalty it's not easy to see where the shrinkage has taken place in terms of the model coefficients. In an alternate parametrisation, the so-called natural parameterization, the EDF of the individual basis functions that comprise a smooth are on a scale that demonstrates the shrinkage, but that requires you to change (reparameterise) the basis functions.

If we draw the basis functions weighted by the model coefficients, for both models, you'll get some idea of where the shrinkage has happened:

bs_pen <- m_pen |>
  basis() |>
  draw()
bs_unpen <- m_unpen |>
  basis() |>
  draw()

library("patchwork")

bs_pen + bs_unpen + plot_layout(ncol = 2)

enter image description here

It's clear here that many of the basis functions have been shrunk towards zero functions.

Many of the coefficients are close to zero here

> coef(m_pen)                                                                 
 (Intercept)    s(disp).1    s(disp).2    s(disp).3    s(disp).4    s(disp).5 
 20.09062500  -7.78783043  -0.12857499  -1.83287539  -0.17948960  -0.05298061 
   s(disp).6    s(disp).7    s(disp).8    s(disp).9   s(disp).10   s(disp).11 
  0.66157030   0.40009721   0.42537796   0.17876507   0.45666967   0.32813318 
  s(disp).12   s(disp).13   s(disp).14 
 -0.43256068  -2.75386484 -14.07700587

indicating low weights for certain functions.

Related Question