Regression – Optimal Choice of m (Order of Derivative) for MGCV Splines in Generalized Additive Models

generalized-additive-model, mgcv, regression, regularization, splines

I am working on a project that aims to estimate the association between a certain marker in blood and risk of an adverse cardiovascular outcome. Conventionally, a clinical threshold for the marker has been established based on observations with other outcomes. The goal is to flexibly model the association between serum concentration of the marker and the outcome of interest, to determine if the association conforms to the same form seen with other outcomes.

I ran the analysis using both restricted cubic splines (the rms package in R) and penalized splines (mgcv). With restricted cubic splines, there seems to be an inflection point that corresponds to the conventionally established concentration threshold. The same is not observed, however, when using a penalized spline (default s() options), which suggests the trend is linear. But when I change the order of derivative in the thin plate spline penalty from the default m=2 to m=1, for example s(x, m=1), the penalized spline mimics the cubic spline, indicating the same inflection point.

I don't have a good understanding of what changing m does, or how/whether it should be tuned to the specific data. The only reason I started playing with it was to explore why the default penalized spline did not detect an inflection point. Is it legitimate to change m, and are there ways to assess which value of m is optimal? I will say that the diagnostics for the fitted GAM seem to suggest a good fit regardless of the choice of m, and the AIC values of the two models were essentially identical.

Can't share the actual data since it is protected, but hopefully the question is generic enough. Below is a figure with the splines under the different m values: a very "subtle" inflection point, but potentially of clinical significance.

[Figure: fitted splines by m value]

Best Answer

For a more complete answer, I will wait for somebody more experienced with GAMs to correct what I say here, but this is what I know about derivatives in GAMs and how the m argument relates to them. For starters, recall that the first derivative is the slope of a smooth function at a specific point, defined as the limit of the difference quotient as the increment becomes infinitesimally small (Larson & Edwards, 2023, p.103),

$$ f'(x) = \lim_{\Delta{x} \to 0}\frac{f(x + \Delta{x})-f(x)}{\Delta{x}} $$

where $x$ is an input and $\Delta{x}$ is a small increment from that $x$ along the $x$ axis. Basically, at any given point on a smooth function, we can approximate its slope with a linear function. The second derivative, conversely, is the rate of change of the slope of a smooth function, and is simply the derivative of the first derivative (Larson & Edwards, 2023, p.128):

$$ f''(x) = \frac{d}{dx}\,f'(x) $$
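To make the limit definition concrete, here is a quick base R sketch of my own (a toy example, not from any package) that approximates $f'(2)$ for $f(x) = x^3$ by shrinking $\Delta{x}$; the exact answer is $3 \cdot 2^2 = 12$:

#### Finite-Difference Approximation ####
f  <- function(x) x^3
dx <- 10^-(1:6)             # ever smaller increments
(f(2 + dx) - f(2)) / dx     # approaches 12 as dx shrinks toward 0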

Note that these derivatives and their penalties are defined a little differently in the context of GAMs, given that a GAM smooth is typically built from a number of basis functions rather than a single function (Wood, 2017, p.162):

$$ f(x) = \sum\limits_{j=1}^{k}b_j(x)\beta_j $$

where $b_j$ is the $j$th basis function and $\beta_j$ its coefficient. The penalization of this function by default involves the second derivative (in practice, derivatives of a fitted GAM are often evaluated by finite differences; see a relevant discussion here). In any case, to summarize the derivatives, we have two things: slopes at given points, and the rates at which those slopes change. Both can be visualized for a fitted GAM using the gratia package.
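For concreteness, the penalty mgcv attaches to a one-dimensional thin plate regression spline with m=2 is the integrated squared second derivative, so the penalized least squares criterion being minimized is roughly the standard textbook form (Wood, 2017), not anything specific to this data:

$$ \lVert y - X\beta \rVert^2 + \lambda\int f''(x)^2 \, dx $$

where $\lambda$ is the smoothing parameter estimated by REML/GCV; with m=1 the integrand becomes $f'(x)^2$ instead. Here is an example pulled directly from the gratia GitHub: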

library(mgcv)
library(gratia)

#### Simulate Data and Fit GAM ####
dat <- data_sim("eg1", n = 800, dist = "normal", scale = 2, seed = 42)
mod <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")
draw(mod)

If we draw our base model, we get these smooths:

[Figure: the four estimated smooths from draw(mod)]

To obtain the first derivatives, we use the derivatives() function and call draw() once more:

#### First Derivatives of Smooths ####
df <- derivatives(mod, type = "central")
draw(df)

[Figure: first derivatives of the smooths from draw(df)]

Now if you compare the two sets of plots, you can see that the first smooth's slope changes very little from about 0 to about .25. As the smooth crests and heads toward the middle of the plot, its slope falls from a large positive value to an ever smaller one. This change is captured by the derivative plot, which is flat from 0 to about .25, then declines as the slope of the original function decreases, and flattens out again at the end, where the slope of the original function stops changing.
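If you prefer numbers to plots, note that derivatives() just returns a tibble of pointwise derivative estimates with confidence intervals, so you can inspect it directly (the exact column names vary a little across gratia versions):

#### Inspect Derivative Estimates ####
head(df)   # pointwise first-derivative estimates and their CIs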

The second derivatives, in turn, can be obtained and drawn like so:

#### Second Derivatives of Smooths ####
df2 <- derivatives(mod, type = "central", order = 2)
draw(df2)

[Figure: second derivatives of the smooths from draw(df2)]

You can see they look "blocky." This comes from the nature of second derivatives. If, for example, we simplify things a lot by defining a function as $f(x) = x^3$, we find that

$$ f'(x) = 3x^2 $$

$$ f''(x) = 6x $$

Notice that for this polynomial the second derivative is a linear function of $x$. Each stretch of a smooth behaves locally like a low-order polynomial, so its second derivative is roughly piecewise linear, which contributes to the zig-zaggy behavior of the plot for df2.

What does this mean within the context of GAMs? We know that TPRS and some similar spline bases by default penalize the second derivative, using m=2. If we fit a second model that penalizes the first derivative instead, we can compare the two directly with another gratia function, compare_smooths, which shows how the choice changes the respective fits. Here I explicitly set m to 1 or 2 to make it clear what I'm doing.

#### Fit Models ####
fit1 <- gam(y ~ s(x0, m = 1) + s(x1, m = 1) + s(x2, m = 1) + s(x3, m = 1),
            data = dat,
            method = "REML") # penalizes first derivatives
fit2 <- gam(y ~ s(x0, m = 2) + s(x1, m = 2) + s(x2, m = 2) + s(x3, m = 2),
            data = dat,
            method = "REML") # penalizes second derivatives

#### Plot ####
comp <- compare_smooths(fit1, fit2) # avoid naming this c(), which masks base::c
draw(comp)

We can now directly compare the model fit1, which penalizes the first derivative, and fit2, which penalizes the second derivative:

[Figure: compare_smooths output overlaying the m=1 and m=2 smooths]

Notice that for some smooths this simplifies them, while others become more wiggly. Now for the million dollar question, for which I don't have an exact answer: what does this mean for us? Well, in some cases penalizing a different derivative may make the model fit the data better, and that is enough. Penalizing the right derivative can also lend the fit some other useful properties, as noted in Pedersen et al. (2019):

We have observed that mgcv still occasionally finds solutions with simulated data where the global term is over-smoothed. To avoid this issue, we recommend both careful choice of basis and relative degrees of freedom of global and group-level terms. In the examples in section “What are Hierarchical GAMs?,” we used smoothers with an unpenalized null space for the global smoother and ones with no null space for the group-level terms. When using TPRS this can be achieved with splines with a lower order of derivative penalized in the group-level smoothers than the global smoothers, as lower-order TPRS have fewer basis functions in the null space. For example, we used m=2 (penalizing squared second derivatives) for the global smoother, and m=1 (penalizing squared first derivatives) for group-level smoothers in models GS and GI.
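To make the quote's null-space point concrete: with a one-dimensional TPRS, m=2 leaves both the constant and the linear trend unpenalized, while m=1 leaves only the constant unpenalized. You can check this with mgcv's smoothCon (a minimal sketch on simulated x, not code from the question):

#### Null Space Dimension by m ####
library(mgcv)
x <- runif(100)
sm1 <- smoothCon(s(x, m = 1), data = data.frame(x = x))[[1]]
sm2 <- smoothCon(s(x, m = 2), data = data.frame(x = x))[[1]]
sm1$null.space.dim  # 1: only the constant is unpenalized
sm2$null.space.dim  # 2: constant and linear trend are unpenalized

This is one reason the default m=2 smooth can be shrunk all the way to a straight line (a linear trend costs nothing under the penalty), while an m=1 smooth is shrunk toward a constant instead.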

The best I can say is that you should try to fit the functional relationship that is best supported by the theory and the data driving that function, and try, whatever the selection of $m$, not to overfit. From my own experience tinkering with $m$, it can sometimes produce functions that wildly overfit where data points are sparse, which is of course something you want to avoid. Here is an extreme artificial example below, where the sp and k arguments are set to force a lot of bends in the curve:

#### Get Filtered Beaver Data ####
library(MASS)   # beav1 data
library(dplyr)  # tibble(), filter(), and the pipe
tib <- tibble(beav1) %>% 
  filter(time > 1800)
tib

#### Fit Models ####
bf1 <- gam(temp ~ s(time, m = 1, k = 15, sp = .001),
           data = tib,
           method = "REML")
bf2 <- gam(temp ~ s(time, m = 2, k = 15, sp = .001),
           data = tib,
           method = "REML")

#### Plot ####
bfp1 <- draw(bf1, residuals = TRUE)
bfp2 <- draw(bf2, residuals = TRUE)
ggpubr::ggarrange(bfp1, bfp2)

You can see that because there are not a lot of data points, a penalized first derivative can contribute to some very bad misspecification here. Predictions from this model would probably be poor if the actual functional relationship is smoother:

[Figure: the m=1 fit (left) versus the m=2 fit (right) on the filtered beaver data]

It sounds like you already know how your data are supposed to behave, and that should tell you what counts as an adequate fit.
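If you do want a quick data-driven check alongside that theory, you can at least put the competing choices of m side by side on information criteria, much as you already did; a sketch using the beaver fits above (it complements, rather than replaces, substantive reasoning):

#### Compare Fits on AIC ####
AIC(bf1, bf2)   # lower is better; near-ties mean the data alone cannot separate the two choices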

Citations

  • Larson, R., & Edwards, B. H. (2023). Calculus (12th ed.). Cengage.
  • Pedersen, E. J., Miller, D. L., Simpson, G. L., & Ross, N. (2019). Hierarchical generalized additive models in ecology: An introduction with mgcv. PeerJ, 7, e6876. https://doi.org/10.7717/peerj.6876
  • Wood, S. N. (2017). Generalized additive models: An introduction with R (2nd ed.). CRC Press, Taylor and Francis Group.