Splines – How to Fit Piecewise Linear Splines or Natural Cubic Splines in mgcv

mgcvsplines

How do GAMs generate forecasts that are outside the range of the training data?

I often need to use regression models for extrapolated forecasts, where the values of the predictors are outside their training range. Hence, models with piecewise linear splines or natural cubic splines (i.e., cubic for interpolation and linear for extrapolation) are crucial to my work.

I'm looking for a way to fit a gam model with the mgcv package that uses a piecewise linear smoother or natural cubic spline smoother. I tried looking at the s() and smooth.terms() help pages, but I couldn't find what I want.

Best Answer

The default thin plate regression spline basis will have the properties you require (effectively cubic for interpolation, linear for extrapolation) with the default second order derivative penalty.

The cubic regression spline basis (s(..., bs = "cr")) is exactly what you ask for; a cubic spline with linear extrapolation (second derivative is 0 at the boundary knots). This info in found in ?smooth.construct.cr.smooth.spec (yep, not easily remembered - there is an alias for this ?cubic.regression.spline which is linked to from ?smooth.terms).

The B spline (bs = "bs") also has cubic spline with linear extrapolation when the defaulkts are used for m: s(..., bs = "bs", m = c(3, 2)).

Piecewise linear smooths can be fitted by modifying the order of the penalty, but only for some of the smooth types in mgcv. For example the CRS basis (bs = "cr") can't be modified this way, and neither can the TPRS basis, but the B spline basis (bs = "bs") can be used instead.

Here's an example of two different piecewise linear fits, one with the B spline and one with a P spline.

library("gratia")
library("mgcv")

df <- data_sim("eg1", seed = 20)
m_bs <- gam(y ~ s(x2, bs = "bs", m = 1), data = df, method = "REML")
m_ps <- gam(y ~ s(x2, bs = "ps", m = c(0, 1)), data = df, method = "REML")

comp <- compare_smooths(m_bs, m_ps)
draw(comp)

the last line results in

enter image description here

indicating pretty similar fits; larger differences can be achieve by changing the order of the penalty on the P spline for example.

Note that what m means is basis-dependent. For the B spline, m = 1 controls the order of the spline (here linear with m = 1) and when passed as a length 1 vector like this it implies a zeroth-order penalty (i.e. we could have specified the fit with m = c(1,0) as this is what is chosen internally if we ask for a first order spline with m = 1. For the P spline, the first value passed to m is the order of the smooth (in the sense of the B spline) - 1 (so m = 2 would be a cubic P spline, and m = 1 is a piecewise linear smooth). The second value passed to m is the order of the difference penalty on the coefficients of the smooth.

You can control the extrapolation behaviour to some extent for the B spline by extending the effect of the penalty beyond the range of the data (into the period of extraploation). For more details see ?d.spline and a blog post I wrote on this topic.