Solved – How different are restricted cubic splines and penalized splines

regression, splines

I am reading a lot about using splines in various regression problems. Some books (e.g. Hodges, Richly Parameterized Linear Models) recommend penalized splines. Others (e.g. Harrell, Regression Modeling Strategies) opt for restricted cubic splines.

How different are these, in practice? Would you often get substantively different results from using one or the other? Does one or the other have particular advantages?

Best Answer

From my reading, the two concepts you ask us to compare are quite different beasts, and comparing them is something of an apples-and-oranges exercise. This makes many of your questions somewhat moot: ideally (assuming one can write down a wiggliness penalty for the RCS basis in the required form) you'd use a penalised restricted cubic regression spline model.

Restricted Cubic Splines

A restricted cubic spline (or a natural spline) is a spline basis built from piecewise cubic polynomial functions that join smoothly at some pre-specified locations, or knots. What distinguishes a restricted cubic spline from a cubic spline is that additional constraints are imposed on the restricted version such that the spline is linear before the first knot and after the last knot. This is done to improve performance of the spline in the tails of $X$.
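To make this concrete, here is a minimal sketch in Python/NumPy of the truncated-power construction of an RCS basis as presented in Harrell's Regression Modeling Strategies; the function name `rcs_basis` and the data are purely illustrative, and Harrell's additional rescaling of the nonlinear terms is omitted:

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted (natural) cubic spline basis via truncated powers.

    Returns a design matrix with len(knots) - 1 columns (no intercept):
    x itself plus len(knots) - 2 nonlinear terms, each constrained so
    the overall spline is linear beyond the boundary knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = len(t)
    pos = lambda v: np.maximum(v, 0.0) ** 3  # truncated cubic (x - t)_+^3
    cols = [x]
    for j in range(k - 2):
        cols.append(
            pos(x - t[j])
            - pos(x - t[k - 2]) * (t[k - 1] - t[j]) / (t[k - 1] - t[k - 2])
            + pos(x - t[k - 1]) * (t[k - 2] - t[j]) / (t[k - 1] - t[k - 2])
        )
    return np.column_stack(cols)

# Example: 5 knots placed at Harrell's recommended quantiles give 4 columns.
x = np.random.default_rng(1).uniform(0, 10, 200)
knots = np.quantile(x, [0.05, 0.275, 0.5, 0.725, 0.95])
X = rcs_basis(x, knots)  # shape (200, 4)
```

Beyond the outermost knots the cubic terms cancel exactly, leaving only the linear column, which is the "restricted" behaviour described above.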

Model selection with an RCS typically involves choosing the number of knots and their locations, with the former governing how wiggly or complex the resulting spline is. Unless further steps are taken to regularize the estimated coefficients during model fitting, the number of knots directly controls the spline's complexity.

This means that the user has some problems to overcome when estimating a model containing one or more RCS terms:

  1. How many knots to use?
  2. Where to place those knots in the span of $X$?
  3. How to compare models with different numbers of knots?

On their own, RCS terms require user intervention to solve these problems.

Penalized splines

Penalized regression splines (sensu Hodges) on their own tackle issue 3 only, but they allow issue 1 to be circumvented. The idea is that, as well as the basis expansion of $X$ (for now, assume a cubic spline basis), you also create a wiggliness penalty matrix. Wiggliness is measured using some derivative of the estimated spline, typically the second derivative, and the penalty itself is the squared second derivative integrated over the range of $X$, $\int f''(x)^2 \, \mathrm{d}x$. This penalty can be written in quadratic form as

$$\boldsymbol{\beta}^{\mathsf{T}} \boldsymbol{S} \boldsymbol{\beta}$$

where $\boldsymbol{S}$ is a penalty matrix, with entries $S_{ij} = \int B_i''(x) B_j''(x) \, \mathrm{d}x$ for basis functions $B_i$, and $\boldsymbol{\beta}$ are the model coefficients. Coefficient values are then found by maximising the penalised log-likelihood criterion $\mathcal{L}_p$

$$\mathcal{L}_p = \mathcal{L} - \lambda \boldsymbol{\beta}^{\mathsf{T}} \boldsymbol{S} \boldsymbol{\beta}$$

where $\mathcal{L}$ is the log-likelihood of the model and $\lambda$ is the smoothness parameter, which controls how strongly to penalize the wiggliness of the spline.

As the penalised log-likelihood can be evaluated in terms of the model coefficients, fitting this model effectively becomes a problem of finding an optimal value for $\lambda$, with the coefficients updated during the search.
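For a Gaussian response, maximising the penalised log-likelihood reduces to penalised least squares, so for a fixed $\lambda$ the coefficients have a closed form. A minimal sketch, assuming a design matrix `X` and penalty matrix `S` have already been built by whatever basis construction is in use:

```python
import numpy as np

def fit_penalized(X, y, S, lam):
    """Penalized least squares: minimize ||y - X b||^2 + lam * b' S b.

    For a Gaussian likelihood this is equivalent to maximizing the
    penalized log-likelihood, and the solution is linear in y:
        b_hat = (X'X + lam * S)^{-1} X'y
    """
    return np.linalg.solve(X.T @ X + lam * S, X.T @ y)
```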

$\lambda$ can be chosen using cross-validation, generalised cross-validation (GCV), or marginal likelihood (ML) or restricted maximum likelihood (REML) criteria. The latter two effectively recast the spline model as a mixed effects model (the perfectly smooth parts of the basis become fixed effects, the wiggly parts become random effects, and the smoothness parameter is inversely related to the variance of the random effects), which is the formulation Hodges considers in his book.
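As an illustration of the GCV route, here is a sketch of choosing $\lambda$ by a grid search; the score $n \cdot \mathrm{RSS} / (n - \mathrm{tr}(\boldsymbol{A}))^2$ is the standard GCV criterion, where $\boldsymbol{A}$ is the influence (hat) matrix of the penalised fit, and the grid bounds here are arbitrary:

```python
import numpy as np

def gcv_score(X, y, S, lam):
    """Generalised cross-validation score for one value of lambda."""
    n = len(y)
    # Influence (hat) matrix of the penalized fit: A = X (X'X + lam S)^{-1} X'
    A = X @ np.linalg.solve(X.T @ X + lam * S, X.T)
    rss = np.sum((y - A @ y) ** 2)
    edf = np.trace(A)  # effective degrees of freedom of the spline
    return n * rss / (n - edf) ** 2

def select_lambda(X, y, S, grid=np.logspace(-4, 4, 50)):
    """Grid search for the lambda minimizing the GCV score."""
    return min(grid, key=lambda lam: gcv_score(X, y, S, lam))
```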

Why does this solve the problem of how many knots to use? It only partly does. It removes the need for a knot at every unique data point (as in a smoothing spline), but you still have to choose how many knots or basis functions to use. However, because the penalty shrinks the coefficients, you can get away with choosing a basis dimension large enough to contain the true function, or a close approximation to it, and then let the penalty control how wiggly the estimated spline ultimately is: any excess potential wiggliness in the basis is removed or controlled by the penalty.

Comparison

Penalized (regression) splines and RCS are quite different concepts. There is nothing stopping you from creating an RCS basis plus an associated penalty in quadratic form and then estimating the spline coefficients using the penalized regression spline machinery.

RCS is just one kind of basis you can use to create a spline, and penalized regression splines are one way to estimate a model containing one or more splines with associated wiggliness penalties.

Can we avoid issues 1, 2, and 3?

Yes, to some extent, with a thin plate spline (TPS) basis. A TPS basis has as many basis functions as there are unique data values in $X$. What Wood (2003) showed is that you can create a thin plate regression spline (TPRS) basis by taking an eigendecomposition of the TPS basis and retaining only the $k$ eigenvectors with the largest eigenvalues. You still have to specify $k$, the number of basis functions to use, but the choice is generally based on how wiggly you expect the fitted function to be and how much of a computational hit you are willing to take. There is no need to specify knot locations either, and because the penalty shrinks the coefficients you avoid the model selection problem: you fit one penalized model rather than many unpenalized models with differing numbers of knots.
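A heavily simplified sketch of the eigen-truncation idea in one dimension follows. This is not the full TPRS construction of Wood (2003), which also handles the polynomial null space and the constraint that makes the penalty positive semi-definite; it only shows the rank reduction step:

```python
import numpy as np

def tprs_basis(x, k):
    """Toy 1-D eigen-truncation in the spirit of Wood (2003).

    Builds the full thin plate spline matrix E (one basis function per
    unique data value), then keeps only the k eigenvectors with the
    largest-magnitude eigenvalues as a reduced-rank basis. The full
    construction additionally treats the polynomial (linear) terms
    separately and constrains them out of the penalty; that is omitted
    here for brevity.
    """
    xu = np.unique(x)
    # 1-D thin plate (cubic) radial basis: eta(r) = r^3, up to a constant
    E = np.abs(xu[:, None] - xu[None, :]) ** 3
    eigvals, eigvecs = np.linalg.eigh(E)  # E is symmetric
    order = np.argsort(np.abs(eigvals))[::-1][:k]  # k largest in magnitude
    U_k, D_k = eigvecs[:, order], eigvals[order]
    basis = U_k * D_k          # reduced basis evaluated at the unique x
    penalty = np.diag(D_k)     # the wiggliness penalty in the reduced space
    return basis, penalty
```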

P-splines

Just to make things more complicated, there is a type of spline basis known as a P-spline (Eilers & Marx, 1996), where the "P" is often read as "penalized". P-splines combine a B-spline basis with a difference penalty applied directly to the model coefficients. In typical use the penalty is on the squared differences between adjacent coefficients, which in turn penalises wiggliness. P-splines are very easy to set up and result in a sparse penalty matrix, which makes them very amenable to estimating spline terms in MCMC-based Bayesian models (Wood, 2017).
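A minimal sketch of the two P-spline ingredients, using SciPy's B-spline design matrix and the second-order difference penalty of Eilers & Marx; the basis dimension and even knot spacing here are arbitrary illustrative choices:

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_design(x, n_basis=20, degree=3):
    """Cubic B-spline design matrix on evenly spaced knots, plus the
    second-order difference penalty of Eilers & Marx (1996)."""
    # Evenly spaced knots, with the boundary knots repeated `degree`
    # extra times, as the B-spline recursion requires.
    inner = np.linspace(x.min(), x.max(), n_basis - degree + 1)
    t = np.concatenate([[inner[0]] * degree, inner, [inner[-1]] * degree])
    B = BSpline.design_matrix(x, t, degree).toarray()
    # Penalize squared second differences of adjacent coefficients.
    D = np.diff(np.eye(n_basis), n=2, axis=0)
    S = D.T @ D  # banded, hence sparse, penalty matrix
    return B, S
```

The matrices `B` and `S` can then be handed to a penalised fit of the kind sketched earlier; the banded structure of `S` is what makes the basis cheap to work with.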

References

Eilers, P. H. C., and B. D. Marx. 1996. Flexible smoothing with B-splines and penalties. Stat. Sci. 11: 89–121.

Wood, S. N. 2003. Thin plate regression splines. J. R. Stat. Soc. Series B Stat. Methodol. 65: 95–114. doi:10.1111/1467-9868.00374

Wood, S. N. 2017. Generalized Additive Models: An Introduction with R, Second Edition, CRC Press.