Solved – Thin Plate Regression Splines mgcv

mgcv, r, regression, splines

I am struggling to understand thin plate regression splines. I already found a very helpful answer here on Cross Validated:
smoothing methods for gam in mgcv package
but I still have some problems.

Here is the PDF of the corresponding paper (by Simon Wood, also the author of the mgcv package in R):
https://pdfs.semanticscholar.org/f1d3/d313a723c9eaeef496244edcfefeae237feb.pdf

In the one-dimensional case, if I search for the function minimizing
$$\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int_a^b (f''(x))^2\, dx,
$$

the result is a cubic spline with knots at every observation. As I understand Wood (2003), this is a special case of a thin plate spline in one dimension. If the covariate is no longer one-dimensional but multi-dimensional, and the function to be estimated becomes $f(x_i, z_i, \dots)$, the result is a thin plate spline (for specific choices of the order of differentiation and the dimension). Did I get that right? So you could say that thin plate splines are the multi-dimensional analogue of the cubic spline obtained in the one-dimensional case?
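For reference, the general objective in Wood (2003), as far as I understand it, is to minimize
$$\sum_{i=1}^n \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda J_{md}(f), \qquad
J_{md}(f) = \int_{\mathbb{R}^d} \sum_{\nu_1 + \dots + \nu_d = m} \frac{m!}{\nu_1! \cdots \nu_d!} \left( \frac{\partial^m f}{\partial x_1^{\nu_1} \cdots \partial x_d^{\nu_d}} \right)^2 dx_1 \cdots dx_d,
$$
so with $d = 1$ and $m = 2$ the penalty reduces to the $\int (f''(x))^2\, dx$ term above (except that the integral runs over the whole real line rather than $[a, b]$).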

Thin plate splines are therefore smoothing splines with a knot at each covariate value. To then obtain a low-rank smoother, Wood performs an eigendecomposition and keeps the first $k$ eigenvectors, which contain most of the variance. My question is: why is it useful to build low-rank smoothers?
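To fix notation (this is just my reading of Wood 2003, so please correct me if it is off): the full thin plate spline is fit by solving
$$\min_{\boldsymbol{\delta}, \boldsymbol{\alpha}}\; \lVert \mathbf{y} - \mathbf{E}\boldsymbol{\delta} - \mathbf{T}\boldsymbol{\alpha} \rVert^2 + \lambda\, \boldsymbol{\delta}^{\mathsf T} \mathbf{E}\, \boldsymbol{\delta} \quad \text{subject to } \mathbf{T}^{\mathsf T}\boldsymbol{\delta} = \mathbf{0},
$$
where $E_{ij} = \eta_{md}(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert)$ and $\mathbf{T}$ contains the polynomial null-space terms. The rank reduction takes the eigendecomposition $\mathbf{E} = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathsf T}$, keeps the $k$ eigenvectors $\mathbf{U}_k$ belonging to the largest-magnitude eigenvalues, and restricts $\boldsymbol{\delta} = \mathbf{U}_k\boldsymbol{\delta}_k$, giving
$$\min_{\boldsymbol{\delta}_k, \boldsymbol{\alpha}}\; \lVert \mathbf{y} - \mathbf{U}_k\mathbf{D}_k\boldsymbol{\delta}_k - \mathbf{T}\boldsymbol{\alpha} \rVert^2 + \lambda\, \boldsymbol{\delta}_k^{\mathsf T} \mathbf{D}_k \boldsymbol{\delta}_k \quad \text{subject to } \mathbf{T}^{\mathsf T}\mathbf{U}_k\boldsymbol{\delta}_k = \mathbf{0},
$$
so only $k$ coefficients (plus the null space) remain to be estimated.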

To reduce computational cost? But to do an eigendecomposition you still have to compute the full thin plate spline basis, don't you? Or is the reason that, when using penalized least squares to get the estimated coefficients, the matrix to invert becomes $k \times k$ and no longer $n \times n$?

Also, when thin plate splines are used as penalized regression splines there is a penalty, and therefore the smoothness of the function is mostly determined by the smoothing parameter (provided $k$ is high enough), which would work for the full basis as well.

Or is the reason to reduce the number of parameters that have to be estimated?
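For what it is worth, here is a small mgcv sketch I used to check the point about the smoothing parameter; the toy data and the particular choices of `k` are just made up for illustration, and my expectation is that once `k` is generous enough the effective degrees of freedom are driven by the estimated smoothing parameter rather than by `k`.

```r
library(mgcv)

set.seed(42)
n <- 400
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)   # toy data, purely for illustration

## Same thin plate regression spline smoother, two different basis dimensions
fit_k10 <- gam(y ~ s(x, bs = "tp", k = 10))
fit_k30 <- gam(y ~ s(x, bs = "tp", k = 30))

## Compare effective degrees of freedom and estimated smoothing parameters:
## if k is large enough, both fits should be very similar, because the
## penalty (via lambda), not k, controls the final smoothness.
sum(fit_k10$edf); fit_k10$sp
sum(fit_k30$edf); fit_k30$sp
```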

Best Answer

The motivation for performing an eigendecomposition of the design matrix is indeed, as you mention, to reduce the computational cost of the algorithm. Fitting splines, particularly when $d > 1$, is a very computationally intensive task: in the paper you cite, Wood mentions that all of the algorithms for $d > 1$ are of $O(n^3)$ complexity. Performing an eigendecomposition and keeping only the top $k$ eigenvalues not only reduces the cost of the fitting step from $O(n^3)$ to $O(k^3)$ (and Wood obtains the truncated eigendecomposition itself via Lanczos iteration, so the full decomposition never has to be formed), but also reduces the memory overhead, since far fewer elements of the design matrix have to be kept in memory. This is especially valuable when working with larger datasets.
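As a rough illustration of where the saving comes from, here is a minimal sketch of the rank-reduction idea in plain R (not mgcv's actual implementation; the radial basis shape and the choice of $k$ are placeholders, and the null-space/constraint handling of the real construction is omitted):

```r
## Minimal sketch of the rank-reduction idea (not mgcv's actual code).
## Build a full thin-plate-type radial basis for 1-d data, eigendecompose it,
## and keep the k eigenvectors with the largest-magnitude eigenvalues.
set.seed(1)
n <- 200
x <- sort(runif(n))

eta <- function(r) r^3                              # 1-d radial basis shape (constants omitted)
E   <- outer(x, x, function(a, b) eta(abs(a - b)))  # full n x n basis / penalty matrix

k   <- 10
ed  <- eigen(E, symmetric = TRUE)
idx <- order(abs(ed$values), decreasing = TRUE)[1:k]
Uk  <- ed$vectors[, idx]                            # n x k eigenvector matrix
Dk  <- ed$values[idx]

Xk  <- Uk %*% diag(Dk)                              # reduced design matrix: n x k
dim(Xk)                                             # n x k instead of n x n
```

With the full basis the coefficient vector has length $n$ and the penalized least squares problem involves an $n \times n$ system; after the truncation only $k$ coefficients (plus the low-dimensional null space of the penalty) need to be estimated, which is where both the $O(k^3)$ fitting cost and the reduced memory footprint come from.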
