Solved – Thin Plate Regression Splines mgcv

mgcv, r, regression, splines

I am struggling to understand thin plate regression splines. I already found a very helpful answer here on Cross Validated:
smoothing methods for gam in mgcv package
but I still have some problems.

Here is the PDF of the corresponding paper (by Simon Wood, also the author of the mgcv package in R):
https://pdfs.semanticscholar.org/f1d3/d313a723c9eaeef496244edcfefeae237feb.pdf

In the one-dimensional case, if I search for the function minimizing
$$\sum_{i=1}^n (y_i - f(x_i))^2 + \lambda \int_a^b (f''(x))^2\, dx,
$$

the result is a cubic spline with knots at every observation. As I understand Wood (2003), this is a special case of a thin plate spline in one dimension. If the covariate is no longer one-dimensional but multi-dimensional, and the function to be estimated becomes $f(x_i, z_i, \dots)$, the result is a thin plate spline (for specific choices of the order of differentiation and the dimension). Did I get that right? So you could say that thin plate splines are the multi-dimensional analogue of the cubic spline obtained in the one-dimensional case?
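For reference, the general objective in Wood (2003), as far as I understand it, is to minimize
$$\sum_{i=1}^n \big(y_i - f(\mathbf{x}_i)\big)^2 + \lambda J_{md}(f), \qquad
J_{md}(f) = \int_{\mathbb{R}^d} \sum_{\nu_1 + \dots + \nu_d = m} \frac{m!}{\nu_1! \cdots \nu_d!} \left( \frac{\partial^m f}{\partial x_1^{\nu_1} \cdots \partial x_d^{\nu_d}} \right)^2 dx_1 \cdots dx_d,
$$
so with $d = 1$ and $m = 2$ the penalty reduces to the $\int (f''(x))^2\, dx$ term above (except that the integral runs over the whole real line rather than $[a, b]$).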

Thin plate splines are therefore smoothing splines with a knot at each covariate value. To then obtain a low-rank smoother, Wood performs an eigendecomposition and keeps the first $k$ eigenvectors, which contain most of the variance. My question is: why is it useful to build low-rank smoothers?
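To fix notation (this is just my reading of Wood 2003, so please correct me if it is off): the full thin plate spline is fit by solving
$$\min_{\boldsymbol{\delta}, \boldsymbol{\alpha}}\; \lVert \mathbf{y} - \mathbf{E}\boldsymbol{\delta} - \mathbf{T}\boldsymbol{\alpha} \rVert^2 + \lambda\, \boldsymbol{\delta}^{\mathsf T} \mathbf{E}\, \boldsymbol{\delta} \quad \text{subject to } \mathbf{T}^{\mathsf T}\boldsymbol{\delta} = \mathbf{0},
$$
where $E_{ij} = \eta_{md}(\lVert \mathbf{x}_i - \mathbf{x}_j \rVert)$ and $\mathbf{T}$ contains the polynomial null-space terms. The rank reduction takes the eigendecomposition $\mathbf{E} = \mathbf{U}\mathbf{D}\mathbf{U}^{\mathsf T}$, keeps the $k$ eigenvectors $\mathbf{U}_k$ belonging to the largest-magnitude eigenvalues, and restricts $\boldsymbol{\delta} = \mathbf{U}_k\boldsymbol{\delta}_k$, giving
$$\min_{\boldsymbol{\delta}_k, \boldsymbol{\alpha}}\; \lVert \mathbf{y} - \mathbf{U}_k\mathbf{D}_k\boldsymbol{\delta}_k - \mathbf{T}\boldsymbol{\alpha} \rVert^2 + \lambda\, \boldsymbol{\delta}_k^{\mathsf T} \mathbf{D}_k \boldsymbol{\delta}_k \quad \text{subject to } \mathbf{T}^{\mathsf T}\mathbf{U}_k\boldsymbol{\delta}_k = \mathbf{0},
$$
so only $k$ coefficients (plus the null space) remain to be estimated.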

To reduce computational cost? But to do an eigendecomposition you still have to compute the full thin plate spline basis, don't you? Or is the reason that, when using penalized least squares to get the estimated coefficients, the matrix to invert becomes $k \times k$ and no longer $n \times n$?

Also, when thin plate splines are used as penalized regression splines there is a penalty, and therefore the smoothness of the function is mostly determined by the smoothing parameter (provided $k$ is high enough), which would work for the full basis as well.

Or is the reason to reduce the number of parameters that have to be estimated?
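For what it is worth, here is a small mgcv sketch I used to check the point about the smoothing parameter; the toy data and the particular choices of `k` are just made up for illustration, and my expectation is that once `k` is generous enough the effective degrees of freedom are driven by the estimated smoothing parameter rather than by `k`.

```r
library(mgcv)

set.seed(42)
n <- 400
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)   # toy data, purely for illustration

## Same thin plate regression spline smoother, two different basis dimensions
fit_k10 <- gam(y ~ s(x, bs = "tp", k = 10))
fit_k30 <- gam(y ~ s(x, bs = "tp", k = 30))

## Compare effective degrees of freedom and estimated smoothing parameters:
## if k is large enough, both fits should be very similar, because the
## penalty (via lambda), not k, controls the final smoothness.
sum(fit_k10$edf); fit_k10$sp
sum(fit_k30$edf); fit_k30$sp
```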

Best Answer

The motivation for performing an eigendecomposition of the design matrix is indeed, as you mention, to reduce the computational cost of the algorithm. Fitting splines, particularly when $d > 1$, is a very computationally intensive task: in the paper you cite, Wood mentions that all of the algorithms for $d > 1$ are of $O(n^3)$ complexity. Performing an eigendecomposition and keeping only the top $k$ eigenvalues not only reduces the cost of the fitting step from $O(n^3)$ to $O(k^3)$ (and Wood obtains the truncated eigendecomposition itself via Lanczos iteration, so the full decomposition never has to be formed), but also reduces the memory overhead, since far fewer elements of the design matrix have to be kept in memory. This is especially valuable when working with larger datasets.
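As a rough illustration of where the saving comes from, here is a minimal sketch of the rank-reduction idea in plain R (not mgcv's actual implementation; the radial basis shape and the choice of $k$ are placeholders, and the null-space/constraint handling of the real construction is omitted):

```r
## Minimal sketch of the rank-reduction idea (not mgcv's actual code).
## Build a full thin-plate-type radial basis for 1-d data, eigendecompose it,
## and keep the k eigenvectors with the largest-magnitude eigenvalues.
set.seed(1)
n <- 200
x <- sort(runif(n))

eta <- function(r) r^3                              # 1-d radial basis shape (constants omitted)
E   <- outer(x, x, function(a, b) eta(abs(a - b)))  # full n x n basis / penalty matrix

k   <- 10
ed  <- eigen(E, symmetric = TRUE)
idx <- order(abs(ed$values), decreasing = TRUE)[1:k]
Uk  <- ed$vectors[, idx]                            # n x k eigenvector matrix
Dk  <- ed$values[idx]

Xk  <- Uk %*% diag(Dk)                              # reduced design matrix: n x k
dim(Xk)                                             # n x k instead of n x n
```

With the full basis the coefficient vector has length $n$ and the penalized least squares problem involves an $n \times n$ system; after the truncation only $k$ coefficients (plus the low-dimensional null space of the penalty) need to be estimated, which is where both the $O(k^3)$ fitting cost and the reduced memory footprint come from.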
