Spline Basis Dimension – Determining Using Wood’s Statistical Test


In Simon Wood's book Generalized Additive Models (2nd ed.), on page 243, he describes the following procedure for checking whether the basis dimension is too small:

Fortunately informal checking that the basis dimension for a particular term is appropriate is quite easy. Suppose that a smooth term is a function of a covariate $x$ (which may be vector in general). Let $\text{nei}(i)$ denote the set of indices of the $m$ nearest neighbours of $x_i$ according to an appropriate measure $\left\|x_i - x_j\right\|$, and compute the mean absolute or squared difference between the deviance residual $\epsilon_i$ and $\left\{\epsilon_j : j \in \text{nei}(i) \right\}$. Average this difference over $i$ to get a single measure $\Delta$ of the difference between residuals and their neighbours. Now repeat the calculation several hundred times for randomly reshuffled residuals. If there is no residual pattern with respect to covariate $x$ then $\Delta$ should look like an ordinary observation from the distribution of $\Delta$ under random re-shuffling, but if the original $\Delta$ is unusually small this means that residuals appear positively correlated with their neighbours, suggesting that we may have under-smoothed. In that case if the EDF is also close to the basis dimension then it may be appropriate to increase the basis dimension. Note, however, that un-modelled residual auto-correlation and an incorrectly specified mean variance relationship can both lead to the same result, even when the basis dimension is perfectly reasonable. This residual randomization test is easily automated and computationally efficient (given an efficient nearest neighbour algorithm). It is implemented by default in gam.check in mgcv.
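A minimal Python sketch of the randomization test described in the quote, assuming a 1-D covariate, a brute-force nearest-neighbour search, and the mean absolute (rather than squared) difference. It illustrates the idea only; it is not the code behind mgcv's gam.check, and the function names are hypothetical.

```python
import random


def nearest_neighbours(x, m):
    """For each x[i], indices of its m nearest neighbours by |x_i - x_j| (brute force)."""
    n = len(x)
    return [sorted((j for j in range(n) if j != i), key=lambda j: abs(x[i] - x[j]))[:m]
            for i in range(n)]


def delta(res, nei):
    """Average over i of the mean absolute difference between res[i] and its neighbours."""
    return sum(sum(abs(res[i] - res[j]) for j in nb) / len(nb)
               for i, nb in enumerate(nei)) / len(res)


def residual_randomization_test(x, res, m=3, n_perm=200, seed=1):
    """Return (observed Delta, proportion of shuffled Deltas <= observed).

    A small proportion means the residuals are unusually similar to their
    neighbours, i.e. positively correlated with them.
    """
    rng = random.Random(seed)
    nei = nearest_neighbours(x, m)  # neighbours depend only on x, so compute once
    observed = delta(res, nei)
    shuffled = list(res)
    below = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between the residuals and x
        if delta(shuffled, nei) <= observed:
            below += 1
    return observed, below / n_perm
```

For residuals that still contain a trend in $x$, the observed $\Delta$ should fall below essentially all of the shuffled values, so the returned proportion is close to zero.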

I have been using this procedure in mgcv and so far have found it unhelpful. It always seems to reject the null hypothesis, suggesting that the basis dimension is too small, yet increasing the basis dimension to large values leads to rough calculations, minimal change in the resulting fit, and the procedure still rejecting the null hypothesis. Eventually I include so many basis functions that gam in mgcv reports that the model cannot be fitted because there is too little data.

Is there any discussion of this procedure that goes beyond a paragraph in Wood's (excellent) book? Should this procedure just not be used? If the procedure keeps failing, is my data simply insufficient for this type of modeling? (I cannot share it or really talk about it.)

Best Answer

Generally, models are imperfect: if one takes George Box's statement "Essentially, all models are wrong but some are useful" literally, we should expect residuals, as a rule, to be structured. In the 2D case of fitting a smooth curve to well-behaved but noisy data, this means that the residuals, ranked by the independent variable, can themselves be fitted by a significant polynomial. Borrowing a term from imaging, we can call this model misregistration. If we then randomize the ordering of those residuals with respect to the independent variable, we reduce the goodness of fit of the polynomial structure found before randomization; in other words, we convert the model's misregistration into less structured noise. Sometimes pictures help, so let's take a pretty good model and show that its residuals are structured in a wavy fashion.
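The argument can be illustrated with hypothetical residuals made of a sine-shaped misregistration plus Gaussian noise. This Python sketch fits a polynomial to the residuals in $x$ order and again after shuffling, then compares $R^2$. The cubic degree, $n = 200$ and noise sd of 0.3 are arbitrary choices for illustration.

```python
import math
import random


def poly_r2(x, y, degree=3):
    """R^2 of a least-squares polynomial fit of y on x, via the normal equations."""
    p = degree + 1
    X = [[xi ** k for k in range(p)] for xi in x]
    A = [[sum(r[a] * r[b] for r in X) for b in range(p)] for a in range(p)]  # X'X
    c = [sum(r[a] * yi for r, yi in zip(X, y)) for a in range(p)]  # X'y
    # Forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(A[i][col]))
        A[col], A[piv] = A[piv], A[col]
        c[col], c[piv] = c[piv], c[col]
        for i in range(col + 1, p):
            f = A[i][col] / A[col][col]
            A[i] = [u - f * v for u, v in zip(A[i], A[col])]
            c[i] -= f * c[col]
    # Back substitution
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (c[i] - sum(A[i][k] * beta[k] for k in range(i + 1, p))) / A[i][i]
    fitted = [sum(b * xi ** k for k, b in enumerate(beta)) for xi in x]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def ranked_vs_shuffled(n=200, amplitude=1.0, noise_sd=0.3, seed=0):
    """Hypothetical residuals: sine-shaped misregistration plus Gaussian noise.

    Returns (R^2 with residuals in x order, R^2 after randomly reshuffling them).
    """
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]
    res = [amplitude * math.sin(2 * math.pi * xi) + rng.gauss(0.0, noise_sd) for xi in x]
    shuffled = res[:]
    rng.shuffle(shuffled)
    return poly_r2(x, res), poly_r2(x, shuffled)
```

Setting `amplitude=0.0` (pure noise) should make both $R^2$ values small, and raising `noise_sd` relative to `amplitude` narrows the gap between the ordered and shuffled fits.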

The difference between a very good model and a not-so-good one is that, for the very good model, the misregistration is small relative to the true noise in the data, whereas for the not-so-good model it is larger. So, to answer the question, it seems to me that the only way to randomize residuals and obtain an insignificant result from the test described would be to have a very, very good model to begin with. I apologize if the description above looks a bit like "hand-waving", to borrow a term favored by physicists, but all I am trying to do here is outline what I think may be relevant; it is certainly not offered as a definitive proof.

Related Solutions

Solved – Spline – basis functions

This looks like a truncated power basis. The answer is b), although $h_5(X)$ is non-zero only when $X > \xi_1$, and similarly $h_6(X)$ only when $X > \xi_2$.
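The question itself is not reproduced here, so this sketch assumes it concerned the usual cubic truncated power basis with knots $\xi_1 < \xi_2$: $h_1, \dots, h_4 = 1, X, X^2, X^3$, $h_5 = (X - \xi_1)_+^3$ and $h_6 = (X - \xi_2)_+^3$. The Python function evaluates one row of that basis, and the knot values are illustrative.

```python
def truncated_power_basis(x, knots, degree=3):
    """Row [1, x, ..., x**degree, (x - k)_+**degree for each knot k]."""
    row = [x ** d for d in range(degree + 1)]
    # (x - k)_+ is zero for x <= k, so these columns "switch on" only past their knot
    row += [max(x - k, 0.0) ** degree for k in knots]
    return row


xi1, xi2 = 1.0, 3.0  # illustrative knot positions
for x in (0.5, 2.0, 4.0):
    print(x, truncated_power_basis(x, [xi1, xi2]))
```

At $X = 0.5$ both truncated terms are zero, at $X = 2$ only $h_5$ is non-zero, and at $X = 4$ both are.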

Splines – Is Spline Basis Orthogonal?

Computationally, sometimes; conceptually, rarely. (This started as a comment.)

As already presented in another answer, when we use a spline in the context of a generalised additive model, as soon as the spline basis is created, fitting reverts to standard GLM modelling of the basis coefficients, one for each basis function. This insight is important because we can generalise it further.
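To illustrate that, once the basis is built, fitting is ordinary regression on the basis columns, here is a Python sketch of the simplest case: an identity-link Gaussian GLM (that is, least squares) on a hypothetical piecewise-linear truncated power basis with knots at 1 and 3. A real GAM fit would also add a smoothing penalty, which is omitted here.

```python
def truncated_linear_basis(x, knots):
    """Row [1, x, (x - k)_+ for each knot k]: a piecewise-linear spline basis."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]


def least_squares(rows, y):
    """Ordinary least squares (identity-link Gaussian GLM) via Gauss-Jordan
    elimination on the augmented normal equations [X'X | X'y]."""
    p = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(p):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [u - f * v for u, v in zip(a[i], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


# Usage: noise-free data generated from known (hypothetical) coefficients
knots = [1.0, 3.0]
xs = [0.05 * i for i in range(101)]
true_coef = [2.0, 0.5, 1.5, -3.0]
rows = [truncated_linear_basis(x, knots) for x in xs]
y = [sum(b * h for b, h in zip(true_coef, r)) for r in rows]
coef = least_squares(rows, y)
```

With noise-free data the solver should recover `true_coef` up to rounding error; nothing about the spline enters the fit except the columns of the basis.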

Suppose we have a very constrained B-spline, such as a degree-1 (piecewise-linear) B-spline, so that we can see the knot locations exactly:

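```r
library(splines)  # provides bs()

set.seed(123)
myX <- sort(runif(1000, max = 10))
myKnots <- c(1, 3)
# Degree-1 B-spline basis with interior knots at 1 and 3
Bmatrix <- bs(x = myX, degree = 1, knots = myKnots, intercept = FALSE)
matplot(myX, Bmatrix, type = "l")
```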

This is a trivial B-spline basis $B$ that is clearly non-orthogonal (run crossprod(Bmatrix) to check the inner products). So, conceptually, B-spline bases are non-orthogonal by construction. An orthogonal series method would instead represent the data with respect to a series of orthogonal basis functions, such as sines and cosines (e.g. a Fourier basis). Notably, an orthogonal method would allow us to select only the "low frequency" terms for further analysis. This brings us to the computational part.
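The non-orthogonality can also be checked outside R. This Python sketch builds the degree-1 B-spline (hat function) basis and computes its Gram matrix over a grid standing in for `myX`. For simplicity it assumes boundary knots at 0 and 10 (bs uses the range of the data) and keeps the first column, which `bs(..., intercept = FALSE)` drops.

```python
def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) basis: one column per knot in the sorted
    list `knots` (boundary knots included); column i peaks at knots[i]."""
    row = []
    for i, k in enumerate(knots):
        left = knots[i - 1] if i > 0 else None
        right = knots[i + 1] if i + 1 < len(knots) else None
        if left is not None and left <= x <= k:
            row.append((x - left) / (k - left))    # rising edge
        elif right is not None and k <= x <= right:
            row.append((right - x) / (right - k))  # falling edge
        else:
            row.append(0.0)                        # outside the support
    return row


def gram_matrix(xs, knots):
    """Inner products sum_x B_a(x) B_b(x) between basis columns (like crossprod)."""
    rows = [hat_basis(x, knots) for x in xs]
    p = len(knots)
    return [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]


knots = [0.0, 1.0, 3.0, 10.0]  # boundary knots 0 and 10, interior knots 1 and 3
xs = [10 * i / 999 for i in range(1000)]  # stand-in for myX
G = gram_matrix(xs, knots)
```

Adjacent hat functions overlap, so their inner products are positive, while non-adjacent ones do not overlap: the Gram matrix is banded but not diagonal.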

Because fitting a spline is an expensive process, we try to simplify the fitting procedure by employing low-rank approximations. An obvious example is the thin plate regression splines used by default by the s function in mgcv::gam, where the "proper" thin plate spline would be very expensive computationally (see ?smooth.construct.tp.smooth.spec). We start with the full thin plate spline and then truncate its basis in an optimal manner, dictated by the truncated eigen-decomposition of that basis. In that sense, computationally, yes, we end up with an orthogonal basis for our spline even when the original basis is not orthogonal. The spline is the "smoothest" function passing near our sampled values $X$. Since the spline basis provides an equivalent representation of our $X$ in the space spanned by $B$, further transforming $B$ into another equivalent basis $Q$ does not alter our original results.

Going back to our trivial example, we can get an equivalent orthogonal basis $Q$ through the SVD and then use it to get equivalent results (depending on the order of the approximation). For example:

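```r
# Left singular vectors of B (equivalently svd(Bmatrix)$u): orthonormal columns spanning the same space as B
svdB <- svd(t(Bmatrix))
Q <- svdB$v
```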
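The same idea as a Python sketch, with one swap: modified Gram-Schmidt is used instead of the SVD to orthonormalize the columns (both give an orthonormal basis for the same column space). For brevity the non-orthogonal basis is hypothetical: the columns $x$, $(x - 1)_+$ and $(x - 3)_+$ on a grid over $[0, 10]$.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(cols):
    """Modified Gram-Schmidt: orthonormal columns spanning the same space as cols."""
    q = []
    for v in cols:
        w = list(v)
        for u in q:
            proj = dot(u, w)  # project the partially reduced vector (modified GS)
            w = [a - proj * b for a, b in zip(w, u)]
        norm = math.sqrt(dot(w, w))
        q.append([a / norm for a in w])
    return q


def project(v, q):
    """Orthogonal projection of v onto the span of the orthonormal columns q."""
    out = [0.0] * len(v)
    for u in q:
        c = dot(u, v)
        out = [o + c * b for o, b in zip(out, u)]
    return out


xs = [10 * i / 999 for i in range(1000)]
# Hypothetical non-orthogonal basis: x, (x - 1)_+, (x - 3)_+ (stored column-wise)
B = [list(xs), [max(x - 1.0, 0.0) for x in xs], [max(x - 3.0, 0.0) for x in xs]]
Q = gram_schmidt(B)
```

Projecting each column of `B` onto `Q` returns it unchanged, so the two bases span the same space and least-squares fitted values computed with either one coincide.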

Working with this new system $Q$ is more desirable than working with the original system $B$ because $Q$ is numerically far more stable (admittedly, $B$ is well behaved here). Base R also exploits these orthogonality properties: by default, poly returns orthogonal polynomials rather than raw powers of the predictor (set raw = TRUE to get the latter).
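A Python sketch of why orthogonal polynomials help: on $x = 1, \dots, 100$ the raw columns $x$ and $x^2$ are almost collinear, whereas discrete orthogonal polynomials built with the Stieltjes three-term recurrence are mutually orthogonal over the same points. This illustrates the idea only and is not poly's implementation; poly also drops the constant column and rescales the others.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca = [u - ma for u in a]
    cb = [v - mb for v in b]
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))


def discrete_orthogonal_polys(x, degree):
    """Polynomials p_0..p_degree evaluated at x, mutually orthogonal over x,
    built with the Stieltjes recurrence p_{k+1} = (x - alpha_k) p_k - beta_k p_{k-1}."""
    n = len(x)
    polys = [[1.0] * n]
    prev, norm_prev = [0.0] * n, 1.0
    for k in range(degree):
        p = polys[-1]
        norm = dot(p, p)
        alpha = sum(xi * v * v for xi, v in zip(x, p)) / norm
        beta = norm / norm_prev if k > 0 else 0.0
        polys.append([(xi - alpha) * v - beta * w for xi, v, w in zip(x, p, prev)])
        prev, norm_prev = p, norm
    return polys


x = [float(i) for i in range(1, 101)]
raw_corr = correlation(x, [v * v for v in x])  # raw x and x^2: nearly collinear
P = discrete_orthogonal_polys(x, 3)
```

The raw correlation is close to 1, while every pair of columns in `P` has an inner product that is zero up to rounding error, which is what makes the orthogonal version better conditioned.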
