Determining Spline Basis Dimension Using Wood’s Statistical Test

Tags: basis-function, generalized-additive-model, references, smoothing, splines

In Simon Wood's book Generalized Additive Models (2nd ed.), on page 243, he describes the following procedure for checking whether the basis dimension is too small:

Fortunately informal checking that the basis dimension for a particular term is appropriate is quite easy. Suppose that a smooth term is a function of a covariate $x$ (which may be vector in general). Let $\text{nei}(i)$ denote the set of indices of the $m$ nearest neighbours of $x_i$ according to an appropriate measure $\left\|x_i - x_j\right\|$, and compute the mean absolute or squared difference between the deviance residual $\epsilon_i$ and $\left\{\epsilon_j : j \in \text{nei}(i) \right\}$. Average this difference over $i$ to get a single measure $\Delta$ of the difference between residuals and their neighbours. Now repeat the calculation several hundred times for randomly reshuffled residuals. If there is no residual pattern with respect to covariate $x$ then $\Delta$ should look like an ordinary observation from the distribution of $\Delta$ under random re-shuffling, but if the original $\Delta$ is unusually small this means that residuals appear positively correlated with their neighbours, suggesting that we may have over-smoothed. In that case if the EDF is also close to the basis dimension then it may be appropriate to increase the basis dimension. Note, however, that un-modelled residual auto-correlation and an incorrectly specified mean-variance relationship can both lead to the same result, even when the basis dimension is perfectly reasonable. This residual randomization test is easily automated and computationally efficient (given an efficient nearest neighbour algorithm). It is implemented by default in gam.check in mgcv.
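The procedure in the quoted passage can be sketched in a few lines. The following is only an illustration in plain Python for a one-dimensional covariate, not mgcv's implementation of gam.check. The function names (`nearest_neighbours`, `delta`, `residual_shuffle_test`), the defaults of $m = 2$ neighbours and 300 reshuffles, the brute-force neighbour search, and the use of the mean absolute difference are all assumptions made for the example.

```python
import math
import random


def nearest_neighbours(x, m):
    """For each i, indices of the m points closest to x[i] (excluding i itself)."""
    n = len(x)
    return [
        sorted((j for j in range(n) if j != i), key=lambda j: abs(x[i] - x[j]))[:m]
        for i in range(n)
    ]


def delta(resid, nei):
    """Mean absolute difference between each residual and its neighbours' residuals."""
    per_point = [
        sum(abs(resid[i] - resid[j]) for j in nb) / len(nb)
        for i, nb in enumerate(nei)
    ]
    return sum(per_point) / len(per_point)


def residual_shuffle_test(x, resid, m=2, n_rep=300, seed=0):
    """Return (observed Delta, fraction of reshuffles with Delta <= observed)."""
    rng = random.Random(seed)
    nei = nearest_neighbours(x, m)  # neighbours depend on x only, so compute once
    observed = delta(resid, nei)
    shuffled = list(resid)
    n_not_larger = 0
    for _ in range(n_rep):
        rng.shuffle(shuffled)  # break any link between residuals and x
        if delta(shuffled, nei) <= observed:
            n_not_larger += 1
    return observed, n_not_larger / n_rep


# Hypothetical example: residuals that still follow a smooth pattern in x.
x_demo = [i / 10 for i in range(60)]
resid_demo = [math.sin(v) for v in x_demo]
obs_demo, p_demo = residual_shuffle_test(x_demo, resid_demo)
print("observed Delta:", obs_demo, "p-value:", p_demo)
```

A small returned fraction means the observed $\Delta$ is unusually small compared with its reshuffled values, which is the situation the passage flags as a residual pattern with respect to $x$.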

I have been using this procedure in mgcv and so far have found it unhelpful. It always seems to reject the null hypothesis, suggesting that the basis dimension is too small. Increasing the basis dimension to large values leads to rough calculations, minimal change in the resulting fit, and the procedure still rejecting the null hypothesis. Eventually I include so many basis functions that gam in mgcv reports that the model cannot be fitted because there is too little data.

Is there any discussion of this procedure that goes beyond a paragraph in Wood's (excellent) book? Should this procedure just not be used? Is my data just insufficient for this type of modeling (I cannot share it or really talk about it) if this procedure is failing?

Best Answer

Generally, models are imperfect. If one takes George Box's statement literally, "Essentially, all models are wrong but some are useful," we would naturally assume that residuals are, as a rule, structured. In the 2D case of fitting a smooth curve to well-behaved if noisy data, this means that the residuals, ranked by the independent variable, can themselves be fitted by a statistically significant polynomial. We can borrow a term from imaging and call that model misregistration. If we then randomize the independent-variable ranking of those residuals, we decrease the goodness of fit of the polynomial structure found before randomization; in effect, we take the model's misregistration and convert it into less structured noise. Sometimes, pictures help. So, let's take a pretty good model (figure not shown) and show that the residuals are structured in a wavy fashion (figure not shown).

Now, the difference between a very good model and a not-so-good one is that the misregistration is small relative to the true noise in the data for the former and large for the latter. So, to answer the question, it seems to me that the only way to randomize residuals and obtain an insignificant result from the test described would be to have a very, very good model to begin with. I apologize if the description above looks a bit like "hand-waving," to borrow a term favored by physicists; all I am trying to do here is outline what I think may be relevant, and it is certainly not offered as a definitive proof.
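A rough numerical illustration of this argument, in plain Python: a deliberately imperfect model (a straight line fitted to a sine curve) leaves residuals whose lag-1 autocorrelation, once ordered by $x$, is high when the noise is small relative to the misfit and much lower when the noise dominates. Shuffling the residuals removes the structure. The simulated data, the noise levels, and the function names (`fit_line`, `lag1_autocorr`, `residual_autocorr`) are assumptions chosen for the sketch. The lag-1 autocorrelation is used here as a simple stand-in for "structure", not as the test from Wood's book.

```python
import math
import random


def fit_line(x, y):
    """Ordinary least-squares straight line; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def lag1_autocorr(r):
    """Lag-1 autocorrelation of a sequence in its given order."""
    n = len(r)
    mean = sum(r) / n
    num = sum((r[i] - mean) * (r[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in r)
    return num / den


def residual_autocorr(noise_sd, n=200, seed=0):
    """Fit a line to sin(2*pi*x) + noise; return the lag-1 autocorrelation of
    the residuals ordered by x, and of the same residuals after shuffling."""
    rng = random.Random(seed)
    x = [i / (n - 1) for i in range(n)]  # already sorted by x
    y = [math.sin(2 * math.pi * a) + rng.gauss(0, noise_sd) for a in x]
    b0, b1 = fit_line(x, y)
    resid = [b - (b0 + b1 * a) for a, b in zip(x, y)]
    shuffled = list(resid)
    rng.shuffle(shuffled)
    return lag1_autocorr(resid), lag1_autocorr(shuffled)


for sd in (0.05, 2.0):
    ordered, shuffled = residual_autocorr(sd)
    print(f"noise sd={sd}: ordered={ordered:.3f}, shuffled={shuffled:.3f}")
```

Varying the noise level with the model held fixed changes the same ratio of misfit to noise that would change if the noise were held fixed and the model improved.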