Solved – GAM selection when both smooth and parametric terms are present

Tags: generalized-additive-model, mgcv, model-selection

I'm fitting GAMs to avian survey data and have a mix of smooth (thin plate regression splines) and parametric terms in my models. I know about the integrated term selection available in mgcv via select = TRUE or bs = 'ts', but the only examples I can find of this approach are ones where all terms in the model are smooths. As far as I can tell, this extra-penalty approach does nothing to parametric terms, so it seems like the wrong approach when there is a mix of terms (the parametric terms will be inherently favored because they are unpenalized). At the same time, backward stepwise selection via the estimated p-values also seems a bit dicey because, from my reading (e.g. "ANOVA table (and its interpretation) for a single GAM model"), the p-values are not estimated in the same way for smooths and parametric terms.

Any advice here?

Best Answer

You could do what you want for linear terms using the paraPen argument to gam(), which allows penalties on parametric terms.
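As a rough illustration of the paraPen idea, here is a minimal sketch on simulated data from mgcv's gamSim(). The data set, the choice of x3 as the linear term, and the 1 x 1 identity (ridge-type) penalty are assumptions made for the example, not something taken from the question.

```r
library(mgcv)

set.seed(42)
# Simulated data from mgcv's gamSim(); in this example x3 has no true effect on y
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)

# Reference fit: x3 enters as an ordinary, unpenalized linear term
m_plain <- gam(y ~ x3 + s(x0) + s(x1) + s(x2),
               data = dat, method = "REML", select = TRUE)

# Penalized fit: paraPen is a list named after the parametric term;
# each element is itself a list holding the penalty matrix (here 1 x 1).
# The smoothing parameter for this penalty is estimated with the others.
m_pen <- gam(y ~ x3 + s(x0) + s(x1) + s(x2),
             data = dat, method = "REML", select = TRUE,
             paraPen = list(x3 = list(diag(1))))

# Compare the estimated x3 coefficient with and without the penalty
coef(m_plain)["x3"]
coef(m_pen)["x3"]
```

With a penalty of this form, the coefficient for x3 can be shrunk towards zero in the same fitting step that penalizes the smooths.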

However, why not treat the linear terms as low-degree smooths (say k = 3) and let the double penalty (select = TRUE) work on them too?
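A minimal sketch of that second suggestion, again on simulated gamSim() data (the data and the choice of x3 as the "linear" covariate are assumptions for illustration):

```r
library(mgcv)

set.seed(42)
# Simulated data; x3 plays the role of a covariate you would otherwise enter linearly
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)

# x3 as a low-rank smooth (k = 3), so select = TRUE penalizes it
# in the same way as the other smooths
m_sel <- gam(y ~ s(x3, k = 3) + s(x0) + s(x1) + s(x2),
             data = dat, method = "REML", select = TRUE)

# Effective degrees of freedom per smooth: an EDF near 1 suggests an
# essentially linear effect, an EDF near 0 a term shrunk out of the model
summary(m_sel)$s.table
```

Because every term is now a smooth, all of them are subject to the same selection penalty and none is favored for being unpenalized.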

For the categorical terms, I'd just leave them alone; I'm not sure it is possible to apply a group penalty to categories using paraPen. For something like year, it is highly unlikely that it will have a zero effect (all years exactly the same). I'd be inclined to either:

  1. treat year as categorical and just leave it alone penalty-wise, so you control for between-year differences in the expectation of the response, or
  2. if you have enough years and you might expect a smooth trend in the data, treat it as a smooth, s(year).
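The two options for year can be sketched as follows. The data here are simulated and hypothetical (15 survey years, one covariate x, a response y with a mild trend); only the model formulas matter.

```r
library(mgcv)

set.seed(1)
n <- 300
# Hypothetical survey data: 15 years and a single continuous covariate
dat <- data.frame(year = sample(2001:2015, n, replace = TRUE),
                  x    = runif(n))
dat$fyear <- factor(dat$year)
dat$y <- sin(2 * pi * dat$x) + 0.1 * (dat$year - 2001) + rnorm(n, sd = 0.5)

# Option 1: year as an unpenalized factor; selection acts on s(x) only
m_factor <- gam(y ~ fyear + s(x),
                data = dat, method = "REML", select = TRUE)

# Option 2: year as a smooth trend, subject to the same double penalty;
# k must not exceed the number of distinct years
m_smooth <- gam(y ~ s(year, k = 10) + s(x),
                data = dat, method = "REML", select = TRUE)

summary(m_factor)
summary(m_smooth)
```

Option 1 spends one coefficient per year beyond the first, whereas option 2 lets the penalty decide how wiggly the year effect needs to be.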