I have a gam model that is:
gam=gam(sv~s(day,bs="tp")+s(range,bs="tp")+s(time,bs="cc"),data=train.all,gamma=1.4,method="REML")
The s(range) term produces an e.d.f. of 1, so I made the model:
gam1=gam(sv~s(day,bs="tp")+range+s(time,bs="cc"),data=train.all,gamma=1.4,method="REML")
There is very high concurvity (~0.85) between day and range in the first model (gam), but it goes away in the gam1 model. I am wondering why that is, if s(range) is essentially the same as the parametric form of range. Is the concurvity/collinearity (I'm not sure what to call it between a smooth and a parametric term) still there, but simply not calculated by mgcv when range is a parametric term? Or are any co-dependence effects truly removed just by changing range to its parametric form?
Best Answer
The concurvity moves from the stated smooth terms to the parametric terms, which concurvity() reports together under the para column of the matrix or matrices it returns. Here's a modified example from ?concurvity:
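The example code is not reproduced here, so the following is a sketch of the starting model. The simulated data follow the first example on the ?concurvity help page; the seed, n = 200 and k = 15 are assumptions rather than the original settings.

```r
library(mgcv)

## Simulate data with concurvity, following the ?concurvity example
set.seed(8)
n <- 200
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
t <- sort(runif(n))                # first covariate
x <- f2(t) + rnorm(n) * 3          # x is a smooth function of t plus noise
y <- sin(4 * pi * t) + exp(x / 20) + rnorm(n) * 0.3

## Model with two smooth terms only
b <- gam(y ~ s(t, k = 15) + s(x, k = 15), method = "REML")

## Concurvity of each term with the rest of the model; the "para"
## column covers the parametric part, which here is just the intercept
print(concurvity(b))
```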
Now add a linear term and refit
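A sketch of that refit, regenerating the same simulated data so it runs on its own. The linear covariate x2 is assumed to be t plus a little Gaussian noise (the noise standard deviation of 0.05 is a hypothetical choice), which makes it nearly collinear with s(t).

```r
library(mgcv)

## Same simulated data as the ?concurvity example (settings assumed)
set.seed(8)
n <- 200
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
t <- sort(runif(n))
x <- f2(t) + rnorm(n) * 3
y <- sin(4 * pi * t) + exp(x / 20) + rnorm(n) * 0.3

## Hypothetical linear covariate: a noisy copy of t
x2 <- t + rnorm(n, sd = 0.05)

## Refit with x2 entering as a parametric (linear) term
b2 <- gam(y ~ x2 + s(t, k = 15) + s(x, k = 15), method = "REML")

## The dependence between x2 and s(t) is reported in the "para" column
print(concurvity(b2))
```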
Now look at the concurvity of the two models
These produce
Note that x2 is essentially a noisy version of t, and hence the concurvity in the para column has gone up from essentially 0 in b to almost 1 in b2.

Now if we add x2 as a smooth function instead, we see that the para entries return to being very small and we get a measure for the spline term s(x2) directly. This is just how the function treats parametric terms: the focus is on the smooth terms, so all parametric terms (including the intercept) are lumped together under para rather than being assessed individually.
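A sketch of the version with x2 as a smooth, again regenerating the simulated data (the seed and the noise level for x2 are assumptions). The pairwise measures from concurvity(..., full = FALSE) are also printed, which show which smooth s(x2) overlaps with rather than only its overlap with the rest of the model.

```r
library(mgcv)

## Same simulated data as above (settings assumed)
set.seed(8)
n <- 200
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 * (10 * x)^3 * (1 - x)^10
t <- sort(runif(n))
x <- f2(t) + rnorm(n) * 3
y <- sin(4 * pi * t) + exp(x / 20) + rnorm(n) * 0.3
x2 <- t + rnorm(n, sd = 0.05)   # hypothetical noisy copy of t

## x2 now enters as a smooth, so it gets its own concurvity column
b3 <- gam(y ~ s(x2) + s(t, k = 15) + s(x, k = 15), method = "REML")

## Overall measures: "para" is back to (almost) zero, s(x2) reported directly
print(concurvity(b3))

## Pairwise worst-case measures between individual terms
print(round(concurvity(b3, full = FALSE)$worst, 3))
```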
Note: you are specifying gamma but fitting using REML. In mgcv versions before 1.8-23, gamma only affected the GCV and UBRE/AIC methods of smoothness selection, so the argument had zero effect on your model fits and could be removed.

Update: from version 1.8-23 of mgcv, the gamma argument also affects models fitted using REML/ML, where smoothness parameters are selected by REML/ML as if the sample size were $n/\gamma$ instead of $n$.
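To check whether gamma is having an effect under method = "REML" with the installed mgcv, a quick sketch is to fit the same model with and without it and compare the estimated smoothing parameters. Data simulated with mgcv's gamSim() stand in for train.all, and the four-smooth formula is hypothetical.

```r
library(mgcv)

## Simulated stand-in data (train.all is not available here)
set.seed(1)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2, verbose = FALSE)

## Same model, REML smoothness selection, with and without gamma
m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")
m2 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML",
          gamma = 1.4)

print(packageVersion("mgcv"))

## With mgcv >= 1.8-23 the smoothing parameters (and hence EDFs) differ
print(rbind(default = m1$sp, gamma_1.4 = m2$sp))
print(c(default = sum(m1$edf), gamma_1.4 = sum(m2$edf)))
```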