The concurvity moves from the stated smooth terms to the parametric terms, which concurvity() groups together under the para column of the matrix or matrices returned.
Here's a modified example from ?concurvity:
library("mgcv")
## simulate data with concurvity...
set.seed(8)
n <- 200
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
(10 * x)^3 * (1 - x)^10
t <- sort(runif(n)) ## first covariate
## make covariate x a smooth function of t + noise...
x <- f2(t) + rnorm(n)*3
## simulate response dependent on t and x...
y <- sin(4*pi*t) + exp(x/20) + rnorm(n)*.3
## fit model...
b <- gam(y ~ s(t,k=15) + s(x,k=15), method="REML")
Now add a linear term and refit:
x2 <- seq_len(n) + rnorm(n)*3
b2 <- update(b, . ~ . + x2)
Now look at the concurvity of the two models:
## assess concurvity between each term and `rest of model'...
concurvity(b)
concurvity(b2)
These produce:
> concurvity(b)
para s(t) s(x)
worst 1.06587e-24 0.60269087 0.6026909
observed 1.06587e-24 0.09576829 0.5728602
estimate 1.06587e-24 0.24513981 0.4659564
> concurvity(b2)
para s(t) s(x)
worst 0.9990068 0.9970541 0.6042295
observed 0.9990068 0.7866776 0.5733337
estimate 0.9990068 0.9111690 0.4668871
Note that x2 is essentially a noisy version of t:

> cor(t, x2)
[1] 0.9975977

and hence the concurvity has gone up from essentially 0 in b to almost 1 in b2.
Now if we add x2 as a smooth function instead...

concurvity(update(b, . ~ . + s(x2)))

we see that the para entries return to being very small and we get a measure for the spline term s(x2) directly:
> concurvity(update(b, . ~ . + s(x2)))
para s(t) s(x) s(x2)
worst 1.506201e-24 0.9977153 0.6264654 0.9976988
observed 1.506201e-24 0.9838018 0.5893737 0.9963857
estimate 1.506201e-24 0.9909506 0.4921592 0.9943990
This is simply how the function handles parametric terms; its focus is on the smooth terms.
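Incidentally, if you want pairwise measures between individual terms, rather than each term against the rest of the model, concurvity() also accepts full = FALSE. A short sketch, continuing from the b2 fit above:

```r
## pairwise concurvity between each pair of terms in b2;
## full = FALSE returns a list of three matrices, one per measure
cp <- concurvity(b2, full = FALSE)
cp$worst     ## worst-case pairwise concurvity matrix
cp$estimate  ## the "estimate" measure, pairwise
```

This is often more informative than the full = TRUE summaries when you need to identify which pair of terms is responsible for the concurvity.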
Note: you are specifying gamma but fitting using REML. In versions of mgcv prior to 1.8-23, gamma only affected the GCV and UBRE/AIC methods of smoothness selection, so you could remove this argument as it had zero effect on the model fits. From version 1.8-23 of mgcv onwards, however, the gamma argument also affects models fitted using REML/ML, where smoothness parameters are selected by REML/ML as if the sample size were $n/\gamma$ instead of $n$.
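For instance (a sketch of my own, reusing y, t, and x from the simulated example above and assuming mgcv >= 1.8-23):

```r
## with gamma > 1, REML smoothness selection behaves as if the sample
## size were n/gamma, so fits are penalized more heavily (smoother)
b_gamma <- gam(y ~ s(t, k = 15) + s(x, k = 15),
               method = "REML", gamma = 1.4)
sum(b_gamma$edf)  ## total EDF; typically lower than with gamma = 1
```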
By default the tensor product constructor implementing a ti or te term will reparameterize the marginal smooths so that the coefficients are interpretable as values of the smooth at evenly spaced covariate values (see section 5.6.2 of Wood, 2017, Generalized Additive Models: An Introduction with R). This is done to improve the interpretability of the smoothing penalties. You can turn it off using ti(..., np=FALSE) -- see ?ti. The "cr" basis does not need to be re-parameterized, which is why you saw no difference in that case.

-- Simon Wood
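To illustrate the np switch (a sketch of mine using the simulated t and x from the first example; the particular model is not from the quoted text):

```r
## tensor product interaction with the marginal reparameterization
## turned off via np = FALSE (see ?ti); compare with the default fit
b_np <- gam(y ~ s(t) + s(x) + ti(t, x, np = FALSE), method = "REML")
```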
Best Answer
Q1
No, leave it in as an s() term; part of the wiggliness in range is taken up in the ti(day, range) term. In fact it would seem sensible to decide whether ti(day, range) is needed and, if it is, refit the model with te(day, range) rather than the separate s() terms plus the ti() term, if only for the reduction in the number of smoothing parameters that need to be estimated (from 8 with your model to 4 for the te() version -- which are doubled from the 4 or 2 parameters, I think, because of the shrinkage thin-plate splines you are using).

Q2
Values like this are common when you turn on shrinkage (either via shrinkage splines or via the double-penalty approach using select = TRUE). Both options allow shrinkage of the null space of the penalty matrix. This null space includes the set of basis functions that are completely smooth. Without the shrinkage, the smoothness parameters would shrink the wiggly parts of the model until all that was left was the linear function; in that case the range space is shrunk by the smoothness parameter but not the null space.

By adding an extra penalty, in your case by adding a small value to the zero eigenvalues of the penalty matrix, the null space can now also be shrunk via a second penalty and associated smoothness parameter. This allows the linear part of the spline to be shrunk a little (or a lot) towards the model intercept. If you're familiar with the LASSO penalty you will notice a similarity; the shrunken linear function doesn't have its full least-squares estimated effect but a somewhat smaller effect because of the shrinkage towards zero. The penalty is different in splines but the effect is similar. The less-than-1 EDF for the term just reflects this.
You can often find spline terms with EDF < 1 that are not linear. In that case it seems that the null space (the linear, perfectly smooth parts) has been shrunk as well as the range space (the wiggly parts), but the smoothness parameter(s) for the wiggly parts allow some small amount of wiggliness, and the shrinkage of the null space drags the EDF below 1. This is fine; the model is estimating a slightly wiggly function, but uncertainty in that estimate probably means that a linear function will reside in the 95% confidence interval for the smooth.
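As a sketch of the double-penalty route (reusing the simulated y, t, and x from the first example; the particular call is mine, not from the question):

```r
## select = TRUE adds a second penalty on each smooth's null space,
## so whole terms can be shrunk out of the model and a term's EDF
## can fall below 1
b_sel <- gam(y ~ s(t, k = 15) + s(x, k = 15),
             method = "REML", select = TRUE)
summary(b_sel)$edf  ## per-term effective degrees of freedom
```

Inspecting the per-term EDFs after fitting with select = TRUE is a quick way to see which terms have been shrunk towards (or below) a linear effect.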