There is a collinearity effect in linear regression methods; for example, this question is about collinear predictors in a GLM. But GAMs are nonlinear, so do we need to check the collinearity of the independent variables before fitting a GAM?
Solved – Does GAM (Generalized Additive Model) have collinearity problem
generalized-linear-model, generalized-additive-model
Related Solutions
The reason for the first error is that `bio.percent_b500` doesn't have k - 1 unique values. If you set `k` to something lower for this spline then the model will fit. Why the 2-d thin-plate version works, IIRC, is because of the way it calculates a default for `k` when not supplied, and it must do this from the data. So for `model1`, setting `k = 9` works:

> model1 <- gam(honey.mean ~ s(week) + s(bio.percent_b500, k = 9), data = df)
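To see why the default basis size fails, you can count the unique covariate values yourself before choosing `k`. A quick diagnostic sketch, assuming your data frame is called `df` as above:

```r
library(mgcv)

# s(x) uses k = 10 by default for a 1-d smooth; the basis needs
# roughly k unique covariate values, so count them first
n_unique <- length(unique(df$bio.percent_b500))
n_unique  # if this is less than 10, the default basis is too big

# choose k no larger than the number of unique values
model1 <- gam(honey.mean ~ s(week) + s(bio.percent_b500, k = n_unique),
              data = df)
```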
In mgcv, a model with `s(x1, x2)` includes the main effects and the interaction. Note that `s()` assumes similar scaling in the variables, so I doubt you really want `s()`; try `te()` instead for a tensor-product smooth. If you want a decomposition into main effects and interaction then try:

model2 <- gam(honey.mean ~ ti(week) + ti(bio.percent_b500) +
                  ti(week, bio.percent_b500), data = df)
From the `summary()` output, it appears that the interaction is not required:
> summary(model2)
Family: gaussian
Link function: identity
Formula:
honey.mean ~ ti(week) + ti(bio.percent_b500) + ti(week, bio.percent_b500)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.2910 0.5263 23.35 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
ti(week) 3.753 3.963 22.572 <2e-16 ***
ti(bio.percent_b500) 3.833 3.974 2.250 0.0461 *
ti(week,bio.percent_b500) 1.246 1.448 1.299 0.2036
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.25 Deviance explained = 27.3%
GCV = 84.697 Scale est. = 81.884 n = 296
And you get similar information from the generalised likelihood ratio test:
> model1 <- gam(honey.mean ~ ti(week) + ti(bio.percent_b500, k = 9), data = df)
> anova(model1, model2, test = "LRT")
Analysis of Deviance Table
Model 1: honey.mean ~ ti(week) + ti(bio.percent_b500, k = 9)
Model 2: honey.mean ~ ti(week) + ti(bio.percent_b500) + ti(week, bio.percent_b500)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 285.65 23363
2 286.17 23433 -0.51811 -69.675 0.1831
Here `ti()` terms are tensor-product interaction terms in which the main/marginal effects for other terms in the model are removed. You don't strictly need `ti()` in `model1`, I think; `te()` or even `s()` should work, but the examples in mgcv all now use this form, so I'm going with that.
`model3` doesn't make much sense to me, especially if you use tensor products; the `s(week)` term is included in `s(week, bio.percent_b500)`, but as I mentioned, bivariate `s()` terms assume isotropy, so this may have overly constrained the `week` component, thus leaving room for the `s(week)` term to come in and explain something. In general you shouldn't need it if you get the bivariate term right. Whether you can get a true bivariate term right with your 500 m variable isn't clear to me.
Questions:
Q1
I doubt you want an interaction without main/marginal effects. Your `model2` includes both.
Q2
The model may be more complex in terms of the sets of functions you envisage, but you need to consider the bases produced for each smooth term. In addition, mgcv uses splines with penalties that perform smoothness selection, so you could end up with a model that is at first glance more complex, but whose smooth terms have been shrunk by the penalties such that they use fewer degrees of freedom once fitted.
Q3
Technically, only `model1` is really correctly specified. I don't think you want to assume isotropy for `model2`. I think you'll run into fitting problems with `model3`, `model6`, and `model7` in general; if you set up the bases for the tensor products correctly, you shouldn't need the separate `s(week)` terms in those models.
Q1 What's the difference between models 3 and 4?
Model 3 is a purely additive model
$$y = \alpha + f_1(x) + f_2(w) + \varepsilon$$
so we have a constant $\alpha$ plus the smooth effect of $x$ plus the smooth effect of $w$.
Model 4 is a smooth interaction of two continuous variables
$$y = \alpha + f_1(x, w) + \varepsilon$$
In a practical sense, Model 3 says that no matter what the effect of $w$ is, the effect of $x$ on the response stays the same; if we fix $x$ at some known value and vary $w$ over some range, the contribution from $f_1(x)$ to the fitted model remains the same. Verify this if you want by `predict()`-ing from model 3 for a fixed value of `x` and a few different values of `w`, using the `type = 'terms'` argument of the `predict()` method. You'll see a constant contribution to the fitted/predicted values for `s(x)`.
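That check can be sketched as follows, assuming the fitted additive model is called `m3` and its covariates are `x` and `w` (these names are illustrative; substitute your own):

```r
# fix x, vary w; in a purely additive model the s(x) column of the
# term-wise predictions should be identical across rows
nd <- data.frame(x = 0.5, w = c(0.1, 0.4, 0.9))
pt <- predict(m3, newdata = nd, type = "terms")
pt[, "s(x)"]
```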
This is not the case for model 4; this model says that the smooth effect of $x$ varies smoothly with the value of $w$, and vice versa.
Note that unless $x$ and $w$ are in the same units, or we expect the same wiggliness in both variables, you should be using `te()` to fit the interaction.
library('mgcv')     # gam(), te()
library('dplyr')    # mutate()
library('ggplot2')  # plotting

m4a <- gam(y ~ te(x, w), data = data.ex, method = 'REML')
pdata <- mutate(data.ex, Fittedm4a = predict(m4a))

ggplot(pdata, aes(x = x, y = y)) +
    geom_point() +
    geom_line(aes(y = Fittedm4a), col = 'red')
In one sense, model 4 is fitting
$$y = \alpha + f_1(x) + f_2(w) + f_3(x, w) + \varepsilon$$
where $f_3$ is the pure smooth interaction of $x$ and $w$, with the "main" smooth effects removed, for the sake of identifiability, from the basis of $f_3$. You can get this model via
m4b <- gam(y ~ ti(x) + ti(w) + ti(x, w), data = data.ex, method = 'REML')
but note this estimates 4 smoothness parameters:
- the one associated with the main smooth effect of $x$
- the one associated with the main smooth effect of $w$
- the one associated with the marginal smooth of $x$ in the interaction tensor product smooth
- the one associated with the marginal smooth of $w$ in the interaction tensor product smooth
The `te()` model contains just two smoothness parameters, one per marginal basis.
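You can confirm these counts directly from the fitted objects; in mgcv, each element of a model's `$sp` component is one estimated smoothing parameter (reusing `m4a` and `m4b` from above):

```r
# te(x, w): one smoothing parameter per marginal basis
length(m4a$sp)  # 2

# ti(x) + ti(w) + ti(x, w): two mains + two marginals of the interaction
length(m4b$sp)  # 4
```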
A fundamental problem with all these models is that the effect of $w$ is not strictly smooth; there's a discontinuity where the effect of $w$ falls to 0 (or 1 in `w2`). This shows up in your plots (and the one I show in detail here).
Q2 What, exactly, is "by" doing here?
`by`-variable smooths can do a number of different things depending on what you pass to the `by` argument. In your examples the `by` variable, $w$, is continuous. In that case what you get is a varying-coefficient model: a model in which the linear effect of $w$ varies smoothly with $x$. In equation terms, this is what your model 5, say, is doing:
$$y = \alpha + f_1(x)w + \varepsilon$$
If this isn't immediately clear (it wasn't to me when I first looked at these models): for a given value of $x$, we evaluate the smooth function at that value, and this then becomes the equivalent of $\beta_1 w$; in other words, it is the linear effect of $w$ at the given value of $x$, and those linear effects vary smoothly with $x$. See section 7.5.3 in the second edition of Simon Wood's book for a concrete example where the linear effect of a covariate varies as a smooth function of space (lat and long).
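A varying-coefficient fit of this form can be sketched as follows, reusing the `data.ex`, `x`, and `w` names from the earlier examples:

```r
library(mgcv)

# by = w multiplies the smooth of x by w, giving the model
#   y = alpha + f1(x) * w + error
m5 <- gam(y ~ s(x, by = w), data = data.ex, method = 'REML')
```

Note that wherever $w = 0$, the `by` term contributes exactly nothing to the linear predictor, which is what drives the model 5 vs model 9 difference discussed next.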
Q3 Why would there be such a notable difference between Models 5 and 9?
The difference between models 5 and 9 I think is simply due to multiplying by 0 or multiplying by 1. With the former, the effect of the only term in the model, $f_1(x)w$, is 0 because $f_1(x) \times 0 = 0$. In model 9 you have $f_1(x) \times 1 = f_1(x)$ in those areas where there is no contribution from the Gaussians of $w$. As $f_1(x)$ is an approximately exponential function, you get this superimposed upon the overall effect of $w$.
In other words, model 5 contains zero trend everywhere $w$ is 0, but model 9 includes an approximately exponential trend everywhere $w$ is 0 (or 1, in the case of `w2`), upon which the varying-coefficient effect of $w$ is superimposed.
Best Answer
GAM models can be afflicted by concurvity (the extension of GLM collinearity to GAM models).
As explained in the mgcv documentation (https://stat.ethz.ch/R-manual/R-devel/library/mgcv/html/concurvity.html), you can compute three different measures of concurvity for a GAM fitted with the mgcv package in R, all of which are bounded between 0 and 1 (with 0 indicating no concurvity).
Thus, you do have to check for the potential presence of concurvity in your GAM models by computing appropriate measures of concurvity and making sure they are not too high (i.e., not too close to 1). See also gam smoother vs parametric term (concurvity difference), https://jroy042.github.io/nonlinear/week3.html and https://eric-pedersen.github.io/mgcv-esa-workshop/slides/02-model_checking.html#/.