The k argument effectively sets the dimensionality of the smoothing basis for each term. gam() uses a GCV or UBRE score to select an optimal amount of smoothness, but it can only work within the dimensionality of that basis. By default, te() smooths have k = 5^2 for 2-d surfaces; I forget what the default is for s(), so check the documentation. The current advice from Simon Wood, author of mgcv, is that if the degree of smoothness selected by the model is at or close to the limit imposed by the value used for k, you should increase k and refit the model to see whether a more complex model is selected from the higher-dimensional basis.
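As a hedged sketch of that workflow (using your model1 from further down as an example; the larger k value and the object name are just illustrative), gam.check() reports the basis dimension k' alongside the estimated EDF for each smooth, so you can see whether a term is bumping up against its limit:

library(mgcv)
gam.check(model1)   # reports k' (basis dimension) and edf for each smooth term

# if edf is close to k' for a term, increase k for that term and refit
model1_bigk <- gam(honey.mean ~ s(week, k = 20) + s(bio.percent_b500, k = 9),
                   data = df)
gam.check(model1_bigk)   # check again with the larger basis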
I don't know how locfit works, but you do need something that stops you from fitting a surface that is too complex to be supported by the data (GCV and UBRE, or (RE)ML if you choose to use them [you can't as you set scale = -1], are trying to do just that). In other words, you could fit very local features of the data, but are you then fitting the noise in the sample you collected, or the mean of the probability distribution? gam() may be telling you something about what can be estimated from your data, assuming you've sorted out the basis dimensionality (above).
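If you did want to try (RE)ML smoothness selection instead of GCV, a minimal sketch, assuming the same data frame df and dropping the scale = -1 argument, would be something like:

# illustrative only: REML smoothness selection rather than the default GCV
model_reml <- gam(honey.mean ~ s(week) + s(bio.percent_b500, k = 9),
                  data = df, method = "REML")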
Another thing to look at is that the smoothers you are currently using are global, in the sense that the smoothness selected is applied over the entire range of the smooth. Adaptive smoothers can spend the allotted smoothness "allowance" in the parts of the data where the response is changing rapidly. gam() has capabilities for using adaptive smoothers. See ?smooth.terms and ?adaptive.smooth to see what can be fitted using gam(). te() can combine most if not all of these smoothers (check the docs for which can and can't be included in tensor products), so you could use an adaptive smoother basis to try to capture the finer local scale in the parts of the data where the response is varying quickly.
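As a hedged sketch (the k value and the model name are illustrative, and whether an adaptive basis can be used as a tensor-product marginal is something to check in the docs), an adaptive smooth for a single covariate looks like:

# adaptive basis: the wiggliness is allowed to vary along the range of week
model_ad <- gam(honey.mean ~ s(week, bs = "ad", k = 20) +
                  s(bio.percent_b500, k = 9),
                data = df)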
I should add that you can get R to estimate a model with a fixed number of degrees of freedom for a smooth term, using the fx = TRUE argument to s() and te(). Basically, set k to be what you want and fx = TRUE, and gam() will fit a regression spline with fixed degrees of freedom, not a penalised regression spline.
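For example (the k values here are placeholders, not recommendations):

# unpenalised regression splines: each term uses its full k - 1 degrees of freedom
model_fx <- gam(honey.mean ~ s(week, k = 10, fx = TRUE) +
                  s(bio.percent_b500, k = 9, fx = TRUE),
                data = df)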
The reason for the first error is that bio.percent_b500 doesn't have k - 1 unique values. If you set k to something lower for this spline, then the model will fit. The 2-d thin-plate version works, IIRC, because of the way it calculates a default for k when one is not supplied; it must do this from the data. So for model1, setting k = 9 works:
> model1 <- gam(honey.mean ~ s(week) + s(bio.percent_b500, k = 9), data = df)
In mgcv, a model with s(x1, x2) includes the main effects and the interaction. Note that s() assumes similar scaling in the variables, so I doubt you really want s(); try te() instead for a tensor-product smooth. If you want a decomposition into main effects and an interaction, then try:
model2 <- gam(honey.mean ~ ti(week) + ti(bio.percent_b500) +
ti(week, bio.percent_b500), data = df)
From the summary() it appears the interaction is not required:
> summary(model2)
Family: gaussian
Link function: identity
Formula:
honey.mean ~ ti(week) + ti(bio.percent_b500) + ti(week, bio.percent_b500)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.2910 0.5263 23.35 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Approximate significance of smooth terms:
edf Ref.df F p-value
ti(week) 3.753 3.963 22.572 <2e-16 ***
ti(bio.percent_b500) 3.833 3.974 2.250 0.0461 *
ti(week,bio.percent_b500) 1.246 1.448 1.299 0.2036
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
R-sq.(adj) = 0.25 Deviance explained = 27.3%
GCV = 84.697 Scale est. = 81.884 n = 296
And you get similar information from the generalised likelihood ratio test:
> model1 <- gam(honey.mean ~ ti(week) + ti(bio.percent_b500, k = 9), data = df)
> anova(model1, model2, test = "LRT")
Analysis of Deviance Table
Model 1: honey.mean ~ ti(week) + ti(bio.percent_b500, k = 9)
Model 2: honey.mean ~ ti(week) + ti(bio.percent_b500) + ti(week, bio.percent_b500)
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 285.65 23363
2 286.17 23433 -0.51811 -69.675 0.1831
Here the ti() terms are tensor-product interaction terms in which the main/marginal effects of the other terms in the model are removed. You don't strictly need ti() in model1, I think; te() or even s() should work, but the examples in mgcv all now use this form, so I'm going with that.
model3 doesn't make much sense to me, especially if you use tensor products; the s(week) term is included in the s(week, bio.percent_b500) term, but as I mentioned, bivariate s() terms assume isotropy, so this may have overly constrained the week component, thus leaving space for the s(week) term to come in and explain something. In general you shouldn't need this if you get the bivariate term right. Whether you can get a true bivariate term right with your 500 m variable isn't clear to me.
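If you do want a single bivariate surface rather than the main-effects-plus-interaction decomposition of model2, a minimal sketch of the anisotropic alternative (so week and the 500 m variable each get their own smoothness; the object name is made up) is:

# full tensor-product surface; no separate s(week) term is needed alongside this
model_te <- gam(honey.mean ~ te(week, bio.percent_b500), data = df)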
Questions:
Q1
I doubt you want an interaction without the main/marginal effects. Your model2 includes both.
Q2
The model may be more complex in terms of the sets of functions you envisage, but you need to consider the bases produced for each smooth term. In addition, mgcv uses penalised splines with smoothness selection, so you could end up with a model that looks more complex at first glance, but whose smooth terms have been shrunk by the penalties so that they use fewer effective degrees of freedom once fitted.
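One way to see this shrinkage in your own fits (using model2 from above as an example) is to compare the size of the basis with the effective degrees of freedom actually used:

length(coef(model2))     # number of coefficients, i.e. the maximum complexity allowed
sum(model2$edf)          # effective degrees of freedom after penalisation
summary(model2)$s.table  # per-smooth edf, as shown in the summary() output above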
Q3
Technically, only model1 is really correctly specified. I don't think you want to assume isotropy for model2. I think you'll run into fitting problems with model3, model6, and model7 in general; if you set up the bases for the tensor products right, you shouldn't need the separate s(week) terms in those models.
Best Answer
There's a problem with multivariate smoothing in high dimensions. Conceptually, a multivariate smooth predicts $f(x_0)$ using data $(x,y)$ where $x$ is 'close' to $x_0$ in every dimension. An additive model uses data where $x$ is close to $x_0$ in at least one dimension.
When you have $n$ points in $d$ dimensional space, for something like $d\gg\log n$, there aren't any data points $(x,y)$ with $x$ close to $x_0$ in every dimension. To get your multivariate smoother to work, you need to increase the smoothing window -- potentially by a lot.
So, one tradeoff is between an additive model with a relatively small smoothing window and a multivariate smoother with a larger smoothing window. (You can compromise, and take an additive model with low-dimensional rather than one-dimensional smoothing components, but it is a compromise.)
Another way of looking at the same issue is that an additive model has a much smaller space of possible fits (once you fix the smoothing window size). It's less flexible. This is bad if the true response function isn't close to any of the possibilities in the additive model, because your fit will be bad. It's good if the true response function is close to some of the possibilities in the additive model, because you'll get accurate prediction with less overfitting.
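To make this tradeoff concrete, here is a small hedged sketch on simulated data (all names and settings are made up) comparing an additive fit with a bivariate tensor-product fit via AIC:

library(mgcv)
set.seed(1)
n  <- 300
x1 <- runif(n); x2 <- runif(n)
y_add <- sin(2 * pi * x1) + (x2 - 0.5)^2 + rnorm(n, sd = 0.2)  # truly additive surface
y_int <- sin(2 * pi * x1 * x2) + rnorm(n, sd = 0.2)            # strongly interacting surface

m_add_a <- gam(y_add ~ s(x1) + s(x2))   # additive model, small 1-d smoothing windows
m_biv_a <- gam(y_add ~ te(x1, x2))      # bivariate smooth, larger effective window
m_add_i <- gam(y_int ~ s(x1) + s(x2))
m_biv_i <- gam(y_int ~ te(x1, x2))

AIC(m_add_a, m_biv_a)   # additive truth: the additive model is typically preferred
AIC(m_add_i, m_biv_i)   # interacting truth: the tensor product is typically preferred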