The `k` argument effectively sets up the dimensionality of the smoothing matrix for each term. `gam()` uses a GCV or UBRE score to select an optimal amount of smoothness, but it can only work within the dimensionality of the smoothing matrix. By default, `te()` smooths have `k = 5^2` for 2-d surfaces. I forget what the default is for `s()`, so check the docs. The current advice from Simon Wood, author of **mgcv**, is that if the degree of smoothness selected by the model is at or close to the limit of the dimensionality imposed by the value used for `k`, you should increase `k` and refit the model to see whether a more complex model is selected from the higher-dimensional smoothing matrix.
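As a sketch of that advice on simulated data (not your data; all variable names here are placeholders), `gam.check()` reports each smooth's edf against its maximum `k'`, which flags when a term is hitting its basis limit:

```r
library(mgcv)  # assumed available; ships with R as a recommended package

## simulated example (not your data): a wiggly signal
set.seed(1)
x <- runif(300)
y <- sin(2 * pi * x^2) * 4 + rnorm(300, sd = 0.5)

## deliberately small basis: k = 4 caps the smooth at 3 effective df
m_small <- gam(y ~ s(x, k = 4))
summary(m_small)$edf       # edf sits at/near the k - 1 = 3 limit
gam.check(m_small)         # edf near k' (and a low k-index) flags k as too small

## increase k and refit: a more complex smooth can now be selected
m_big <- gam(y ~ s(x, k = 20))
summary(m_big)$edf
```

If the refitted model settles on a similar edf, the original `k` was adequate; if the edf jumps up, keep increasing `k` until it stabilises.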

However, I don't know how **locfit** works, but you do need something that stops you from fitting too complex a surface, one that is not supported by the data (GCV and UBRE, or (RE)ML if you choose to use them [you can't, as you set `scale = -1`], are trying to do just that). In other words, you could fit very local features of the data, but are you fitting the noise in the sample of data you collected, or are you fitting the mean of the probability distribution? `gam()` *may* be telling you something about what *can* be estimated from your data, assuming that you've sorted out the basis dimensionality (above).

Another thing to look at is that the smoothers you are currently using are global, in the sense that the selected smoothness is applied over the entire range of the smooth. Adaptive smoothers can spend the allotted smoothness "allowance" in the parts of the data where the response is changing rapidly. `gam()` has capabilities for using adaptive smoothers.

See `?smooth.terms` and `?adaptive.smooth` to see what can be fitted using `gam()`. `te()` can combine most if not all of these smoothers (check the docs for which can and can't be included in tensor products), so you could use an adaptive smoother basis to try to capture the finer local scale in the parts of the data where the response is varying quickly.
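A minimal sketch of an adaptive smooth via `bs = "ad"`, again on simulated rather than your data, with a signal that is flat in one region and wiggly in another:

```r
library(mgcv)

## simulated example (not your data): flat on the left, wiggly on the right
set.seed(2)
x <- seq(0, 1, length.out = 400)
f <- ifelse(x < 0.5, 0, sin(20 * pi * (x - 0.5)) * (x - 0.5))
y <- f + rnorm(400, sd = 0.05)

m_tps <- gam(y ~ s(x, k = 40))             # global penalty over the whole range
m_ad  <- gam(y ~ s(x, bs = "ad", k = 40))  # adaptive: penalty varies along x

AIC(m_tps, m_ad)  # compare; the adaptive fit can spend its wiggliness locally
```

The adaptive basis lets the penalty itself vary along `x`, so the fit can stay smooth where the response is flat and still track the oscillations on the right.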

I should add that you can get R to estimate a model with a fixed number of degrees of freedom for a smooth term, using the `fx = TRUE` argument to `s()` and `te()`. Basically, set `k` to be what you want plus `fx = TRUE`, and `gam()` will fit a regression spline with fixed degrees of freedom rather than a penalised regression spline.
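For example (simulated data, placeholder variable names): with `k = 10` and `fx = TRUE` the smooth uses exactly `k` - 1 = 9 degrees of freedom, with no smoothness selection:

```r
library(mgcv)

## simulated example (not your data)
set.seed(3)
x <- runif(200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)

## unpenalised regression spline: edf fixed at k - 1 = 9
m_fixed <- gam(y ~ s(x, k = 10, fx = TRUE))
summary(m_fixed)$edf   # exactly 9

## penalised version for comparison: edf chosen by GCV, at most 9
m_pen <- gam(y ~ s(x, k = 10))
summary(m_pen)$edf
```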

The reason for the first error is that `bio.percent_b500` doesn't have `k` - 1 unique values. If you set `k` to something lower for this spline then the model will fit. Why the 2-d thin-plate version works is, IIRC, because of the way it calculates a default for `k` when one is not supplied; it must do this from the data. So for `model1`, setting `k = 9` works:

```
> model1 <- gam(honey.mean ~ s(week) + s(bio.percent_b500, k = 9), data = df)
```

In **mgcv**, a model with `s(x1, x2)` *includes* the main effects *and* the interaction. Note that `s()` assumes similar scaling of the variables, so I doubt you really want `s()`; try `te()` instead for a tensor-product smooth. If you want a decomposition into main effects and an interaction, then try:

```
model2 <- gam(honey.mean ~ ti(week) + ti(bio.percent_b500) +
ti(week, bio.percent_b500), data = df)
```

The `summary()` output suggests the interaction is not required:

```
> summary(model2)

Family: gaussian 
Link function: identity 

Formula:
honey.mean ~ ti(week) + ti(bio.percent_b500) + ti(week, bio.percent_b500)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  12.2910     0.5263   23.35   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Approximate significance of smooth terms:
                            edf Ref.df      F p-value    
ti(week)                  3.753  3.963 22.572  <2e-16 ***
ti(bio.percent_b500)      3.833  3.974  2.250  0.0461 *  
ti(week,bio.percent_b500) 1.246  1.448  1.299  0.2036    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.25   Deviance explained = 27.3%
GCV = 84.697  Scale est. = 81.884   n = 296
```

And you get similar information from the generalised likelihood ratio test:

```
> model1 <- gam(honey.mean ~ ti(week) + ti(bio.percent_b500, k = 9), data = df)
> anova(model1, model2, test = "LRT")
Analysis of Deviance Table

Model 1: honey.mean ~ ti(week) + ti(bio.percent_b500, k = 9)
Model 2: honey.mean ~ ti(week) + ti(bio.percent_b500) + ti(week, bio.percent_b500)
  Resid. Df Resid. Dev       Df Deviance Pr(>Chi)
1    285.65      23363                           
2    286.17      23433 -0.51811  -69.675   0.1831
```

Here the `ti()` terms are tensor-product interaction terms in which the main/marginal effects *for other terms in the model* are removed. You don't strictly need `ti()` in `model1`, I think; `te()` or even `s()` should work, but the examples in **mgcv** all now use this form, so I'm going with that.

`model3` doesn't make much sense to me, especially if you use tensor products; the `s(week)` term is included in `s(week, bio.percent_b500)`, but as I mentioned, bivariate `s()` terms assume isotropy, so this may have overly constrained the `week` component, thus leaving room for the separate `s(week)` term to come in and explain something. In general you shouldn't need this if you get the bivariate term right. Whether you can get a true bivariate term right with your 500 m variable isn't clear to me.

## Questions:

### Q1

I doubt you want an interaction without main/marginal effects. Your `model2` includes both.

### Q2

The model may be more complex in terms of the sets of functions you envisage, but you need to consider the bases produced for each smooth term. In addition, **mgcv** uses splines with penalties that perform smoothness selection, so you could end up with a model that is at first glance more complex, but whose smooth terms have been shrunk by the penalties such that they use fewer degrees of freedom once fitted.

### Q3

Technically, only `model1` is really correctly specified. I don't think you want to assume isotropy for `model2`. I think you'll run into fitting problems with `model3`, `model6`, and `model7` in general; if you set up the bases for the tensor products right, you shouldn't need the separate `s(week)` terms in those models.

## Best Answer

There's a problem with multivariate smoothing in high dimensions. Conceptually, a multivariate smooth predicts $f(x_0)$ using data $(x, y)$ where $x$ is 'close' to $x_0$ in *every* dimension. An additive model uses data where $x$ is close to $x_0$ in *at least one* dimension. When you have $n$ points in $d$-dimensional space, for something like $d \gg \log n$, there aren't *any* data points $(x, y)$ with $x$ close to $x_0$ in every dimension. To get your multivariate smoother to work, you need to increase the smoothing window, potentially by a lot. So, one tradeoff is between an additive model with a relatively small smoothing window and a multivariate smoother with a larger smoothing window. (You can compromise, and take an additive model with low-dimensional rather than one-dimensional smoothing components, but it is a compromise.)
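A quick numeric illustration of this point (a hypothetical simulation, not part of the original argument): for uniform points on the unit cube, count the fraction landing within $\pm h$ of a target point in *every* dimension; it shrinks roughly like $(2h)^d$:

```r
## hypothetical illustration: uniform points in [0,1]^d
set.seed(4)
n <- 5000
h <- 0.1  # half-width of the smoothing window in each dimension

frac_close <- function(d) {
  x <- matrix(runif(n * d), n, d)
  ## fraction of points within +/- h of the centre in EVERY dimension
  mean(apply(abs(x - 0.5) < h, 1, all))
}

sapply(c(1, 2, 5, 10), frac_close)
## shrinks roughly like (2 * h)^d: about 0.2, 0.04, 3e-4, and ~0 at d = 10
```

By $d = 10$ essentially no observations are local to $x_0$ in every dimension, which is why the multivariate smoother's window must grow.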

Another way of looking at the same issue is that an additive model has a much smaller space of possible fits (once you fix the smoothing window size). It's less flexible. This is bad if the true response function isn't close to any of the possibilities in the additive model, because your fit will be bad. It's good if the true response function *is* close to some of the possibilities in the additive model, because you'll get accurate prediction with less overfitting.