Solved – Interpretation of Tensor product splines with multiple terms and their standard errors

generalized-additive-modelmgcvsplinesstandard error

I am trying to understand the tensor product splines such as

library(mgcv)
vis.gam(gam(data=airquality, Ozone~s(Day, Month)), se = -1)
vis.gam(gam(data=airquality, Ozone~s(Day, Month, Temp)), se = -1)
vis.gam(gam(data=airquality, Ozone~s(Temp, Day, Month, Wind)), se = -1)

where visualised with generalised additive model such that

enter image description here

enter image description here

enter image description here

and lastly one standard errors such that

enter image description here

How to explain the tensor product splines? Why is the order important in forming the splines with s(...,...,...,...)?

Best Answer

These are not tensor product splines. What you are fitting are 2d and 3d thin plate splines. These kinds of smooths are generally inappropriate for the models you fit here because they assume isotropy; that a unit change in one marginal covariate is the same as a unit change in the other marginal covariate and that therefore a single smoothness penalty is appropriate.

Tensor product splines are produced by the te() function (or t2() for a different parameterisation).

Apart from the first vis.gam() call, you are trying to visualise the partial effect of two covariates conditional upon the others (the other covariates are generally held at some typical value, such as their mean). In the same way that including or removing a term in a simple regression model changes the estimated effect (the partial effect) of a covariate in both models (unless the covariates are orthogonal), changing what terms are in the GAM causes changes in the estimated partial effects of those covariates.

This is compounded here because you are removing or including terms from a single high-dimensional smooth, and furthermore, you are not changing the order of the terms in the model independently of which terms are in the model.

Compare

library('mgcv')
data(airquality)

m1 <- gam(Ozone ~ s(Day, Month, Temp), data=airquality, method = 'REML')
m2 <- gam(Ozone ~ s(Month, Day, Temp), data=airquality, method = 'REML')

> printCoefmat(summary(m1)$s.table)
                     edf Ref.df      F   p-value    
s(Day,Month,Temp) 13.495 16.379 9.9337 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> printCoefmat(summary(m2)$s.table)
                     edf Ref.df      F   p-value    
s(Month,Day,Temp) 13.494 16.379 9.9338 < 2.2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

There are some slight differences but fitted values and summaries compare to the the first few decimal places. These difference are likely due to the way the 3d thin plate spline basis is being set up — it is a low-rank thin plate spline.

If we switch to tensor product smooths then the two models are the same regardless of order (note I have to set k lower here as it applies to each marginal so we have k^3 terms here prior to application of identifiability constraints).

m1 <- gam(Ozone ~ te(Day, Month, Temp, k=4), data=airquality, method = 'REML')
m2 <- gam(Ozone ~ te(Month, Day, Temp, k=4), data=airquality, method = 'REML')

> all.equal(fitted(m1), fitted(m2))
[1] TRUE