Generalized Additive Model – How to Factor Smooth and Main Effects for Both Variables?

generalized-additive-modelinteractionmgcvnonlinear regressionr

I am working with data on cuteness judgements and size judgments for differently named fantasy creatures. The interesting part of the different names are the vowels contained in them.

The question I am fitting a GAM for is: Do the effects of cuteness and vowel interact when it comes to size judgements?

Thus far, my model looks like this:

mdl1 <- gam(size ~ 
                s(cuteness, bs = "bs", by = vowel, k = 5),
                data = data)

The interaction of cuteness and vowel reaches significance for some of the vowels (which is fine).

Now a reviewer asked to modify the model so that not only the interaction effects of cuteness and vowel are contained but also the individual main effects of both cuteness and vowel, e.g. like this:

mdl2 <- gam(size ~ 
                s(cuteness, bs = "bs", by = vowel, k = 5) +
                cuteness +
                vowel,
                data = data)

As I have not worked with interactions in GAMs before, I am at a loss when it comes to deciding whether such a model structure is valid / makes sense (from a statistical point of view). Hence, my question: Is such a model structure valid, does it make sense?

Best Answer

Your original model was incorrectly specified; it should have been

mdl1 <- gam(size ~ vowel +
                s(cuteness, bs = "bs", by = vowel, k = 5),
                data = data, method = "REML", family = ...)

Where beyond some technical things — method and choosing an appropriate family for size (is it really conditionally Gaussian? 0 or negative size values don't make immediate sense to me and if so that might preclude using family = gaussian()) — note that I have included the mean response for each vowel through the parametric factor term. You could replace this with s(vowel, bs = "re") instead.

With that specification you have the main effect for the vowel means but there is no average effect for cuteness, which is what I presume the reviewer means by "main effect". To specify that model, I would suggest using the new-ish "sz" basis:

mdl1 <- gam(size ~
                s(cuteness, bs = "bs", k = 5) +
                s(cuteness, vowel, bs = "sz", k = 5,
                  xt = list(bs = "bs")),
                data = data, method = "REML", family = ...)

With this basis, you don't need the explicit parametric term for the group means; those are included in the "sz" basis, as deviations from the overall mean of the response (the model constant / Intercept term).

If you can bear the setup cost, use thin-plate regression splines (TPRS).

These splines are optimal in an asymptotic MSE sense, but require one basis function per observation. What Simon does in mgcv is generate a low-rank version of the standard TPRS by taking the full TPRS basis and subjecting it to an eigendecomposition. This creates a new basis where the first k basis function in the new space retain most of the signal in the original basis, but in many fewer basis functions. This is how mgcv manages to get a TPRS that uses only a specified number of basis functions rather than one per observation. This eigendecomposition preserves much of the optimality of the classic TPRS basis, but at considerable computational effort for large data sets.

If you can't bear the setup cost of TPRS, use cubic regression splines (CRS)

This is a quick basis to generate and hence is suited to problems with a lot of data. It is knot-based however, so to some extent the user now needs to choose where those knots should be placed. For most problems there is little to be gained by going beyond the default knot placement (at the boundary of the data and spaced evenly in between), but if you have particularly uneven sampling over the range of the covariate, you may choose to place knots evenly spaced sample quantiles of the covariate, for example.

Every other smooth in mgcv is special, being used where you want isotropic smooths or two or more covariates, or are for spatial smoothing, or which implement shrinkage, or random effects and random splines, or where covariates are cyclic, or the wiggliness varies over the range of a covariate. You only need to venture this far into the smooth toolbox if you have a problem that requires special handling.

Shrinkage

There are shrinkage versions of both the TPRS and CRS in mgcv. These implement a spline where the perfectly smooth part of the basis is also subject to the smoothness penalty. This allows the smoothness selection process to shrink a smooth back beyond even a linear function essentially to zero. This allows the smoothness penalty to also perform feature selection.

Duchon splines, P splines and B splines

These splines are available for specialist applications where you need to specify the basis order and the penalty order separately. Duchon splines generalise the TPRS. I get the impression that P splines were added to mgcv to allow comparison with other penalized likelihood-based approaches, and because they are splines used by Eilers & Marx in their 1996 paper which spurred a lot of the subsequent work in GAMs. The P splines are also useful as a base for other splines, like splines with shape constraints, and adaptive splines.

B splines, as implemented in mgcv allow for a great deal of flexibility in setting up the penalty and knots for the splines, which can allow for some extrapolation beyond the range of the observed data.

Cyclic splines

If the range of values for a covariate can be thought of as on a circle where the end points of the range should actually be equivalent (month or day of year, angle of movement, aspect, wind direction), this constraint can be imposed on the basis. If you have covariates like this, then it makes sense to impose this constraint.

Adaptive smoothers

Rather than fit a separate GAM in sections of the covariate, adaptive splines use a weighted penalty matrix, where the weights are allowed to vary smoothly over the range of the covariate. For the TPRS and CRS splines, for example, they assume the same degree of smoothness across the range of the covariate. If you have a relationship where this is not the case, then you can end up using more degrees of freedom than expected to allow for the spline to adapt to the wiggly and non-wiggly parts. A classic example in the smoothing literature is the

library('ggplot2')
theme_set(theme_bw())
library('mgcv')
data(mcycle, package = 'MASS')
pdata <- with(mcycle,
              data.frame(times = seq(min(times), max(times), length = 500)))

ggplot(mcycle, aes(x = times, y = accel)) + geom_point()

These data clearly exhibit periods of different smoothness - effectively zero for the first part of the series, lots during the impact, reducing thereafter.

if we fit a standard GAM to these data,

m1 <- gam(accel ~ s(times, k = 20), data = mcycle, method = 'REML')

we get a reasonable fit but there is some extra wiggliness at the beginning and end the range of times and the fit used ~14 degrees of freedom

plot(m1, scheme = 1, residuals = TRUE, pch= 16)

To accommodate the varying wiggliness, an adaptive spline uses a weighted penalty matrix with the weights varying smoothly with the covariate. Here I refit the original model with the same basis dimension (k = 20) but now we have 5 smoothness parameters (default is m = 5) instead of the original's 1.

m2 <- gam(accel ~ s(times, k = 20, bs = 'ad'), data = mcycle, method = 'REML')

Notice that this model uses far fewer degrees of freedom (~8) and the fitted smooth is much less wiggly at the ends, whilst still being able to adequately fit the large changes in head acceleration during the impact.

What's actually going on here is that the spline has a basis for smooth and a basis for the penalty (to allow the weights to vary smoothly with the covariate). By default both of these are P splines, but you can also use the CRS basis types too (bs can only be one of 'ps', 'cr', 'cc', 'cs'.)

As illustrated here, the choice of whether to go adaptive or not really depends on the problem; if you have a relationship for which you assume the functional form is smooth, but the degree of smoothness varies over the range of the covariate in the relationship then an adaptive spline can make sense. If your series had periods of rapid change and periods of low or more gradual change, that could indicate that an adaptive smooth may be needed.

Solved – In R package mgcv, is it valid to have a random effect smooth on two continuous variables

Huh... I made my post as a guest on SO because I am still on a suspension, but then the question got migrated here!

So, if I understand you correctly, there's not really any similarity between the smooth s(x1, x2) and the random effect s(x1, x2, fac, bs = "re"), correct?

Correct. The function name "s" does not mean "smooth function" when s() is used to construct a random effect. Broadly speaking, s() is just a model term constructor routine that constructs a design matrix and a penalty matrix.

What I was envisioning was something smoothing in 2 dimensions like the former, but with some deviations from the average by factor level. You can get separate smooths per factor level using s(x1, x2, by=fac), but that completely separates the data for each factor level, rather than doing some partial pooling.

s(x1, x2, by = fac) gives you something pretty close to what you want, except that as you said, data from different factor levels are treated independently. Technically, "close" means that s(x1, x2, by = fac) gives you the correct design matrix but not the correct penalty matrix. In this regard, you are probably aiming at te(x1, x2, fac, d = c(2, 1), bs = c("tp", "re")). I have never seen such model term before, but its construction is definitely possible in mgcv:

library(mgcv)

x1 <- runif(1000)
x2 <- runif(1000)
f <- gl(5, 200)

## "smooth.spec" object
smooth_spec <- te(x1, x2, f, d = c(2, 1), bs = c("tp", "re"))

## "smooth" object
sm <- smooth.construct(smooth_spec,
                       data = list(x1 = x1, x2 = x2, f = f),
                       knots = NULL)

You can check that this smooth term has 2 smoothing parameters as expected, one for the s(x1, x2, bs = 'tp') margin, the other for the s(f, bs = 're') margin.

Specification of k turns out subtle. You need to explicitly pass nlevels(f) to the random effect margin. For example, if you want a rank-10 thin-plate regression spline,

## my example factor `f` has 5 levels
smooth_spec <- te(x1, x2, f, d = c(2, 1), bs = c("tp", "re"), k = c(10, 5))
sapply(smooth_spec$margin, "[[", "bs.dim")
# [1] 10  5

At first I was thinking that perhaps we can simply pass NA to the random effect margin, but it turns out not!

smooth_spec <- te(x1, x2, f, d = c(2, 1), bs = c("tp", "re"), k = c(10, NA))
sapply(smooth_spec$margin, "[[", "bs.dim")
# [1] 25  5  ## ?? why is it 25? something has gone wrong!

This might implies that there is a tiny bug... will have a check when available.