Generalized Additive Model – How to Factor Smooth and Main Effects for Both Variables?

generalized-additive-modelinteractionmgcvnonlinear regressionr

I am working with data on cuteness judgements and size judgments for differently named fantasy creatures. The interesting part of the different names are the vowels contained in them.

The question I am fitting a GAM for is: Do the effects of cuteness and vowel interact when it comes to size judgements?

Thus far, my model looks like this:

mdl1 <- gam(size ~ 
                s(cuteness, bs = "bs", by = vowel, k = 5),
                data = data)

The interaction of cuteness and vowel reaches significance for some of the vowels (which is fine).

Now a reviewer asked to modify the model so that not only the interaction effects of cuteness and vowel are contained but also the individual main effects of both cuteness and vowel, e.g. like this:

mdl2 <- gam(size ~ 
                s(cuteness, bs = "bs", by = vowel, k = 5) +
                cuteness +
                vowel,
                data = data)

As I have not worked with interactions in GAMs before, I am at a loss when it comes to deciding whether such a model structure is valid / makes sense (from a statistical point of view). Hence, my question: Is such a model structure valid, does it make sense?

Best Answer

Your original model was incorrectly specified; it should have been

mdl1 <- gam(size ~ vowel +
                s(cuteness, bs = "bs", by = vowel, k = 5),
                data = data, method = "REML", family = ...)

Where beyond some technical things — method and choosing an appropriate family for size (is it really conditionally Gaussian? 0 or negative size values don't make immediate sense to me and if so that might preclude using family = gaussian()) — note that I have included the mean response for each vowel through the parametric factor term. You could replace this with s(vowel, bs = "re") instead.

With that specification you have the main effect for the vowel means but there is no average effect for cuteness, which is what I presume the reviewer means by "main effect". To specify that model, I would suggest using the new-ish "sz" basis:

mdl1 <- gam(size ~
                s(cuteness, bs = "bs", k = 5) +
                s(cuteness, vowel, bs = "sz", k = 5,
                  xt = list(bs = "bs")),
                data = data, method = "REML", family = ...)

With this basis, you don't need the explicit parametric term for the group means; those are included in the "sz" basis, as deviations from the overall mean of the response (the model constant / Intercept term).