Solved – Interactions in GAM

generalized-additive-model, interaction, mgcv

I know similar questions have come up a lot, but I'm still confused about how to model interactions in a GAM (using mgcv in R). In my analysis, the response variable has normally distributed residuals and is related to three continuous predictors.

My goal is to predict values of y over a range of values of the continuous predictors. Also, I would like to compare the estimated slopes (and smooths?) to simulated data. I believe there are interactions between the continuous predictors.

Upon visual inspection, the relationship between y and one of the continuous predictors (x1) seems to be non-linear, so I will use a GAM. With two linear predictors and one non-linear predictor, what would be the appropriate model? One that simply includes all interactions in a tensor product?

y = a + te(x1, x2, x3)

But if x2 and x3 are linearly related to y then:

y = a + te(x1, x2) + te(x1, x3) + b1(x2) + b2(x3) + b3(x2:x3)

Or:

y = a + s(x1) + b1(x2) + b2(x3) + b3(x1:x2) + b4(x1:x3) + b5(x2:x3)

Or, we can throw ti() into the mix:

y = a + ti(x1) + b1(x2) + b2(x3) + ti(x1:x2) + ti(x1:x3) + b3(x2:x3)
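For concreteness, here is roughly how three of these candidates might be written in actual mgcv syntax. This is only a sketch on simulated data: the data frame dat, the coefficients used to generate y, and the basis size k = 4 are assumptions made for illustration. The b(...) terms become ordinary parametric terms, and ti(x1:x2) becomes ti(x1, x2). The second candidate is left out because te(x1, x2) already contains the main effect of x2, so a separate linear x2 term would duplicate it.

```r
library(mgcv)

set.seed(1)
n <- 400
# Simulated data (an assumption for illustration): x1 acts non-linearly,
# x2 and x3 act linearly, and x1 interacts with x2.
dat <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + 0.5 * dat$x2 - 0.3 * dat$x3 +
  dat$x1 * dat$x2 + rnorm(n, sd = 0.3)

# Candidate 1: a single three-way tensor product smooth
m_full <- gam(y ~ te(x1, x2, x3, k = 4), data = dat, method = "REML")

# Candidate 3: smooth in x1, linear x2 and x3, linear-by-linear interactions
m_lin <- gam(y ~ s(x1) + x2 + x3 + x1:x2 + x1:x3 + x2:x3,
             data = dat, method = "REML")

# Candidate 4: ti() terms, which exclude the main effects of their margins
m_ti <- gam(y ~ ti(x1) + x2 + x3 + ti(x1, x2) + ti(x1, x3) + x2:x3,
            data = dat, method = "REML")

# Side-by-side comparison of effective degrees of freedom and AIC
aic_table <- AIC(m_full, m_lin, m_ti)
print(aic_table)
```

The three fits can then also be inspected with summary() and plot() to see which terms carry any signal.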

Best Answer

Your question seems to confuse a few separate issues. First, forget about functional forms and assume that linear models are appropriate. You've got three variables that potentially predict $y$. You should interact them if the level of one moderates the influence of the others. So, let's say that $y$ is the happiness of cats, $x_1$ is cat food quantity, and $x_2$ is the distance of the cat from the nearest cat food; then you might imagine that the effect of $x_1$ on $y$ is moderated by $x_2$.

Now, assume that cats have declining marginal utility. You want to model s(x1) to get the shape of that curve. But you suspect that it might only work for nearby food, and that far-off food makes a cat less happy (because they have to get up to go get it). If the moderating effect of distance on the happiness effect of quantity changes by the same amount over every distance increment, you're good: y~s(x1,by=x2) gives a linear interaction. If, on the other hand, that moderating effect is non-linear (maybe the cat doesn't mind walking 10 steps, doesn't mind walking 20, and is indifferent between 10, 20, and 40, but can't be made to walk 1000 steps for any quantity of cat food), then you need a nonlinear interaction: y~te(x1,x2).
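A minimal sketch of the two options, on made-up cat data. The data frame cats, the simulated happiness function, and the noise level are all assumptions for illustration. As far as I understand mgcv, a smooth with a numeric by variable is not centered, so s(x1, by = x2) is a curve in x1 multiplied by x2; adding a plain s(x1) supplies the baseline curve at x2 = 0.

```r
library(mgcv)

set.seed(2)
n <- 500
# Hypothetical data: x1 = food quantity, x2 = distance to the food
cats <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 100))
# Simulated happiness: diminishing returns in quantity, fading with distance
cats$y <- log1p(cats$x1) * exp(-cats$x2 / 30) + rnorm(n, sd = 0.2)

# Linear interaction: the effect of x1 changes linearly with x2
# (a varying-coefficient model, y = a + f1(x1) + f2(x1) * x2)
m_by <- gam(y ~ s(x1) + s(x1, by = x2), data = cats, method = "REML")

# Nonlinear interaction: a full smooth surface over (x1, x2)
m_te <- gam(y ~ te(x1, x2), data = cats, method = "REML")

# Compare the two fits
aic_by_te <- AIC(m_by, m_te)
print(aic_by_te)
```

A tensor product is used for the surface rather than an isotropic s(x1, x2) because quantity and distance are on different scales.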

You should apply the same sort of logic to 3-way interactions. Maybe this entire relationship is moderated by the phase of the moon. If cats have linear utility and enjoy long walks when the moon is full, then y~te(x1,x2,by=FULLMOON), where FULLMOON is a dummy. If, however, there is a continuous monthly cycle in which cats change from insatiable walkers to contented and lazy, you'd need something like y~te(x1,x2,MOONPHASE), where MOONPHASE is continuous.
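Both three-way versions can be sketched as follows, again on simulated data. The variables moonphase and fullmoon, the simulated tolerance for distance, and the basis choices are assumptions for illustration only. Here the dummy is coded as a factor; with a factor by variable mgcv fits one centered surface per level, so the factor is also included as a main effect. The cyclic "cc" basis for the moon phase is my choice, not something the formulas above require.

```r
library(mgcv)

set.seed(3)
n <- 600
cats <- data.frame(
  x1 = runif(n, 0, 10),        # food quantity
  x2 = runif(n, 0, 100),       # distance to the food
  moonphase = runif(n, 0, 1)   # hypothetical: fraction of the lunar cycle
)
# Hypothetical dummy: "yes" near the middle of the cycle
cats$fullmoon <- factor(ifelse(abs(cats$moonphase - 0.5) < 0.1, "yes", "no"))

# Simulated happiness: cats tolerate longer walks around full moon
tolerance <- 30 + 20 * cos(2 * pi * (cats$moonphase - 0.5))
cats$y <- log1p(cats$x1) * exp(-cats$x2 / tolerance) + rnorm(n, sd = 0.2)

# Dummy version: a separate (x1, x2) surface for each level of fullmoon
m_dummy <- gam(y ~ fullmoon + te(x1, x2, by = fullmoon),
               data = cats, method = "REML")

# Continuous version: a three-way tensor product, cyclic in moon phase
m_phase <- gam(y ~ te(x1, x2, moonphase,
                      bs = c("cr", "cr", "cc"), k = c(4, 4, 5)),
               data = cats, method = "REML")

summary(m_dummy)
summary(m_phase)
```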

Edit: For model selection between reasonable candidate models under uncertainty about which one makes the most sense, try cross-validation: http://en.wikipedia.org/wiki/Cross-validation_%28statistics%29
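A rough sketch of what that could look like for two candidate models, using 5-fold cross-validation on simulated data. The data, the number of folds, and the use of RMSE as the score are assumptions for illustration, not the only reasonable choices.

```r
library(mgcv)

set.seed(4)
n <- 500
# Hypothetical data: x1 = food quantity, x2 = distance to the food
cats <- data.frame(x1 = runif(n, 0, 10), x2 = runif(n, 0, 100))
cats$y <- log1p(cats$x1) * exp(-cats$x2 / 30) + rnorm(n, sd = 0.2)

# Candidate model formulas to compare
candidates <- list(
  by_linear = y ~ s(x1) + s(x1, by = x2),
  tensor    = y ~ te(x1, x2)
)

# Randomly assign each observation to one of k folds
k <- 5
fold <- sample(rep(seq_len(k), length.out = n))

# For each candidate: fit on k - 1 folds, predict the held-out fold,
# and collect the out-of-sample squared errors
cv_rmse <- sapply(candidates, function(f) {
  sq_err <- numeric(n)
  for (i in seq_len(k)) {
    held_out <- fold == i
    fit <- gam(f, data = cats[!held_out, ], method = "REML")
    pred <- predict(fit, newdata = cats[held_out, ])
    sq_err[held_out] <- (cats$y[held_out] - pred)^2
  }
  sqrt(mean(sq_err))
})

print(cv_rmse)
```

The candidate with the lower out-of-sample RMSE predicts better on these folds; with small differences it is worth repeating the procedure over several random fold assignments.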