R Data Visualization – Changing Y-axis Value in GAM Plot from Smoothed to Actual Response Variable

data visualizationmgcvrsmoothing

I know similar questions have been asked before, but I've looked through them all and I'm still confused, in part because the other examples seem to be more complicated, so thought I'd throw mine up. I made a very simple GAM and plotted it, but the y axis values are the smoothed values. I'd like to make the y axis have the original units so it's easier to understand.

This is a growth curve, so I'm looking at weight by age separated by sex, with individual ID as a random effect since there are some repeated measures.
The plot gives me two plots, one for males and one for females, however I'd love the y axis to be in kilograms like the original dataset is, to better understand and compare the patterns.

gammWeight <- gamm(data = Data, Weight ~ s(Age, by = Sex), 
                   random = list(IndivID=~1))
plot(gammWeight$gam)

I know this is likely a simple fix, but I haven't had much luck looking and this is my first time doing GAMs so figured I'd ask the experts.

ETA: I got it! I just added
shift = coef(gammWeight$gam)[1] in the plot, so now it's:
plot(gammWeight$gam), shift = coef(gammWeight$gam)[1])

I figured I'd keep this up in case anyone is searching and has the same question! I got the answer from Noam Ross's github tutorial.

Best Answer

The general solution is to use the predict() for a set of evenly spaced values of Age, repeated for each level of Sex.

But to just get a plot, you could do

plot(gammWeight$gam, shift = coef(gammWeight$gam)[1])

That will only show estimates for the first level of Sex, so the predict() approach is much better as you get actual estimates for all levels of that variable if you create the input data you pass to newdata correctly.

By the way, your model is missing the group means. You want Weight ~ Sex + s(Age, by = Sex)

To use predict() one first needs to create the new data at which the model will be evaluated, then we predict on the link scale, form the required credible interval, then transform to the response scale with the inverse of the link function used in the model. With a family = gaussian() model we can skip the last step as the link in such models is the identity function.

Hence (pseudo-ish code - not tested as there's no reproducible example here):

## new data - sequence of 200 values over range of Age
##          - all levels of Sex
##          - choose a dummy value of the subject
## expand.grid provides all combinations so the Age sequence is repeated for
## each level of Sex
newd <- with(Data, expand.grid(Age = seq(min(Age), max(Age), length = 200),
                               Sex = factor(levels(Sex), levels = levels(Sex)),
                               IndivID = IndivID[1]))

## predict
p <- predict(gammWeight$gam, newdata = newd, se.fit = TRUE)
p <- as.data.frame(p)
pred <- cbind(newd, p)
pred <- transform(pred, lower = fit - (1.96 * se.fit),
                        upper = fit + (1.96 * se.fit))

Now you can plot fit against Age, colouring by Sex say, to get actual model predicted values of the expected weight as a function of Age for each level of Sex. The plot() + shift approach is only adding the constant for the reference level of Sex to each smooth drawn. In other words, it's not showing the differences between sexes in their respective mean weights.

The above assumes you are using the corrected model Weight ~ Sex + s(Age, by = Sex); your model as you have it now is likely wrong because the smooth for each sex is trying to capture both the nonlinear way in which weight varies with age but also to fit the mean weight for that sex.