My data includes the following:
Y
which is a physiological measureAge
continuousGender
factor,Male
orFemale
Timepoint
One for each year of study visit:1,2,3,4
subject
each unique subject
Model aim:
- To identify longitudinal changes of
Y
acrossTimepoints
, for eachsubject
betweenGender
- I am assuming a random slope and intercept is appropriate as each
subject
will have a different physiological change betweenTimepoints
- I am assuming a random slope and intercept is appropriate as each
- I am also trying to use
gratia::smooth_estimates
to plot the smooth term forage
betweenGender
Model:
m1 <- gam(Y ~ subject + Gender + s(age, by = subject) + s(Age, by = Gender, bs = 'fs') + s(Timepoint, subject, bs ='fs') + s(subject, bs = "re") ,
data = DF,
method = "REML")
However, this does not work,
Error in smooth.construct.tp.smooth.spec(object, dk$data, dk$knots) : A term has fewer unique covariate combinations than specified maximum degrees of freedom
So I have to use:
m1.reduced <- gam(Y ~ Gender + s(Age, by = Gender, bs = 'fs') + s(subject, bs = "re"),
method = "REML",
data = DF_y)
Or I can also use gamm4 (which works with the plot below, though I am not sure if it is appropriately modeling what I am intending:
m2 <- gamm4(Y ~ s(Age, by = Gender, bs = 'fs') + Gender+ Timepoint,
random = ~ (1 + Timepoint | subject),
data = DF)
Plot:
sm <- smooth_estimates(m1) %>%
add_confint()
ggplot(data = sm, aes(y = est, x = Age)) +
geom_ribbon(aes(ymin = lower_ci, ymax = upper_ci, fill = Gender),
alpha = 0.25) +
geom_line(aes(color = Gender), size = 1.5) +
geom_point(data = DF, aes(Age, Y, color = Gender), alpha = .25) +
geom_line(data = DF, aes(Age, Y, group = subject), alpha = 0.25) +
ylab("Partial effect") +
xlab("Age"))
I am trying to replicate a plot like this (source): where each color is a gender and the predicted longitudinal trajectory is plotted across age.
This is my model 2 (gamm4
) plot that is providing what I want.
However, is my gam
model modeling and plotting longitudinal change correctly?
sample data:
structure(list(subject= c(33714L, 35377L, 40556L, 40798L, 40800L,
40815L, 50848L, 52183L, 52461L, 53320L, 53873L, 54206L, 54581L,
55122L, 55267L, 55462L, 55612L, 55920L, 56022L, 56307L, 56420L,
56679L, 57405L, 57480L, 57725L, 57809L, 58004L, 58215L, 58229L,
59326L, 59327L, 59865L, 60099L, 60100L, 60280L, 60384L, 60429L,
60493L, 60503L, 60603L, 60664L, 60846L, 61415L, 61749L, 61883L,
62081L, 62983L, 63327L, 63329L, 64418L, 64507L, 64596L, 65178L,
65250L, 65802L, 65975L, 65978L, 66396L, 66572L, 66589L, 74034L,
74427L, 74607L, 74952L, 75732L, 76574L, 76595L, 76755L, 76759L,
77203L, 77453L, 77668L, 81064L, 81065L, 33714L, 35377L, 40556L,
40798L, 40800L, 40815L, 50848L, 52183L, 52461L, 53320L, 53873L,
54206L, 54581L, 55122L, 55267L, 55462L, 55612L, 55920L, 56022L,
56307L, 56420L, 56679L, 57405L, 57480L, 57725L, 57809L, 58004L,
58215L, 58229L, 59326L, 59327L, 59865L, 60099L, 60100L, 60280L,
60384L, 60429L, 60493L, 60503L, 60603L, 60664L, 60846L, 61415L,
61749L, 61883L, 62081L, 62983L, 63327L, 63329L, 64418L, 64507L,
64596L, 65178L, 65250L, 65802L, 65975L, 65978L, 66396L, 66572L,
66589L, 74034L, 74427L, 74607L, 74952L, 75732L, 76574L, 76595L,
76755L, 76759L, 77203L, 77453L, 77668L, 81064L, 81065L, 33714L,
35377L, 40556L, 40798L, 40800L, 40815L, 50848L, 52183L, 52461L,
53320L, 53873L, 54206L, 54581L, 55122L, 55267L, 55462L, 55612L,
55920L, 56022L, 56307L, 56420L, 56679L, 57405L, 57480L, 57725L,
57809L, 58004L, 58215L, 58229L, 59326L, 59327L, 59865L, 60099L,
60100L, 60280L, 60384L, 60429L, 60493L, 60503L, 60603L, 60664L,
60846L, 61415L, 61749L, 61883L, 62081L, 62983L, 63327L, 63329L,
64418L, 64507L, 64596L, 65178L, 65250L, 65802L, 65975L, 65978L,
66396L, 66572L, 66589L, 74034L, 74427L, 74607L, 74952L, 75732L,
76574L, 76595L, 76755L, 76759L, 77203L, 77453L, 77668L, 81064L,
81065L), Gender = structure(c(1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L,
2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L,
1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L,
2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 1L), .Label = c("Male", "Female"), class = "factor"),
Age = c(15L, 15L, 9L, 11L, 16L, 9L, 16L, 16L, 14L, 8L, 6L,
14L, 10L, 15L, 13L, 15L, 8L, 9L, 9L, 8L, 9L, 9L, 13L, 10L,
7L, 8L, 8L, 6L, 15L, 8L, 11L, 14L, 12L, 10L, 16L, 12L, 10L,
6L, 13L, 11L, 12L, 13L, 10L, 13L, 14L, 12L, 17L, 9L, 12L,
11L, 10L, 12L, 10L, 10L, 14L, 16L, 15L, 14L, 14L, 13L, 10L,
12L, 9L, 9L, 16L, 10L, 14L, 15L, 13L, 15L, 13L, 13L, 8L,
11L, 16L, 16L, 11L, 13L, 18L, 10L, 18L, 18L, 15L, 10L, 8L,
15L, 12L, 16L, 14L, 16L, 9L, 11L, 11L, 10L, 10L, 11L, 14L,
12L, 8L, 9L, 9L, 8L, 16L, 9L, 13L, 16L, 13L, 12L, 18L, 13L,
11L, 8L, 14L, 12L, 13L, 14L, 11L, 14L, 15L, 14L, 18L, 11L,
14L, 12L, 12L, 13L, 11L, 11L, 15L, 17L, 16L, 15L, 15L, 14L,
11L, 13L, 10L, 11L, 18L, 11L, 15L, 16L, 14L, 17L, 14L, 14L,
9L, 12L, 18L, 18L, 12L, 14L, 19L, 11L, 19L, 19L, 16L, 11L,
9L, 17L, 13L, 18L, 16L, 18L, 11L, 12L, 12L, 11L, 11L, 12L,
16L, 13L, 9L, 11L, 10L, 9L, 17L, 11L, 14L, 17L, 14L, 13L,
19L, 15L, 12L, 9L, 15L, 14L, 14L, 15L, 12L, 16L, 17L, 15L,
20L, 12L, 15L, 13L, 13L, 14L, 12L, 12L, 16L, 19L, 18L, 16L,
16L, 15L, 12L, 14L, 11L, 11L, 19L, 12L, 16L, 17L, 15L, 18L,
15L, 15L, 10L, 13L), Y= c(-3.91040964443843, -4.1357309415226,
-3.15711439697149, -5.3057953682613, -5.64125379300701, -5.14501797062188,
0.244858660867345, 0.0760011434854038, -1.38050716781916,
-0.765398953185992, -1.47720169386702, -1.64309892713909,
-0.396819316883117, -1.36975793945299, -1.80785572309015,
-0.401914887891913, -0.811404680008261, -2.4492910546021,
-0.620903118150862, -1.10304119407832, -0.683772758549858,
-1.47159656575735, -0.823876696667884, 0.229547683217108,
-1.91347963071532, -1.05594355918274, -2.37962731952471,
-1.34893889218848, -0.385827442278431, -0.980092345023242,
-0.181058281596406, -0.832975930612162, -1.9959308225624,
-0.12787022611412, -0.205371434695515, -0.609353157197594,
-0.477183751078979, -1.53245224237668, -0.137187841673061,
-1.11978378453579, -1.38506891710323, -0.665622819909006,
-0.858065551674517, -0.935493966384356, -0.444790478237349,
-0.0888284463372063, 0.793433276899957, -1.37118955226023,
-1.27752810419379, 0.571630350274245, 0.38391922015976, -0.585743678190174,
-0.639562613892594, 0.637702921021628, 1.02283103079116,
0.509051885361466, 0.256796855802238, -0.827467860997893,
1.25436407158129, 0.797121499725372, -1.46470541258355, 0.72639012119852,
-1.77912640845008, 1.31912635264069, 0.135328148802092, -0.522413009938002,
0.680529982119362, 0.068624697834574, -0.598312753345204,
0.141370040141094, -0.392378890718311, 1.88269650621348,
-3.0968410713246, -0.0622101539722123, -4.1452184094485,
-0.954638754603117, -0.106347504757922, -0.716821176235476,
0.33910245990621, -1.65370256776216, 0.666505029533247, 1.11178514183042,
0.0370564222038934, 0.0466166840013479, -0.557572449898689,
-0.961796818639283, 0.936133529770083, -0.283139554139275,
-0.111006312537392, 0.655416096433155, 0.253351279215336,
-1.27117077274472, 0.821677299062997, 1.30061244464207, -0.571112110007776,
-0.978927643078376, -0.0939968112175565, 1.01351341523222,
-0.482667556069396, 0.416191169882131, -1.13218300732386,
0.0109719515636322, 0.570902411558704, 0.251628490921888,
0.757376045856767, -0.286245425992256, -0.850640576775985,
0.323694423760568, 0.33667599752107, 0.489882832518856, 0.0192219236731108,
-0.267852841112888, 1.81167395220041, -0.0196014744891418,
-0.0975879755475657, 1.11193072957353, 1.43135023795345,
-0.0901872652728838, 0.562191411596049, 0.0591857591563773,
1.10166679368438, -0.140293713526042, 0.0486791770287176,
0.34512008662136, 0.818328780971503, -0.238419852381131,
-0.368114266866903, 1.03486628422146, 0.539091489689507,
-0.233227222876931, -0.351614322647946, -0.108579850152251,
1.42892377556831, 0.96726504217144, -0.436055213650844, 0.565831105173759,
-1.54960733143962, 1.100259445501, -0.614861226811863, -0.361659876922429,
1.40706134947819, -0.552573937385301, 0.105482661464863,
-0.374326010572864, -0.0292830594058542, 1.98545718822419,
-1.19221368673224, 0.760991474810628, -0.539568099000947,
-0.0739542319162944, 0.34373700306183, 0.053071073945822,
0.0606901658351647, -0.211219209043705, 0.216590374080456,
1.00007081361854, -0.495649129829898, 0.0867746364754286,
-1.67723925289802, -0.826836980777755, 1.18863120556783,
-0.795681203752549, -0.650869928607352, 0.333545861044237,
0.63103014946249, -1.75748236397463, -0.118601139802882,
1.51474775013074, 0.132537717059181, -0.677973513449372,
0.467947612557183, 1.56786301174147, -0.118528345931329,
0.554596584330558, -0.59224659738235, -0.543498968064875,
-0.380440695783417, 0.0643541240367275, -0.210830975062082,
0.503471021875644, -0.660672836643319, 1.50676468888363,
0.780839937121078, -0.116635705270918, -0.240385286913095,
0.00044110481212036, 1.4571920623552, -0.350401091455376,
0.0850033189342734, 0.197882349091022, 0.726511444317778,
-0.331741595713643, 0.837497833814114, 0.609264781867777,
0.532151807268005, 0.65446977610295, -0.581909867621652,
0.322942220421174, 1.52993740466172, -0.264455793773691,
-0.0568476721010507, 1.05299195823846, 0.915435805624833,
0.297609953120304, -0.114257772133482, 0.248013061968027,
1.64822744593733, 1.26800079018578, -0.0459528559917708,
0.720784993088846, -1.05679282101754, 1.38345187047077, -0.64414862780051,
-1.25411274217718, 0.177548594303543, -0.257734492966852,
-0.818028922319694, 0.409736779937656, -0.216411838547905,
1.32536236097051, -0.500938817829506, 0.803770006660658),
Timepoint= c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 3, 3, 3, 2, 3, 3, 2, 3, 3, 2, 3, 2, 2, 2, 2, 3,
3, 3, 2, 3, 2, 3, 2, 2, 2, 3, 2, 2, 3, 3, 2, 3, 3, 2, 2,
3, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 3, 2, 3, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2, 3, 2, 2, 2, 2, 4,
4, 4, 4, 4, 3, 4, 4, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,
3, 4, 4, 4, 3, 4, 3, 4, 3, 4, 4, 4, 3, 4, 4, 4, 3, 4, 3,
4, 3, 3, 3, 4, 4, 4, 4, 4, 4, 3, 4, 3, 3, 3, 3, 4, 4, 3,
3, 3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 4, 3, 3, 3, 3)), row.names = c(NA,
-222L), class = c("tbl_df", "tbl", "data.frame"))
Best Answer
The first model fails because you ask for too many coefficients; you ask for
Subject
specific intercepts twice (!) and try to fit 9 basis functions persubject
in thes(Timepoint, Subject, bs = "fs")
term.The
"fs"
smooth ofTimepoint
bySubject
already contains a random intercept perSubject
: you don't need to add another separate random intercept via thes(Subject, bs = "re")
term. You're just trying to estimate an extranlevels(Subject)
parameters that are confounded with the"fs"
terms random intercept parameters, andnlevels(Subject)
of these parameters are going to be shrunk to 0.Whenever you include
"fs"
smooths, you typically don't need a separate random intercept for the grouping variable of the"fs"
smooth.Looking at your plot, in many case you <<9 observations per subject. Unless you have >>9 observations for other subjects you can't estimate 9 parameters per subject; you don't have the data to support such rich smooths.
The
s(Age, by = Gender, bs = "fs")
really isn't doing what you think it's doing. There's no grouping variable here (so this should arguably raise an error). This just needs to bes(Age, by = Gender)
.You compound the problem from 1. by adding another
nlevels(Subject) - 1
parameters for thesubject
term by estimated parametric effects. While we say you "have" to include theby
variable as parametric effect, typically what we mean is you need to have something in the model for the group means of the factorby
term(s). The random intercept in either thes(Timepoint, Subject, bs = "fs")
smooth or thes(Subject, bs = "re")
smooth would account for these group means ofSubject
; you don't need yet more terms.Get rid of
Subject
from the parametric effects, drop theSubject
random intercept, and think through how many parameters you can afford to expend on the subject-specific time curves (in the"fs"
Timepoint` smooth).Why do you estimate time curves by subjects as having the same wiggliness but age by subject curves are each allowed their own wiggliness? You now have to choose values for
nlevels(Subject)
extra smoothness parameters. It also doesn't make much philosophical sense to treat one as essentially random and the other fixed for the same grouping variable.A much simpler model for the first case would be:
and shift
k
smaller if you can't fit even this simpler model.If you want subject-specific
Age
effects, you could add that to the model as so:but at this point we probably need to start thinking more carefully as you'd want to be sure you aren't estimating another
nlevels(Subject)
random intercepts and check what identifiability constraints are being applied here.With the exception of considering an auto correlated process in the model (Frank's suggestion), a GAM such as the one I show above would be a reasonable starting point for modelling longitudinal data via a GAM.
You're never going to get a plot like the one you showed from the paper by using
smooth_estimates()
. This is evaluating the partial effect of each smooth, while the plot from the paper seems to be on the response scale (so it would include the intercept). You could achieve this, kind of, by just adding the correct parametric effects to theest
column (and the upper and lower intervals), but you'd have to get the right parametric effects added to the right rows, and you would have too-narrow an interval because you aren't really including the uncertainties in those parametric effects into the std. error.Beyond that, you can't just pass the object of
smooth_estimates()
toggplot()
and expect it to do anything useful. You returned values for all smooths in your model in the objectsm
, so any row with anAge
value that isn'tNA
will end up getting plotted somehow.Better would be to get predictions from your model for the scenarios you want. To do that you would be forced to write down what you actually want to achieve, not just tell us what you want it to look like that image (I'm not reading a paper's methods section just to intuit what you want to show and what the authors of that paper did to generate their plot).
You seem to want to plot estimates of Age by Gender at the population level. So you will want something like:
where you are explicitly conditioning on the
Age
andGender
values, and we'll exclude the random smooth effects due toSubject
andTimepoint
andAge
from mym1a
(assuming you can fit that) by use of theexclude
argument topredict.gam()
.Top get the predicted values use
gratia::fitted_values()
unless you want to do everything by hand yourself, in which case usepredict(m1a, new data = new_df, exclude = ...., se.fit = TRUE)
etc withexclude
set as below:where the things we pass to
exclude
are the labels of the smooths that relate toSubject
level that we want to ignore as you seem to want population level effects. You can get these labels fromgratia::smooths(m1a)
if you don't know how {mgcv} creates labels for it's smooths.At this point you might want to rethink your model structure a bit to include population (Pedersen et al 2019 PeerJ call these "global" smooths) effects for
Timepoint
as the only thing account forTimepoint
is a random smooth term. Even so, you'd still have to condition the predictions on the specified value ofTimepoint
(here I use the median time point but you can plug in whatever value works for you).Your
gamm4()
makes similar mistakes (thebs = "fs"
is redundant on thes(Age, by = Gender, bs = "fs")
smooth and why do you now model time as a linear parametric effect ofTimepoint
when in thegam
you had subject-specific random smooths ofTimepoint
for these effects? If you wanted this in thegam()
why not thegamm4()
?These models are so flexible and there are so many options, one needs to restrain oneself from trying everything in a model because you think you need it. Think about the model (effects) you want fit, write it down in words, bullet point form. Then write out a model formula in R's notation that you think covers each bullet point. Then post that in your question if you want to know if this is a suitable model for your data.