GAMM – How to Fit a Longitudinal GAM Mixed Model

fixed-effects-modelgeneralized-additive-modelmixed modelrrandom-effects-model

I have repeated measurements of individuals, like this

y  |  id  | time  |  V1   | V2
100    1       1     23.2   0.8
88     1       2     22.6   0.9
98     2       1     10.6   1.1
83     2       2     11.1   1.3

y is a continous outcome variable, id is the patient id, time is the time point of oberservation and V1 and V2 are covariates.

In this data example we have 2 patients that have 2 observations each, one at time 1 and another one at time 2.

My real data sets has hundreds of patients (ids) and 2-5 observations (time) for each patient.

I now know that the effect of V1 on y is non-linear and what to model this with a GAM.

In the mgcv package there is a function called gamm(). In my example, I use it like this:

m <- gamm(y ~ s(V1) + V2 + time ,family=gaussian,
          data=dat,random= ~(time|id))

Is this correct?
Does this model integrate the fact that V1 can change over time for each individual?

Best Answer

What @Roland is getting at is to use a random spline basis for the time by id random part. So your model would become:

m <- gam(y ~ s(V1) + V2 + s(time, id, bs = 'fs'),
         family=gaussian, data=dat, method = "REML")

This model says that the effect of time is smooth and varies by id, with a separate smooth being estimated for each id but each smooth is assumed to have the same wiggliness (a single smoothness penalty is estimated for all the time smoothers) but can differ in shape.

To estimate a separate "global" effect the model could be

m <- gam(y ~ s(V1) + V2 + s(time) + s(time, id, bs = 'fs'),
         family=gaussian, data=dat, method = "REML")

If you want similar models but where each smooth can have different wiggliness as well as shape, then the by smoothers can be used:

## without a "global" effect
m <- gam(y ~ s(V1) + V2 + s(id, bs = 're') + s(time, by = id),
         family=gaussian, data=dat, method = "REML")

## with a "global" effect
m <- gam(y ~ s(V1) + V2 +
         s(id, bs = 're') + s(time) + s(time, by = id, m = 1),
         family=gaussian, data=dat, method = "REML")

The m=1 means that the smoother uses a penalty on the squared first derivative, which penalises departure from a flat function of no effect. As this is on the subject specific smooths, the model is penalising deviations from the "global" smooth.

Some colleagues and I have described these models in some detail in a paper submitted to PeerJ, which is available as a preprint. A new version in response to reviewers comments should be up in a few days (we've submitted it to the journal).

Related Question