I have two questions about how to specify random effects structures in mgcv using bam. I'm using bam because I have a large data set (~15,000 data points) that consists of interviews with different speakers. The binary outcome variable consists of a particular choice between two words, which is influenced by the last choice that a speaker made and the (log) time elapsed between the choices.
m0 = bam(obs ~ prev*log.lag, data=mydata, family='binomial')
I want to include random intercepts and slopes by speaker for the effect of the previous choice. The random intercept models the fact different speakers choose one form or another at different rates, the random slope models the fact that different speakers have different degrees of influence from the previous choice. Everybody tends to repeat their choice, but some people tend to do so more than others.
My first question is which of the following two models (if either) accurately reflects the random effects structure that I want. The first way of doing this, which I've seen suggested a couple of places, includes two random effects terms as in m1
:
m1 = bam(obs ~ prev*log.lag + s(speaker, bs='re') + s(prev, speaker, bs='re'), data=mydata, family='binomial')
This seems redundant. For example, when I get the values out of m1
using plot.gam it looks like s(prev, speaker, bs='re')
contains both a random intercept and a random slope for each speaker. There are twice as many values returned as there are factor levels for speaker.
plot.m1 = plot.gam(m1)
> length(plot.m1[[2]]$fit) == 2*length(unique(mydata$speaker))
[1] TRUE
Is this an appropriate way of specifying the random effects structure? or is the model fitting a random intercept by speaker twice? What would it mean if it were?
Another way of specifying the random effects is to not include a separate random intercept term.
m2 = bam(obs ~ prev*log.lag + s(prev, speaker, bs='re'), data=mydata, family='binomial')
plot.m2 = plot.gam(m2)
> length(plot.m2[[1]]$fit) == 2*length(unique(mydata$speaker))
[1] TRUE
Could someone help me understand what the difference between these two is? In either case, extracting $fit
from s(prev, speaker, bs='re')
yields what look like both random intercepts and slopes. Which of these is which? Are random intercepts for all speakers listed then random slopes, are they listed as pairs per speaker? Looking at the code of plot.gam
it seems like these $fit
values are a function of the random effects matrix, but I'm not sure exactly how they follow.
I also want to include a random smooth by speaker to account for aspects of each unique interview that could account for changes in choices by each speaker, as in m3
. My second question is whether the random smooths include both random intercepts and slopes.
m3 = bam(obs ~ prev*log.lag + s(prev, speaker, bs='re') + s(time, by=speaker, bs="fs", m=1), data=mydata, family='binomial')
Looking at the mgcv
documentation (p.171) it seems that using 'by' with the factor speaker will make the smooths subject to a centering constraint. How is this centering constraint related to random intercepts?
Thanks!
Best Answer
This is an old question, but I ran into a similar situation when trying to run a
gamm
model usingbam
.I believe what's going on is that
s(prev, speaker, bs='re')
in the model above is estimating a variance term for the~prev:speaker-1
interaction (random slopes), whereass(time, by=speaker, bs="fs", m=1)
is estimating a separate slope term for each speaker. These slopes do not share a variance term, meaning that they aren't really a random effect (and aren't necessarily centred on zero).Here's a demonstration of the differences between the two methods, using
gamm
to compare them. These simulated data are from a normal distribution, but this should work for whatever distribution you're using: