Solved – Specifying and extracting random intercepts and slopes from GAMM using bam in mgcv

I have two questions about how to specify random effects structures in mgcv using bam. I'm using bam because I have a large data set (~15,000 data points) that consists of interviews with different speakers. The binary outcome variable consists of a particular choice between two words, which is influenced by the last choice that a speaker made and the (log) time elapsed between the choices.

m0 = bam(obs ~ prev*log.lag, data=mydata, family='binomial')

I want to include random intercepts and slopes by speaker for the effect of the previous choice. The random intercept models the fact different speakers choose one form or another at different rates, the random slope models the fact that different speakers have different degrees of influence from the previous choice. Everybody tends to repeat their choice, but some people tend to do so more than others.

My first question is which of the following two models (if either) accurately reflects the random effects structure that I want. The first way of doing this, which I've seen suggested a couple of places, includes two random effects terms as in m1:

m1 = bam(obs ~ prev*log.lag + s(speaker, bs='re') + s(prev, speaker, bs='re'), data=mydata, family='binomial')

This seems redundant. For example, when I get the values out of m1 using plot.gam it looks like s(prev, speaker, bs='re') contains both a random intercept and a random slope for each speaker. There are twice as many values returned as there are factor levels for speaker.

plot.m1 = plot.gam(m1)

> length(plot.m1[[2]]$fit) == 2*length(unique(mydata$speaker)) [1] TRUE

Is this an appropriate way of specifying the random effects structure? or is the model fitting a random intercept by speaker twice? What would it mean if it were?

Another way of specifying the random effects is to not include a separate random intercept term.

m2 = bam(obs ~ prev*log.lag + s(prev, speaker, bs='re'), data=mydata, family='binomial')

plot.m2 = plot.gam(m2)

> length(plot.m2[[1]]$fit) == 2*length(unique(mydata$speaker)) [1] TRUE

Could someone help me understand what the difference between these two is? In either case, extracting $fit from s(prev, speaker, bs='re') yields what look like both random intercepts and slopes. Which of these is which? Are random intercepts for all speakers listed then random slopes, are they listed as pairs per speaker? Looking at the code of plot.gam it seems like these $fit values are a function of the random effects matrix, but I'm not sure exactly how they follow.

I also want to include a random smooth by speaker to account for aspects of each unique interview that could account for changes in choices by each speaker, as in m3. My second question is whether the random smooths include both random intercepts and slopes.

m3 = bam(obs ~ prev*log.lag + s(prev, speaker, bs='re') + s(time, by=speaker, bs="fs", m=1), data=mydata, family='binomial')

Looking at the mgcv documentation (p.171) it seems that using 'by' with the factor speaker will make the smooths subject to a centering constraint. How is this centering constraint related to random intercepts?

Thanks!

#Idea: y = [b0 + boi] + x[b1 + b1i] + s(z) + error, # # where b0i and b1i are random intercepts and slopes, and s(z) is some curvy spline library(mgcv) #Generate some fake data set.seed(123) ndat <- 900 #900 data points ngroup <- 30 #30 groups gInd <- sample(rep(1:30,length.out=900)) #Group index xVal <- runif(ndat,-1,1) #Range of x values zVal <- runif(ndat,-pi,pi) #Range of z values bo <- -1 #Intercept b1 <- 2 #Slope of x b0i <- rnorm(ndat/ngroup,0,0.5) #Random intercepts b1i <- rnorm(ndat/ngroup,0,0.5) #Random slopes s_z <- sin(zVal) #Sine curve to estimate by spline err <- rnorm(ndat,0,0.2) #Noise yVal <- (bo + b0i[gInd]) + (b1 + b1i[gInd])*xVal + s_z + err #Generate values plot(xVal,yVal) #Looks OK plot(zVal,yVal) dat <- data.frame(y=yVal,x=xVal,z=zVal,group=factor(gInd)) #Assemble into dataframe str(dat) #Help file for s(x,bs='re') # ?smooth.construct.re.smooth.spec #GAMM MODEL mod1 <- gamm(y~1+x+s(z,bs='cr'),random=list(group=~1+x),data=dat, control=list(msVerbose=TRUE)) summary(mod1$lme) #Estimates variance terms well, slope terms well summary(mod1$gam) plot(mod1$gam); curve(sin(x),-pi,pi,col='red',add=T) #Pretty good estimation of sine curve #BAM MODEL #Using "by" notation within s() mod2a <- bam(y~1+x+s(z,bs='cr')+s(group,bs='re')+s(x,by=group,bs='re'),data=dat) gam.vcomp(mod2a) #Note the number of sd terms estimated -- 1 for each group summary(mod2a) #Using paired notation mod2b <- bam(y~1+x+s(z,bs='cr')+s(group,bs='re')+s(x,group,bs='re'),data=dat) gam.vcomp(mod2b) #Fewer sd terms estimated summary(mod2b) #Compare estimates of intercepts and slopes to actual values: ranInt <- data.frame(actual=b0i,gamm=ranef(mod1$lme)[[2]][,1], bamA=coef(mod2a)[grepl('s\$group\$.',names(coef(mod2a)))], bamB=coef(mod2b)[grepl('s\$group\$.',names(coef(mod2b)))]) ranSlope <- data.frame(actual=b1i,gamm=ranef(mod1$lme)[[2]][,2], bamA=coef(mod2a)[grepl('s\$x\$.',names(coef(mod2a)))], bamB=coef(mod2b)[grepl('s\$x,group\$.',names(coef(mod2b)))]) #Good estimates of random intercept terms pairs(ranInt,upper.panel=function(x,y) {points(x,y); abline(0,1,col='red')}, lower.panel=function(x,y) text(mean(x),mean(y),round(cor(x,y),3),cex=0.1+cor(x,y)*2), main='Intercepts') #bamA slope estimates are offset by a constant value when using "by" pairs(ranSlope,upper.panel=function(x,y) {points(x,y); abline(0,1,col='red')}, lower.panel=function(x,y) text(mean(x),mean(y),round(cor(x,y),3),cex=0.1+cor(x,y)*2), main='Slopes')

Best Answer

This is an old question, but I ran into a similar situation when trying to run a gamm model using bam.

I believe what's going on is that s(prev, speaker, bs='re') in the model above is estimating a variance term for the ~prev:speaker-1 interaction (random slopes), whereas s(time, by=speaker, bs="fs", m=1) is estimating a separate slope term for each speaker. These slopes do not share a variance term, meaning that they aren't really a random effect (and aren't necessarily centred on zero).

Here's a demonstration of the differences between the two methods, using gamm to compare them. These simulated data are from a normal distribution, but this should work for whatever distribution you're using:

Best Answer

Related Solutions

Related Question