Solved – gam models with random effect R

gamm4, generalized-additive-model, prediction, random-effects-model

I am modeling fishery CPUE as a function of a number of covariates using a GAM approach that includes fixed and random effects.

I understand that gamm4 has limitations with regard to predicting random effects (the predict function only addresses the fixed effects). How does the predict function in the basic gam (mgcv, using bs="re") deal with the random effects? Are they included in predictions? Any thoughts would be much appreciated.

Best Answer

Yes, they are included, but only for the observed levels of the random factor. You can, however, turn this off using the by-variable smooth trick.

Consider the following example taken from ?gam.models:

library(mgcv)  ## provides gam(), gamm() and gamSim()

dat <- gamSim(1,n=400,scale=2) ## simulate 4 term additive truth

## Now add some random effects to the simulation. Response is
## grouped into one of 20 groups by `fac` and each group has a
## random effect added....
fac <- as.factor(sample(1:20,400,replace=TRUE))
dat$X <- model.matrix(~fac-1)
b <- rnorm(20)*.5
dat$y <- dat$y + dat$X%*%b

rm1 <- gam(y ~ s(fac, bs="re") + s(x0) + s(x1) + s(x2) + s(x3),
           data = dat, method = "ML")

Now let's get the additive term contributions from the model and compare them with the full model predictions:

p <- predict(rm1, type = "terms")
head(rowSums(p) + attr(p, "constant"))
head(predict(rm1, type = "response"))

which gives

> head(rowSums(p) + attr(p, "constant"))
        1         2         3         4         5         6 
14.265260  6.433342  2.766193 12.864771  5.296381  7.341790 
> head(predict(rm1, type = "response"))
        1         2         3         4         5         6 
14.265260  6.433342  2.766193 12.864771  5.296381  7.341790

So the two ways of generating the predicted values are equivalent. Now look at p, the additive term contributions to the fitted values:

> head(p)
       s(fac)      s(x0)     s(x1)      s(x2)        s(x3)
1 -0.03786017 -0.1683648  3.868927  2.6485134  0.157054343
2  0.21328630  0.5304765 -1.902366 -0.1972856 -0.007759325
3 -0.36501307  0.1058000 -1.661677 -3.0955348 -0.014372627
4 -0.12519987  0.5474540  2.554656  2.2189534 -0.128083342
5 -0.12519987 -0.3720668 -1.817144 -0.1451364 -0.041061989
6 -0.05481148  0.3490905 -1.216908  0.3411783  0.126251294

The first column is s(fac), the random effect spline in the fitted GAM, so the random effect is part of the predictions.

I will add that the gamm() function, also in mgcv, can give the within-group predictions (fitted values):
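If you want predictions without the random effect, one option is the exclude argument of predict.gam, which sets the named terms to zero. The following is a minimal sketch rather than part of the original answer: it re-simulates the data with an arbitrary seed, stores fac inside dat, and the object names p_terms, pred_with_re and pred_no_re are illustrative.

```r
library(mgcv)

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)
dat$fac <- as.factor(sample(1:20, 400, replace = TRUE))
b <- rnorm(20) * 0.5
dat$y <- dat$y + b[as.integer(dat$fac)]  # add a random intercept per group

rm1 <- gam(y ~ s(fac, bs = "re") + s(x0) + s(x1) + s(x2) + s(x3),
           data = dat, method = "ML")

p_terms      <- predict(rm1, type = "terms")      # per-term contributions
pred_with_re <- predict(rm1)                      # includes s(fac)
pred_no_re   <- predict(rm1, exclude = "s(fac)")  # s(fac) set to zero

# The gap between the two predictions should be the s(fac) contribution
head(cbind(difference = pred_with_re - pred_no_re,
           s_fac      = p_terms[, "s(fac)"]))
```

The term label passed to exclude has to match the label shown in the columns of the type = "terms" output, here "s(fac)".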

m2 <- gamm(y ~ s(x0) + s(x1) + s(x2) + s(x3),
           data = dat, method = "ML",
           random = list(fac = ~ 1))

head(predict(m2$lme))

> head(predict(m2$lme))
1/1/1/1/14  1/1/1/1/6 1/1/1/1/19  1/1/1/1/9  1/1/1/1/9 1/1/1/1/15 
 14.265259   6.433345   2.766196  12.864770   5.296383   7.341790 
> head(predict(rm1, type = "response"))
        1         2         3         4         5         6 
14.265260  6.433342  2.766193 12.864771  5.296381  7.341790
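For comparison, predictions from the gam component of a gamm() fit leave the random effects out. The sketch below is an illustration under assumptions (re-simulated data, an arbitrary seed, fac stored in dat, illustrative object names), not the original answer's code. It shows that the within-group and population-level predictions differ by one value per level of fac.

```r
library(mgcv)  # gamm() relies on nlme, which ships with R

set.seed(1)  # arbitrary seed, assumed here only for reproducibility
dat <- gamSim(1, n = 400, scale = 2, verbose = FALSE)
dat$fac <- as.factor(sample(1:20, 400, replace = TRUE))
dat$y <- dat$y + (rnorm(20) * 0.5)[as.integer(dat$fac)]

m2 <- gamm(y ~ s(x0) + s(x1) + s(x2) + s(x3),
           data = dat, method = "ML",
           random = list(fac = ~ 1))

within_group <- as.numeric(predict(m2$lme))  # smooths plus group intercepts
population   <- as.numeric(predict(m2$gam))  # smooths only, no random effect

# The difference should be constant within each level of fac
group_shift <- tapply(within_group - population, dat$fac, mean)
head(group_shift)
```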

Related Solutions

Solved – Predicting with GAM, using an offset

First, from the help-page of gam (bold font added by me):

offset: Can be used to supply a model offset for use in fitting. Note that this offset will always be completely ignored when predicting, unlike an offset included in formula: this conforms to the behaviour of lm and glm.

So for predicting, you should use the formula-specification. Further, if you specify your offset as an argument, rather than in the formula, you should use an equal sign (=):

mod2 <- gam(Y ~ covariate1 + covariate2 + covariate3 + covariate4, offset = log(`sampled area`), family = quasipoisson)

This should give the exact same result as this specification:

mod1 <- gam(Y ~ offset(log(`sampled area`)) + covariate1 + covariate2 + covariate3 + covariate4, family = quasipoisson)

I honestly don't know what R calculates if you specify the offset inside the brackets without the log, like offset(`sampled area`).

Hope that helps.
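The difference between the two specifications at prediction time can be checked on simulated data. This is a minimal sketch under assumptions: the data frame d, the variables x and area, and the Poisson family are made up for illustration (the original uses quasipoisson and its own covariates).

```r
library(mgcv)

set.seed(42)  # arbitrary seed, assumed here only for reproducibility
n <- 200
d <- data.frame(x = runif(n), area = runif(n, 1, 10))
d$Y <- rpois(n, lambda = d$area * exp(0.5 + d$x))

# Offset inside the formula: used in fitting and in prediction
m_formula <- gam(Y ~ offset(log(area)) + s(x), family = poisson, data = d)

# Offset as an argument: used in fitting, ignored by predict()
m_arg <- gam(Y ~ s(x), offset = log(d$area), family = poisson, data = d)

newd <- data.frame(x = c(0.2, 0.8), area = c(2, 5))
eta_formula <- predict(m_formula, newdata = newd)  # link scale, with offset
eta_arg     <- predict(m_arg, newdata = newd)      # link scale, no offset

# The two predictions should differ by log(area)
cbind(difference = eta_formula - eta_arg, log_area = log(newd$area))
```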

Solved – Using a gamm4 model to predict estimates in new data

I'm not sure what you want here.

Have you looked at ?predict.gam?

# Load the gamm4 package
library(gamm4)

# Using gamSim() from mgcv (attached along with gamm4) to simulate some data:
set.seed(100) 
dat <- gamSim(6, n=100, scale=2)

# Fitting a model and plotting it:
mod <- gamm4(y~s(x0)+s(x1)+s(x2), data=dat, random = ~ (1|fac))
plot(mod$gam, pages=1)

# Generating some new data for which you'd like predictions:
newdat <- data.frame(x0 = runif(100), x1 = runif(100), x2 = runif(100)) 

# Getting predicted outcomes for new data
# These include the smooth terms but ignore the random effects
predictions = predict(mod$gam, newdata=newdat, se.fit = TRUE)

# Consolidating new data and predictions
newdat = cbind(newdat, predictions)

# If you want CIs 
newdat <- within(newdat, {
    lower = fit-1.96*se.fit
    upper = fit+1.96*se.fit
})

# Plot, for example, the predicted outcomes as a function of x1...
library(ggplot2)
egplot <- ggplot(newdat, aes(x=x1, y=fit)) + 
          geom_smooth() + geom_point()
egplot

See here for some possible assistance
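If you also want the random intercept in the prediction for an observed level of fac, one possible approach is to add the estimated intercept from the mer component by hand. This is a sketch under assumptions, not an official gamm4 feature: the names re, smooth_part and with_re are illustrative, and it only handles levels of fac that were present when the model was fitted.

```r
library(gamm4)  # attaches mgcv and lme4

set.seed(100)
dat <- gamSim(6, n = 100, scale = 2, verbose = FALSE)
mod <- gamm4(y ~ s(x0) + s(x1) + s(x2), data = dat, random = ~ (1 | fac))

# Estimated random intercepts, one row per level of fac
re <- lme4::ranef(mod$mer)$fac

# Smooth-only prediction from the gam part, then add the group intercept
smooth_part <- as.numeric(predict(mod$gam, newdata = dat))
with_re <- smooth_part + re[as.character(dat$fac), 1]

# On the fitting data this should approximately match the mer fitted values
head(cbind(with_re, fitted_mer = as.numeric(fitted(mod$mer))))
```

For a level of fac that the model has not seen there is no estimated intercept to add, so the smooth-only prediction is the natural fallback.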
