Solved – Using a gamm4 model to predict estimates in new data

Tags: gamm4, generalized-additive-model, lme4-nlmer

I have been experimenting with gamm4 to derive GAMMs of some repeated measures data.

The models look very nice and seem to give more flexibility than my LMMs.

Ultimately I want to compare models not by the quality of their fit (comparing LMM and GAMM fits seems complex anyway?), but by the quality of their predictions in new data sets, and in new data simulated by MCMC.

With LMMs I predict from the fixed effects only, using:

mm <- model.matrix(terms(lmer),newdata)

newdata$predicted <- mm %*% fixef(lmer)

This is fine since we are predicting in new individuals, with new independent random effects.

I cannot get this predict method to work with gamm4.

> mm <- model.matrix(terms(gamm4$mer), newdata)

Error in model.frame.default(object, data, xlev = xlev) : 
  variable lengths differ (found for 'X')

I think this is because the GAM process creates new variables in order to transform predictor variables.
It is also complex, because I believe the transforms are stored as random effects, so I would need to extract these random effects, but not the “individual level” random effects.

Does anyone know how I can:

1. Extract just the transform effect terms from the gamm4 model?
2. Make predictions into new data using gamm4?
3. Extract the model specification of the GAMM so I could implement it as a standalone algorithm?
4. General advice?

Best Answer

I'm not sure what you want here.

Have you looked at ?predict.gam? For example:

# Load the gamm4 package
library(gamm4)

# Using gamm4's built-in data simulation capabilities to give us some data:
set.seed(100) 
dat <- gamSim(6, n=100, scale=2)

# Fitting a model and plotting it:
mod <- gamm4(y~s(x0)+s(x1)+s(x2), data=dat, random = ~ (1|fac))
plot(mod$gam, pages=1)

# Generating some new data for which you'd like predictions:
newdat <- data.frame(x0 = runif(100), x1 = runif(100), x2 = runif(100)) 

# Getting predicted outcomes for new data
# These include the splines but ignore other REs
predictions <- predict(mod$gam, newdata = newdat, se.fit = TRUE)

# Consolidating new data and predictions (as plain numeric columns)
newdat <- cbind(newdat, fit = as.numeric(predictions$fit),
                se.fit = as.numeric(predictions$se.fit))

# If you want CIs 
newdat <- within(newdat, {
    lower = fit-1.96*se.fit
    upper = fit+1.96*se.fit
})

# Plot, for example, the predicted outcomes as a function of x1...
library(ggplot2)
egplot <- ggplot(newdat, aes(x=x1, y=fit)) + 
          geom_smooth() + geom_point()
egplot
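For the "standalone algorithm" part of the question, `predict.gam` can also return the design matrix of the smooths (`type = "lpmatrix"`) or the contribution of each term (`type = "terms"`). The following is a minimal sketch, assuming the same simulated data as above; the object names `Xp`, `manual`, `direct` and `contrib` are only illustrative.

```r
library(gamm4)

set.seed(100)
dat <- gamSim(6, n = 100, scale = 2)
mod <- gamm4(y ~ s(x0) + s(x1) + s(x2), data = dat, random = ~ (1 | fac))
newdat <- data.frame(x0 = runif(100), x1 = runif(100), x2 = runif(100))

# Linear predictor matrix: maps the fitted coefficients to predictions
Xp <- predict(mod$gam, newdata = newdat, type = "lpmatrix")
manual <- as.numeric(Xp %*% coef(mod$gam))

# Should match the ordinary predict() call (the random effect of fac is ignored)
direct <- as.numeric(predict(mod$gam, newdata = newdat))
all.equal(manual, direct)

# Contribution of each smooth separately; the intercept is stored as an attribute
contrib <- predict(mod$gam, newdata = newdat, type = "terms")
colnames(contrib)
all.equal(as.numeric(rowSums(contrib) + attr(contrib, "constant")), direct)
```

Once `Xp` can be rebuilt for new data, the prediction itself is just a matrix product with the stored coefficients, which is the part that could be ported elsewhere.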

See here for some possible assistance

Related Solutions

Mixed Model – Getting Fixed-Effect Only Predictions on New Data in R

Maybe this is cheating, but

fixedformula <- as.formula(lme4.0:::nobars(formula(m))[-2])
model.matrix(fixedformula, data=cake2)

Notes:

- I am using lme4.0 here, which is the r-forge version of "old" (CRAN) lme4; you can substitute lme4 for lme4.0 in the code above.
- The new (r-forge/development) version of lme4 has a predict method; in that case
```
predict(m,re.form=NA,newdata=cake2)
```

works fine (re.form=NA sets all random effects to zero, equivalent to level=0 in the old predict.lme)
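The two routes above should give the same numbers. Here is a minimal sketch with current lme4, assuming the package's built-in `sleepstudy` data as a stand-in for `cake2`; the model and the `newdata` values are illustrative and not taken from the original answer.

```r
library(lme4)

fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
newdata <- data.frame(Days = c(0, 5, 10))

# Route 1: build the fixed-effects design matrix by hand
fixed_terms <- delete.response(terms(formula(fit, fixed.only = TRUE)))
mm <- model.matrix(fixed_terms, data = newdata)
manual <- as.numeric(mm %*% fixef(fit))

# Route 2: let predict() set all random effects to zero
builtin <- as.numeric(predict(fit, newdata = newdata, re.form = NA))

all.equal(manual, builtin)
```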

Solved – Addressing “NOTE: Results may be misleading due to involvement in interactions” warning with Tukey post-hoc comparisons in lsmeans R package

My view is that the $F$ test of statistical significance of the interaction effect is less important than the subjective nature of the interaction, as evidenced by the plot. The plot tells me that it is reasonably sensible to compare the overall averages of Depression and Top, but it'd be silly to compare those averages with the overall average of Slope -- whether or not these comparisons are statistically significant. Basically, I'd say to avoid doing comparisons that don't make sense -- so my advice is do not ignore the warning note in this case. If the curve for Top were fairly parallel with the other two, that's when you could ignore it.

In general, I suggest looking at enough plots that you can tell what's going on, and then restrict your post-hoc testing to things that are sensible.

Since P is continuous, you're really fitting straight lines (they look curved because you chose unequally spaced points). You can compare the slopes of these lines:

R> lstrends(Dens.LMER, pairwise ~ Contour, var = "P")

$lstrends
 Contour        P.trend          SE    df    lower.CL     upper.CL
 Depression -0.00681143 0.004901195 39.68 -0.01671957  0.003096714
 Slope      -0.03376293 0.010533875 41.88 -0.05502295 -0.012502911
 Top        -0.01306992 0.010499548 41.97 -0.03425936  0.008119525

Confidence level used: 0.95 

$contrasts
 contrast               estimate         SE    df t.ratio p.value
 Depression - Slope  0.026951501 0.01161827 42.00   2.320  0.0639
 Depression - Top    0.006258486 0.01158716 41.81   0.540  0.8520
 Slope - Top        -0.020693015 0.01487290 41.99  -1.391  0.3545

P value adjustment: tukey method for a family of 3 tests

The comparison between the shallowest and steepest slopes (Depression vs. Slope) has an adjusted $P$ value of about $.06$.
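The lsmeans package has since been superseded by emmeans, where the corresponding function is `emtrends()`. Here is a minimal sketch of the same kind of slope comparison, assuming the `fiber` data set shipped with emmeans as a stand-in, since `Dens.LMER` and its data are not shown here.

```r
library(emmeans)

# Strength modelled as a separate straight line in diameter for each machine
fiber.lm <- lm(strength ~ diameter * machine, data = fiber)

# Estimated slope per machine, plus Tukey-adjusted pairwise slope comparisons
trends <- emtrends(fiber.lm, pairwise ~ machine, var = "diameter")
trends$emtrends
trends$contrasts
```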
