Solved – Mixed effects model for power function data

lme4-nlme, mixed model, power law, r

I have data which I suspect follows a power function over time. It is collected from several units which have different intercepts. Therefore I'd like to do a mixed model with the parameters of the power function as fixed effects and intercept as random effect. Can I do this with lme4::nlmer? If not, can I do it in another way?

To illustrate with a reproducible example, say I have this data, D:

x = 1:100
y1 = 7 + 2 * x^0.4 + rnorm(100); plot(y1, main='y1')  # unit 1
y2 = 4 + 2 * x^0.4 + rnorm(100); plot(y2, main='y2')  # unit 2
y3 = 1 + 2 * x^0.4 + rnorm(100); plot(y3, main='y3')  # unit 3
D = data.frame(y=c(y1, y2, y3), id=rep(c(1,2,3), each=100), time=rep(x, 3))
plot(y ~ time, D, main='all')  # combined data

I can model this as a power function with the nls function with a common intercept:

nls(y ~ k + a * time ^ b, D, start=c(a=1, b=1, k=5))

This gives me estimates like a=1.3 (it was 2), b=0.47 (it was 0.4) and k=5.4 (it was 1, 4 and 7). I'd like to do something like this:

nlmer(y ~ k + a * time ^ b + (1|id), D)  # doesn't work of course

I'm going to add more random effects later. This is just a minimal reproducible example.
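One way to see what the common-intercept fit is missing is to average its residuals within each unit. The following is only a sketch: the seed is arbitrary, and the start values are placed near the simulated truth (an assumption, not the values used above) to keep nls stable.

```r
# Simulate the three units as in the example above (seed is an arbitrary choice).
set.seed(1)
x <- 1:100
D <- data.frame(
  y    = rep(c(7, 4, 1), each = 100) + 2 * rep(x, 3)^0.4 + rnorm(300),
  id   = rep(1:3, each = 100),
  time = rep(x, 3)
)

# Common-intercept power function; start values near the truth are an assumption.
fit.nls <- nls(y ~ k + a * time^b, D, start = c(a = 2, b = 0.4, k = 4))

# Mean residual per unit: a systematic offset per unit is the
# between-unit variation that a single k cannot absorb.
offsets <- tapply(resid(fit.nls), D$id, mean)
print(round(offsets, 2))
```

The per-unit means should sit well away from zero for units 1 and 3, with opposite signs, which is the pattern a random intercept is meant to capture.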

Best Answer

(Disclaimer: answering my own question after fiddling around - I'm really not an expert on this so please read critically)

Step 1: create the power function, including the intercept, using the deriv function, which attaches the gradient that nlmer needs to the returned value. You have to be somewhat verbose, specifying which parameters to estimate (namevec) and which arguments the resulting function takes (function.arg). In this case it's sort of trivial, but this is handy if you have big and complicated stuff going on with lots of internal variables that really are of no interest to nlmer when finding the optimal fit. So:

power.f = deriv(~k + a*time^b, namevec=c('k', 'a', 'b'), function.arg=c('time','k', 'a','b'))
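To see what deriv adds, it can help to call the resulting function directly. This sketch uses made-up inputs (time = 1 and 4, k = 1, a = 2, b = 0.5) chosen only so the arithmetic is easy to follow.

```r
power.f <- deriv(~ k + a * time^b,
                 namevec = c("k", "a", "b"),
                 function.arg = c("time", "k", "a", "b"))

# Illustrative inputs only: 1 + 2 * 1^0.5 = 3 and 1 + 2 * 4^0.5 = 5.
out <- power.f(time = c(1, 4), k = 1, a = 2, b = 0.5)
print(as.numeric(out))

# The "gradient" attribute holds one column per name in namevec:
# the partial derivatives of the model with respect to k, a and b.
grad <- attr(out, "gradient")
print(grad)
```

The gradient matrix has one row per observation and columns named k, a and b; the column for k is constant at 1 because the intercept enters the model linearly.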

Step 2: fit the parameters of the nonlinear model using a dependent ~ non-linear ~ fixed + random syntax, where the non-linear part is a function of the sort we just created above.

fit.nlmer = nlmer(y ~ power.f(time, k, a, b) ~ k|id, start=list(nlpars=c(k=2, a=1, b=1)), data=D)

A few comments on step 2: the part after the second ~ is what you may be used to from lmer, except that it won't accept intercept-only terms (for example 1|id). That is why you can't simply define power.f as ~ a*time^b and put a random intercept in the fixed-random part. Instead, you put the intercept k into the nonlinear function and let it vary by id, which in this case is equivalent to a linear random intercept.

Also: the start values simply help nlmer start in the vicinity of the correct solution, since the likelihood landscape of a nonlinear model can be far more complex and contain more traps (plateaus, local minima, etc.) than that of a linear one. But don't worry too much about being spot-on (after all, you're doing inference for a reason). I just looked at the data and put in something that was not totally off.

To see why this mixed-model effort is fruitful compared to a common-intercept model like the nls one, consider the fit (observed y versus model-predicted y):

[Figure: observed y versus model-predicted y]
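The same comparison can be roughly checked numerically by putting the residual standard deviations of the two fits side by side. This is a sketch under several assumptions: it needs the lme4 package (the block is skipped if it is not installed), the seed is arbitrary, and the start values are placed closer to the simulated truth than the ones used above.

```r
# Sketch only: runs if lme4 is available; seed and start values are assumptions.
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  x <- 1:100
  D <- data.frame(
    y    = rep(c(7, 4, 1), each = 100) + 2 * rep(x, 3)^0.4 + rnorm(300),
    id   = rep(1:3, each = 100),
    time = rep(x, 3)
  )
  power.f <- deriv(~ k + a * time^b,
                   namevec = c("k", "a", "b"),
                   function.arg = c("time", "k", "a", "b"))

  # Common-intercept fit and random-intercept fit on the same data.
  fit.nls   <- nls(y ~ k + a * time^b, D, start = c(a = 2, b = 0.5, k = 4))
  fit.nlmer <- lme4::nlmer(y ~ power.f(time, k, a, b) ~ k | id, data = D,
                           start = list(nlpars = c(k = 4, a = 2, b = 0.5)))

  print(lme4::fixef(fit.nlmer))  # population-level k, a, b
  print(lme4::ranef(fit.nlmer))  # per-unit deviations from k

  # Residual standard deviation of each fit.
  print(c(nls = sigma(fit.nls), nlmer = sigma(fit.nlmer)))
}
```

If the mixed model is doing its job, its residual standard deviation should be noticeably smaller than that of the nls fit, because the unit-level offsets have moved out of the residuals and into the random effect for k.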

Related Solutions

Solved – Confidence-intervals for conditions tested with a mixed-effects model

The DRAFT r-sig-mixed-models FAQ details (in the "Predictions and/or confidence (or prediction) intervals on predictions" section) how to obtain predictions and confidence intervals for cells in the design of a mixed effects model. The ezPredict() function in the ez package wraps the code for the lme4 case (well, obtaining predictions and variances, leaving the user to decide their own CI).
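As a rough illustration of that kind of calculation, the sketch below computes population-level predictions and approximate 95% intervals from the fixed effects only. It uses nlme::lme and the Orthodont data shipped with nlme rather than ez or lme4, purely as a stand-in; the model and the prediction grid are assumptions made for the example, and the intervals ignore uncertainty in the variance parameters.

```r
library(nlme)  # recommended package shipped with R; a stand-in for ez/lme4 here

# Orthodont is an example data set from nlme, not data from the question.
fm <- lme(distance ~ age + Sex, random = ~ 1 | Subject, data = Orthodont)

# Cells of the design to predict for (an illustrative grid).
newdat <- expand.grid(age = c(8, 14), Sex = levels(Orthodont$Sex))

# Fixed-effects design matrix for those cells.
X <- model.matrix(~ age + Sex, newdat)

# Predictions and their standard errors from the fixed-effects covariance.
newdat$pred <- as.numeric(X %*% fixef(fm))
se <- sqrt(diag(X %*% vcov(fm) %*% t(X)))

# Normal-approximation 95% confidence limits.
newdat$lower <- newdat$pred - 1.96 * se
newdat$upper <- newdat$pred + 1.96 * se
print(newdat)
```

Leaving the last step explicit mirrors the point made above: once predictions and variances are available, the choice of interval is up to the user.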

Solved – Representing repeated measurements within sample plots in linear mixed effects model in R

I would recommend getting a few resources if you're just getting started fitting mixed models. I think Mixed Effects Models and Extensions in Ecology in R (Zuur et al. 2009) is a nice (approachable) place to start. I often peruse http://glmm.wikidot.com/faq, as well. It is R specific but I always learn a lot about mixed models in general.

I think it will help you to write out your study design explicitly to make sure you are accounting for all the variation in your model.
You have 4 blocks (account for variation among blocks).
You have 5 plots in each block for 20 plots total (account for variation within blocks).
You have 2 subsamples in each plot for 40 observations total (account for within plot variation).

I had to make a fake dataset; next time, please make any coding questions reproducible by providing data (possibly with dput()).

block = factor(rep(1:4, each = 10))
treatment = factor(rep(rep(1:5, each = 2), 4))
subsample = factor(rep(1:2, 20))
flux = rnorm(40, 10, 7)
clay = rep(rnorm(20, 10, 2), each = 2)

dat1 = data.frame(block, treatment, subsample, flux, clay)

Here is your first model. You account for block to block variation by using block as a fixed effect. You use the two-level variable "subsample" as a random effect.

require(nlme)

fluxlme = lme(flux ~ treatment*clay + block, random = ~1|subsample, data = dat1)
summary(fluxlme)
anova(fluxlme)

At the bottom of the summary you will see this:

Number of Observations: 40
Number of Groups: 2

This is where I check if my model reflects my design. Do you have two groups with 20 observations per group? Nope, you have 20 groups (plots) with 2 observations per group. You failed to account for one of the levels of variation in your design in your model.

Make a variable to represent your plots nested in blocks. Because treatment represents each plot within each block, you can combine the block variable and the treatment variable to make a new variable with a unique identifier for each plot. This is called explicit nesting. You can read more about implicit vs explicit nesting online, starting with the website I listed above. I've found that using explicit nesting avoids a lot of confusion and mistakes on my part.

dat1$plot = with(dat1, interaction(block, treatment) )

fluxlme2 = lme(flux ~ treatment*clay + block, random = ~1|plot, data = dat1)
summary(fluxlme2)
anova(fluxlme2)

The subsamples are the observation-level measurement, which is represented by the residual error term in linear mixed models.
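Before fitting, the nesting can be checked with a couple of counts. This sketch rebuilds only the design columns of the fake dataset above; the name plot_id is just an illustrative choice.

```r
# Design columns from the fake dataset above.
block     <- factor(rep(1:4, each = 10))
treatment <- factor(rep(rep(1:5, each = 2), 4))

# Explicit nesting: one identifier per block-by-treatment plot.
plot_id <- interaction(block, treatment)

# Number of distinct plots and rows (subsamples) per plot.
n_plots       <- nlevels(plot_id)
rows_per_plot <- table(plot_id)
print(n_plots)
print(range(rows_per_plot))
```

With this design there should be 20 plots with 2 rows each, which is what "Number of Groups" in the lme summary should then report.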
