Solved – Convert nonlinear mixed model to log-linear mixed model in R (nlme)

mixed modelnonlinear regressionrregression

I want to analyze fish aquaculture data comparing growth rates by state and controlling for tank effects as a mixed effect. I will use the equation

$$
w = aL^b
$$

but I want b to be a function of state of origin. I can do this using the nlme function in the nlme package. Here is a toy data set similar to mine.

tl <- seq(100, 300, 1/5)
state <- c(rep(x = c("DE", "TX", "FL", "VA", "SC"), times = length(tl)/5), "DE")

ai <- 0.000004
bi <- 3.3

lw <- NULL
set.seed(1234)
for(i in 1:length(tl)) {
  lw[i] <- log(ai) + bi * log(tl[i]) + rnorm(n = 1, 0, 0.1)
}

w <- exp(lw)    
df <- data.frame(cbind(w, tl))
df$state <- as.factor(state)
    df$tank <- as.factor(c(rep(1:15, length(tl)/15), 1:11))

This is the code I would use:

nl <- nlme(w ~ a * tl ^ b, fixed = list(a ~ 1, b ~ state), 
           random = b ~ 1|tank, data = df, start = c(a = rep(0.0000001, times = 1), 
           b = rep(3.1, times = 5)))
summary(nl)

However, a plot of the data reveals that the error is likely to be multiplicative on the arithmetic scale (variability in w (weight) increases with increasing tl (total length)):

    plot(tl, exp(lw))

My current model using nlme is

$$
w = aL^b + \epsilon \quad \epsilon \sim N(0, \sigma^2)
$$

where b is a function of state. However, I want to make the equivalent log-linear model:

$$
w = aL^b\epsilon \quad \epsilon \sim N(0, \sigma^2)
$$

$$
\log(w) = \log(a) + b \cdot \log(L) + \log(\epsilon)
$$

If it weren't for the fact that I want b to be a function of state I could use the lme function in nlme:

lm1 <- lme(log(mass) ~ log(tl), data = df, random = ~ 1|tank)
summary(lm1)

How would I make the equivalent log-linear model to my nonlinear model except for the error distribution? Xiao et al. 2010 discuss this issue but it's for non-mixed models and for functions when a or b are a function of a factor.

http://www.esajournals.org/doi/pdf/10.1890/11-0538.1

The final part of the question would be whether I could use AIC to compare the two models or if I should just use the log-linear if the data clearly look to have that error structure?

Best Answer

This is actually pretty easy: in statistical terms, you just add an interaction between total length tl and state.

Regenerating data (with some tiny cosmetic/efficiency tweaks: you don't need the for loop):

tl <- seq(100, 300, 1/5)
state <- c(rep(x = c("DE", "TX", "FL", "VA", "SC"),
               times = length(tl)/5), "DE")
ai <- 0.000004
bi <- 3.3

set.seed(1234)
lw <- log(ai) + bi*log(tl) + rnorm(length(tl))
df <- data.frame(lw,w=exp(lw),tl,state=factor(state),
                 tank=factor(c(rep(1:15, length(tl)/15), 1:11)))

Fitting the nonlinear mixed model:

library(nlme)
nl <- nlme(w ~ a * tl ^ b, fixed = list(a ~ 1, b ~ state), 
      random = b ~ 1|tank, data = df,
      start = c(a = rep(0.0000001, times = 1), 
      b = rep(3.1, times = 5)))
fixef(nl)

Now as a linear mixed model: you don't have the intercept varying either by state or by tank in the model above: that might be inadvisable, but I've replicated it in the model below.

lmm <- lme(lw ~ 1 + log(tl):state ,
           random = ~ (log(tl)-1) |tank, data = df)
fixef(lmm)

library(ggplot2)
g0 <- ggplot(df,aes(x=tl,y=w,colour=state))+geom_point(alpha=0.3)+
     scale_y_log10()+scale_x_log10()+
     geom_line(aes(group=tank),alpha=0.3)+
     theme_bw()
nlpred <- predict(nl)
g0 + geom_line(data=transform(df,w=predict(nl)))+
     geom_line(data=transform(df,w=exp(predict(lmm))),lty=2)

enter image description here

As for comparing logged with unlogged data: the naive comparison will be wrong, but you can do this:

The adjustment to the AIC for log-transformation should be $2 \sum_i \log(y_i)$: as explained by Weiss, the adjustment to the likelihood is $d(\log(y))/dy = 1/y$, so the adjustment to the AIC is $-2 \sum \log(L) = -2 \sum(\log(1/y)) = 2 \sum \log(y)$.

Comparing negative log-likelihoods:

set.seed(101)
y <- rlnorm(100)
L1 <- -sum(dnorm(log(y),log=TRUE))+sum(log(y))
L2 <- -sum(dlnorm(y,log=TRUE))
all.equal(L1,L2)  ## TRUE