Solved – Convert nonlinear mixed model to log-linear mixed model in R (nlme)

mixed modelnonlinear regressionrregression

I want to analyze fish aquaculture data comparing growth rates by state and controlling for tank effects as a mixed effect. I will use the equation

$$
w = aL^b
$$

but I want b to be a function of state of origin. I can do this using the nlme function in the nlme package. Here is a toy data set similar to mine.

tl <- seq(100, 300, 1/5)
state <- c(rep(x = c("DE", "TX", "FL", "VA", "SC"), times = length(tl)/5), "DE")

ai <- 0.000004
bi <- 3.3

lw <- NULL
set.seed(1234)
for(i in 1:length(tl)) {
  lw[i] <- log(ai) + bi * log(tl[i]) + rnorm(n = 1, 0, 0.1)
}

w <- exp(lw)    
df <- data.frame(cbind(w, tl))
df$state <- as.factor(state)
    df$tank <- as.factor(c(rep(1:15, length(tl)/15), 1:11))

This is the code I would use:

nl <- nlme(w ~ a * tl ^ b, fixed = list(a ~ 1, b ~ state), 
           random = b ~ 1|tank, data = df, start = c(a = rep(0.0000001, times = 1), 
           b = rep(3.1, times = 5)))
summary(nl)

However, a plot of the data reveals that the error is likely to be multiplicative on the arithmetic scale (variability in w (weight) increases with increasing tl (total length)):

    plot(tl, exp(lw))

My current model using nlme is

$$
w = aL^b + \epsilon \quad \epsilon \sim N(0, \sigma^2)
$$

where b is a function of state. However, I want to make the equivalent log-linear model:

$$
w = aL^b\epsilon \quad \epsilon \sim N(0, \sigma^2)
$$

$$
\log(w) = \log(a) + b \cdot \log(L) + \log(\epsilon)
$$

If it weren't for the fact that I want b to be a function of state I could use the lme function in nlme:

lm1 <- lme(log(mass) ~ log(tl), data = df, random = ~ 1|tank)
summary(lm1)

How would I make the equivalent log-linear model to my nonlinear model except for the error distribution? Xiao et al. 2010 discuss this issue but it's for non-mixed models and for functions when a or b are a function of a factor.

http://www.esajournals.org/doi/pdf/10.1890/11-0538.1

The final part of the question would be whether I could use AIC to compare the two models or if I should just use the log-linear if the data clearly look to have that error structure?

Best Answer

This is actually pretty easy: in statistical terms, you just add an interaction between total length tl and state.

Regenerating data (with some tiny cosmetic/efficiency tweaks: you don't need the for loop):

tl <- seq(100, 300, 1/5)
state <- c(rep(x = c("DE", "TX", "FL", "VA", "SC"),
               times = length(tl)/5), "DE")
ai <- 0.000004
bi <- 3.3

set.seed(1234)
lw <- log(ai) + bi*log(tl) + rnorm(length(tl))
df <- data.frame(lw,w=exp(lw),tl,state=factor(state),
                 tank=factor(c(rep(1:15, length(tl)/15), 1:11)))

Fitting the nonlinear mixed model:

library(nlme)
nl <- nlme(w ~ a * tl ^ b, fixed = list(a ~ 1, b ~ state), 
      random = b ~ 1|tank, data = df,
      start = c(a = rep(0.0000001, times = 1), 
      b = rep(3.1, times = 5)))
fixef(nl)

Now as a linear mixed model: you don't have the intercept varying either by state or by tank in the model above: that might be inadvisable, but I've replicated it in the model below.

lmm <- lme(lw ~ 1 + log(tl):state ,
           random = ~ (log(tl)-1) |tank, data = df)
fixef(lmm)

library(ggplot2)
g0 <- ggplot(df,aes(x=tl,y=w,colour=state))+geom_point(alpha=0.3)+
     scale_y_log10()+scale_x_log10()+
     geom_line(aes(group=tank),alpha=0.3)+
     theme_bw()
nlpred <- predict(nl)
g0 + geom_line(data=transform(df,w=predict(nl)))+
     geom_line(data=transform(df,w=exp(predict(lmm))),lty=2)

enter image description here

As for comparing logged with unlogged data: the naive comparison will be wrong, but you can do this:

Jack Weiss's lectures are very good although rather technical.
There is a question along these lines, but I think the answer is actually wrong
Brian Ripley points out on R-help that it is "quite easy" to work out the adjustment for the transformation.

The adjustment to the AIC for log-transformation should be $2 \sum_i \log(y_i)$: as explained by Weiss, the adjustment to the likelihood is $d(\log(y))/dy = 1/y$, so the adjustment to the AIC is $-2 \sum \log(L) = -2 \sum(\log(1/y)) = 2 \sum \log(y)$.

Comparing negative log-likelihoods:

set.seed(101)
y <- rlnorm(100)
L1 <- -sum(dnorm(log(y),log=TRUE))+sum(log(y))
L2 <- -sum(dlnorm(y,log=TRUE))
all.equal(L1,L2)  ## TRUE

Related Solutions

Solved – Estimating parameters of a dynamic linear model

If you have time varying parameters and want to do things sequentially (filtering), then SMC makes the most sense. MCMC is better when you want to condition on all of the data, or you have unknown static parameters that you want to estimate. Particle filters have issues with static parameters (degeneracy).

Solved – Calculate log-likelihood “by hand” for generalized nonlinear least squares regression (nlme)

Let's start with the simpler case where there is no correlation structure for the residuals:

fit <- gnls(model=model,data=data,start=start)
logLik(fit)

The log likelihood can then be easily computed by hand with:

N <- fit$dims$N
p <- fit$dims$p
sigma <- fit$sigma * sqrt((N-p)/N)
sum(dnorm(y, mean=fitted(fit), sd=sigma, log=TRUE))

Since the residuals are independent, we can just use dnorm(..., log=TRUE) to get the individual log likelihood terms (and then sum them up). Alternatively, we could use:

sum(dnorm(resid(fit), mean=0, sd=sigma, log=TRUE))

Note that fit$sigma is not the "less biased estimate of $\sigma^2$" -- so we need to make the correction manually first.

Now for the more complicated case where the residuals are correlated:

fit <- gnls(model=model,data=data,start=start,correlation=correlation)
logLik(fit)

Here, we need to use the multivariate normal distribution. I am sure there is a function for this somewhere, but let's just do this by hand:

N <- fit$dims$N
p <- fit$dims$p
yhat <- cbind(fitted(fit))
R <- vcv(tree, cor=TRUE)
sigma <- fit$sigma * sqrt((N-p)/N)
S <- diag(sigma, nrow=nrow(R)) %*% R %*% diag(sigma, nrow=nrow(R))
-1/2 * log(det(S)) - 1/2 * t(y - yhat) %*% solve(S) %*% (y - yhat) - N/2 * log(2*pi)

Best Answer

Related Solutions

Solved – Estimating parameters of a dynamic linear model

Solved – Calculate log-likelihood “by hand” for generalized nonlinear least squares regression (nlme)

Related Question