Solved – Calculate log-likelihood “by hand” for generalized nonlinear least squares regression (nlme)

least squaresmaximum likelihoodmixed modelnonlinear regressionr

I'm trying to calculate the log-likelihood for a generalized nonlinear least squares regression for the function $f(x)=\frac{\beta_1}{(1+\frac x\beta_2)^{\beta_3}}$ optimized by the gnls function in the R package nlme, using the variance covariance matrix generated by distances on a a phylogenetic tree assuming Brownian motion (corBrownian(phy=tree) from the ape package). The following reproducible R code fits the gnls model using x,y data and a random tree with 9 taxa:

require(ape)
require(nlme)
require(expm)
tree <- rtree(9)
x <- c(0,14.51,32.9,44.41,86.18,136.28,178.21,262.3,521.94)
y <- c(100,93.69,82.09,62.24,32.71,48.4,35.98,15.73,9.71)
data <- data.frame(x,y,row.names=tree$tip.label)
model <- y~beta1/((1+(x/beta2))^beta3)
f=function(beta,x) beta[1]/((1+(x/beta[2]))^beta[3])
start <- c(beta1=103.651004,beta2=119.55067,beta3=1.370105)
correlation <- corBrownian(phy=tree)
fit <- gnls(model=model,data=data,start=start,correlation=correlation)
logLik(fit) 

I would like to calculate the log-likelihood "by hand" (in R, but without use of the logLik function) based on the estimated parameters obtained from gnls so it matches the output from logLik(fit). NOTE: I am not trying to estimate parameters; I just want to calculate log-likelihood of the parameters estimated by the gnls function (although if someone has a reproducible example of how to estimate parameters without gnls, I would be very interested in seeing it!).

I'm not really sure how to go about doing this in R. The linear algebra notation described in Mixed-Effects Models in S and S-Plus (Pinheiro and Bates) is very much over my head and none of my attempts have matched logLik(fit). Here are the details described by Pinheiro and Bates:

The log-likelihood for the generalized nonlinear least squares model $y_i=f_i(\phi_i,v_i)+\epsilon_i$ where $\phi_i=A_i\beta$ is calculated as follows:

$l(\beta,\sigma^2,\delta|y)=-\frac 12 \Bigl\{ N\log(2\pi\sigma^2)+\sum\limits_{i=1}^M{\Bigl[\frac{||y_i^*-f_i^*(\beta)||^2}{\sigma^2}+\log|\Lambda_i|\Bigl]\Bigl\}}$

where $N$ is the number of observations, and $f_i^*(\beta)=f_i^*(\phi_i,v_i)$.

$\Lambda_i$ is positive-definite, $y_i^*=\Lambda_i^{-T/2}y_i$ and $f_i^*(\phi_i,v_i)=\Lambda_i^{-T/2}f_i(\phi_i,v_i)$

For fixed $\beta$ and $\lambda$, the ML estimator of $\sigma^2$ is

$\hat\sigma(\beta,\lambda)=\sum\limits_{i=1}^M||y_i^*-f_i^*(\beta)||^2 / N$

and the profiled log-likelihood is

$l(\beta,\lambda|y)=-\frac12\Bigl\{N[\log(2\pi/N)+1]+\log\Bigl(\sum\limits_{i=1}^M||y_i^*-f_i^*(\beta)||^2\Bigl)+\sum\limits_{i=1}^M\log|\Lambda_i|\Bigl\}$

which is used with a Gauss-Seidel algorithm to find the ML estimates of $\beta$ and $\lambda$. A less biased estimate of $\sigma^2$ is used:

$\sigma^2=\sum\limits_{i=1}^M\Bigl|\Bigl|\hat\Lambda_i^{-T/2}[y_i-f_i(\hat\beta)]\Bigl|\Bigl|^2/(N-p)$

where $p$ represents the length of $\beta$.

I have compiled a list of specific questions that I am facing:

  1. What is $\Lambda_i$? Is it the distance matrix produced by big_lambda <- vcv.phylo(tree) in ape, or does it need to be somehow transformed or parameterized by $\lambda$, or something else entirely?
  2. Would $\sigma^2$ be fit$sigma^2, or the equation for the less biased estimate (the last equation in this post)?
  3. Is it necessary to use $\lambda$ to calculate log-likelihood, or is that just an intermediate step for parameter estimation? Also, how is $\lambda$ used? Is it a single value or a vector, and is it multiplied by all of $\Lambda_i$ or just off-diagonal elements, etc.?
  4. What is $||y-f(\beta)||$? Would that be norm(y-f(fit$coefficients,x),"F") in the package Matrix? If so, I'm confused about how to calculate the sum $\sum\limits_{i=1}^M||y_i^*-f_i^*(\beta)||^2$, because norm() returns a single value, not a vector.
  5. How does one calculate $\log|\Lambda_i|$? Is it log(diag(abs(big_lambda))) where big_lambda is $\Lambda_i$, or is it logm(abs(big_lambda)) from the package expm? If it is logm(), how does one take the sum of a matrix (or is it implied that it is just the diagonal elements)?
  6. Just to confirm, is $\Lambda_i^{-T/2}$ calculated like this: t(solve(sqrtm(big_lambda)))?
  7. How are $y_i^*$ and $f_i^*(\beta)$ calculated? Is it either of the following:

y_star <- t(solve(sqrtm(big_lambda))) %*% y

and

f_star <- t(solve(sqrtm(big_lambda))) %*% f(fit$coefficients,x)

or would it be

y_star <- t(solve(sqrtm(big_lambda))) * y

and

f_star <- t(solve(sqrtm(big_lambda))) * f(fit$coefficients,x) ?

If all of these questions are answered, in theory, I think the log-likelihood should be calculable to match the output from logLik(fit). Any help on any of these questions would be greatly appreciated. If anything needs clarification, please let me know. Thanks!

UPDATE: I have been experimenting with various possibilities for the calculation of the log-likelihood, and here is the best I have come up with so far. logLik_calc is consistently about 1 to 3 off from the value returned by logLik(fit). Either I'm close to the actual solution, or this is purely by coincidence. Any thoughts?

  C <- vcv.phylo(tree) # variance-covariance matrix
  tC <- t(solve(sqrtm(C))) # C^(-T/2)
  log_C <- log(diag(abs(C))) # log|C|
  N <- length(y)
  y_star <- tC%*%y 
  f_star <- tC%*%f(fit$coefficients,x)
  dif <- y_star-f_star  
  sigma_squared <-  sum(abs(y_star-f_star)^2)/N
  # using fit$sigma^2 also produces a slightly different answer than logLik(fit)
  logLik_calc <- -((N*log(2*pi*(sigma_squared)))+
       sum(((abs(dif)^2)/(sigma_squared))+log_C))/2

Best Answer

Let's start with the simpler case where there is no correlation structure for the residuals:

fit <- gnls(model=model,data=data,start=start)
logLik(fit)

The log likelihood can then be easily computed by hand with:

N <- fit$dims$N
p <- fit$dims$p
sigma <- fit$sigma * sqrt((N-p)/N)
sum(dnorm(y, mean=fitted(fit), sd=sigma, log=TRUE))

Since the residuals are independent, we can just use dnorm(..., log=TRUE) to get the individual log likelihood terms (and then sum them up). Alternatively, we could use:

sum(dnorm(resid(fit), mean=0, sd=sigma, log=TRUE))

Note that fit$sigma is not the "less biased estimate of $\sigma^2$" -- so we need to make the correction manually first.

Now for the more complicated case where the residuals are correlated:

fit <- gnls(model=model,data=data,start=start,correlation=correlation)
logLik(fit)

Here, we need to use the multivariate normal distribution. I am sure there is a function for this somewhere, but let's just do this by hand:

N <- fit$dims$N
p <- fit$dims$p
yhat <- cbind(fitted(fit))
R <- vcv(tree, cor=TRUE)
sigma <- fit$sigma * sqrt((N-p)/N)
S <- diag(sigma, nrow=nrow(R)) %*% R %*% diag(sigma, nrow=nrow(R))
-1/2 * log(det(S)) - 1/2 * t(y - yhat) %*% solve(S) %*% (y - yhat) - N/2 * log(2*pi)