Mixed Model – How to Calculate Likelihood of Out-of-Sample Values for a Mixed Effects Model

likelihood, mixed-model, r, regression

I'm trying to use this method for calculating the Information Coefficient using bootstrapping. The advantage of bootstrapping is that I can compare models that are not nested. But to do this, I need to be able to calculate the likelihood of out-of-sample data (because I'm bootstrapping).
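To make the setup concrete, here is roughly the shape of the procedure I have in mind: a subject-level bootstrap where each candidate model is fit to the resampled subjects and scored on the out-of-bag subjects. The specific resampling scheme and the two candidate models are just for illustration, and `oos_loglik` is a placeholder for the out-of-sample log-likelihood calculation that this question is asking about:

    library(lme4)
    data(Orthodont, package="MEMSS")

    # placeholder: the out-of-sample log-likelihood of `newdata` under `model`
    # -- this is the piece I don't know how to compute for a mixed model
    oos_loglik <- function(model, newdata) NA_real_

    set.seed(1)
    subjects <- levels(Orthodont$Subject)
    B <- 100
    diff_ll <- numeric(B)
    for (i in seq_len(B)) {
      drawn <- sample(subjects, replace = TRUE)
      # stack the drawn subjects, relabelling so repeated draws count as distinct subjects
      train <- do.call(rbind, lapply(seq_along(drawn), function(j) {
        d <- Orthodont[Orthodont$Subject == drawn[j], ]
        d$Subject <- factor(paste0("boot", j))
        d
      }))
      test <- Orthodont[!(Orthodont$Subject %in% drawn), ]  # out-of-bag subjects
      m1 <- lmer(distance ~ age + (1 + age | Subject), data = train)
      m2 <- lmer(distance ~ Sex + (1 + age | Subject), data = train)  # a non-nested competitor
      diff_ll[i] <- oos_loglik(m1, test) - oos_loglik(m2, test)
    }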

I have tried several different methods, which give me wildly different results. This is easiest to illustrate when calculating the log-likelihood for the in-sample data. The easiest option is to use logLik:

    library(lme4)

    data(Orthodont, package="MEMSS")
    mod <- lmer(distance ~ age + (1+age|Subject), data=Orthodont)
    logLik(mod)

    > -221.3183

But I get a different result using the residuals:

    resid<-residuals(mod)
    sum(dnorm(resid,sd=sd(resid),log=TRUE))

    > -162.1903

I also tried using the residual standard deviation estimated by lmer:

    sum(dnorm(resid,sd=sigma(mod),log=TRUE))

    > -165.5434

I know that the log-likelihood for a mixed model is sometimes calculated by integrating over the random effects, whereas by using the residuals I am conditioning on their predicted values. However, according to the help for logLik.merMod, logLik returns the "log-likelihood at the fitted value of the parameters." I take that to mean they are conditioning on the point estimates.

Just to be sure, I tried estimating the unconditional log-likelihood. Using predict with re.form=NA returns the fitted values based on the fixed effects only (ignoring the random effects).

    resid<-Orthodont$distance-predict(mod,newdata=Orthodont,re.form=NA)
    sum(dnorm(resid,sd=sd(resid),log=TRUE))

    > -252.7908

Interestingly, all of the above methods give roughly the same answer when using glm. So this seems to be specific to mixed effects models.
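For example, with a plain Gaussian glm on the same data, the residual-based calculation reproduces logLik almost exactly (exactly, if the ML estimate of the residual standard deviation, which divides by n rather than n - 1, is used):

    fit <- glm(distance ~ age, data=Orthodont, family=gaussian)
    logLik(fit)
    r <- residuals(fit)
    # ML estimate of the residual SD (divide by n): matches logLik exactly
    sum(dnorm(r, sd=sqrt(mean(r^2)), log=TRUE))
    # sd(r) divides by n - 1, so this is close but not identical
    sum(dnorm(r, sd=sd(r), log=TRUE))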

Best Answer

It seems that calculating the log-likelihood for a mixed effects model requires dealing with the covariance structure that the random effects induce in the marginal distribution of the response: for each subject, the marginal covariance is V_i = Z_i G Z_i' + sigma^2 I, where G is the covariance matrix of the random effects and Z_i is that subject's random-effects design matrix. Here is a method for calculating the log-likelihood by hand, for both ML and REML:

    library(lme4)      # also attaches Matrix, which provides bdiag()
    library(mvtnorm)   # for dmvnorm()

    data(Orthodont, package="MEMSS")

    y <- Orthodont$distance
    n <- nrow(Orthodont)

    mod <- lmer(distance ~ age + (1+age|Subject), data=Orthodont, REML=FALSE)
    logLik(mod)

    # G: covariance matrix of the random effects (intercept and age slope)
    G <- diag(attr(VarCorr(mod)$Subject, "stddev")) %*% attr(VarCorr(mod)$Subject, "correlation") %*% diag(attr(VarCorr(mod)$Subject, "stddev"))
    # V: marginal covariance of y, block-diagonal with one block Z G Z' + sigma^2 I per subject
    V <- lapply(split(Orthodont, Orthodont$Subject), function(x) cbind(1, x$age) %*% G %*% rbind(1, x$age) + diag(rep(sigma(mod)^2, nrow(x))))
    V <- as.matrix(bdiag(V))
    W <- solve(V)
    X <- cbind(1, Orthodont$age)
    b <- fixef(mod)

    # both lines give the ML log-likelihood; compare with logLik(mod) above
    dmvnorm(y, mean = X %*% b, sigma=V, log=TRUE)
    c(-n/2 * log(2*pi) - 1/2 * log(det(V)) - 1/2 * t(y - X %*% b) %*% W %*% (y - X %*% b))

    # same calculation for the REML fit
    mod <- lmer(distance ~ age + (1+age|Subject), data=Orthodont, REML=TRUE)
    logLik(mod)

    G <- diag(attr(VarCorr(mod)$Subject, "stddev")) %*% attr(VarCorr(mod)$Subject, "correlation") %*% diag(attr(VarCorr(mod)$Subject, "stddev"))
    V <- lapply(split(Orthodont, Orthodont$Subject), function(x) cbind(1, x$age) %*% G %*% rbind(1, x$age) + diag(rep(sigma(mod)^2, nrow(x))))
    V <- as.matrix(bdiag(V))
    W <- solve(V)
    X <- cbind(1, Orthodont$age)
    b <- fixef(mod)
    p <- length(b)

    # REML adds the -1/2 * log(det(t(X) %*% W %*% X)) term and uses (n - p) in the constant
    c(-(n-p)/2 * log(2*pi) - 1/2 * log(det(V)) - 1/2 * log(det(t(X) %*% W %*% X)) - 1/2 * t(y - X %*% b) %*% W %*% (y - X %*% b))

To calculate the likelihood of a new data point under ML (or, more accurately, the density of that new data point), given X and y for the new subjects, compute V from G and sigma(mod) in the same way as above (and W = solve(V)), and then calculate:

    # here n, y, X, V, W all refer to the new subjects' data; b and G come from the fitted model
    dmvnorm(y, mean = X %*% b, sigma=V, log=TRUE)
    c(-n/2 * log(2*pi) - 1/2 * log(det(V)) - 1/2 * t(y - X %*% b) %*% W %*% (y - X %*% b))
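For concreteness, here is a minimal sketch of that out-of-sample calculation, treating one subject from Orthodont as if it were a new, held-out subject. This is purely to illustrate the mechanics (the model was in fact fit to this subject, so the number is not a genuine out-of-sample density), and G, b, and sigma(mod) should come from the ML fit above:

    # pretend the first subject is new, held-out data (illustration only)
    new_dat <- subset(Orthodont, Subject == levels(Subject)[1])
    y_new <- new_dat$distance
    X_new <- cbind(1, new_dat$age)   # fixed-effects design matrix for the new subject
    Z_new <- cbind(1, new_dat$age)   # random-effects design matrix (intercept and age)
    V_new <- Z_new %*% G %*% t(Z_new) + diag(rep(sigma(mod)^2, nrow(new_dat)))
    W_new <- solve(V_new)
    n_new <- nrow(new_dat)

    dmvnorm(y_new, mean = X_new %*% b, sigma = V_new, log = TRUE)
    c(-n_new/2 * log(2*pi) - 1/2 * log(det(V_new)) - 1/2 * t(y_new - X_new %*% b) %*% W_new %*% (y_new - X_new %*% b))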

It's not clear this can be done using REML, since the - 1/2 * log(det(t(X) %*% W %*% X)) term cannot be decomposed into the contribution of each individual subject.

Many thanks to several experts who answered questions via email.
