Mixed Model – How to Calculate Likelihood of Out-of-Sample Values for a Mixed Effects Model

likelihood, mixed-model, r, regression

I'm trying to use this method for calculating the Information Coefficient using bootstrapping. The advantage of bootstrapping is that I can compare models that are not nested. But to do this, I need to be able to calculate the likelihood of out-of-sample data (because I'm bootstrapping).
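To make the setup concrete, here is roughly the shape of the procedure I have in mind: a subject-level bootstrap where each candidate model is fit to the resampled subjects and scored on the out-of-bag subjects. The specific resampling scheme and the two candidate models are just for illustration, and `oos_loglik` is a placeholder for the out-of-sample log-likelihood calculation that this question is asking about:

    library(lme4)
    data(Orthodont, package="MEMSS")

    # placeholder: the out-of-sample log-likelihood of `newdata` under `model`
    # -- this is the piece I don't know how to compute for a mixed model
    oos_loglik <- function(model, newdata) NA_real_

    set.seed(1)
    subjects <- levels(Orthodont$Subject)
    B <- 100
    diff_ll <- numeric(B)
    for (i in seq_len(B)) {
      drawn <- sample(subjects, replace = TRUE)
      # stack the drawn subjects, relabelling so repeated draws count as distinct subjects
      train <- do.call(rbind, lapply(seq_along(drawn), function(j) {
        d <- Orthodont[Orthodont$Subject == drawn[j], ]
        d$Subject <- factor(paste0("boot", j))
        d
      }))
      test <- Orthodont[!(Orthodont$Subject %in% drawn), ]  # out-of-bag subjects
      m1 <- lmer(distance ~ age + (1 + age | Subject), data = train)
      m2 <- lmer(distance ~ Sex + (1 + age | Subject), data = train)  # a non-nested competitor
      diff_ll[i] <- oos_loglik(m1, test) - oos_loglik(m2, test)
    }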

I have tried several different methods, which give me wildly different results. This is easiest to illustrate when calculating the log-likelihood for the in-sample data. The easiest option is to use logLik:

    library(lme4)

    data(Orthodont, package="MEMSS")
    mod <- lmer(distance ~ age + (1+age|Subject), data=Orthodont)
    logLik(mod)

    > -221.3183

But I get a different result using the residuals:

    resid<-residuals(mod)
    sum(dnorm(resid,sd=sd(resid),log=TRUE))

    > -162.1903

I also tried using the residual standard deviation estimated by lmer:

    sum(dnorm(resid,sd=sigma(mod),log=TRUE))

    > -165.5434

I know that the log-likelihood for a mixed model is sometimes calculated by integrating over the random effects, whereas by using the residuals I am conditioning on their predicted values. However, according to the help for logLik.merMod, logLik returns the "log-likelihood at the fitted value of the parameters." I take that to mean they are conditioning on the point estimates.

Just to be sure, I tried estimating the unconditional log-likelihood. Using predict with re.form=NA returns the fitted values based on the fixed effects only (ignoring the random effects).

    resid<-Orthodont$distance-predict(mod,newdata=Orthodont,re.form=NA)
    sum(dnorm(resid,sd=sd(resid),log=TRUE))

    > -252.7908

Interestingly, all of the above methods give roughly the same answer when using glm. So this seems to be specific to mixed effects models.
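For example, with a plain Gaussian glm on the same data, the residual-based calculation reproduces logLik almost exactly (exactly, if the ML estimate of the residual standard deviation, which divides by n rather than n - 1, is used):

    fit <- glm(distance ~ age, data=Orthodont, family=gaussian)
    logLik(fit)
    r <- residuals(fit)
    # ML estimate of the residual SD (divide by n): matches logLik exactly
    sum(dnorm(r, sd=sqrt(mean(r^2)), log=TRUE))
    # sd(r) divides by n - 1, so this is close but not identical
    sum(dnorm(r, sd=sd(r), log=TRUE))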

Best Answer

It seems that calculating the log-likelihood for a mixed effects model requires dealing with the covariance structure that the random effects induce in the marginal distribution of the response: for each subject, the marginal covariance is V_i = Z_i G Z_i' + sigma^2 I, where G is the covariance matrix of the random effects and Z_i is that subject's random-effects design matrix. Here is a method for calculating the log-likelihood by hand, for both ML and REML:

    library(lme4)      # also attaches Matrix, which provides bdiag()
    library(mvtnorm)   # for dmvnorm()

    data(Orthodont, package="MEMSS")

    y <- Orthodont$distance
    n <- nrow(Orthodont)

    mod <- lmer(distance ~ age + (1+age|Subject), data=Orthodont, REML=FALSE)
    logLik(mod)

    # G: covariance matrix of the random effects (intercept and age slope)
    G <- diag(attr(VarCorr(mod)$Subject, "stddev")) %*% attr(VarCorr(mod)$Subject, "correlation") %*% diag(attr(VarCorr(mod)$Subject, "stddev"))
    # V: marginal covariance of y, block-diagonal with one block Z G Z' + sigma^2 I per subject
    V <- lapply(split(Orthodont, Orthodont$Subject), function(x) cbind(1, x$age) %*% G %*% rbind(1, x$age) + diag(rep(sigma(mod)^2, nrow(x))))
    V <- as.matrix(bdiag(V))
    W <- solve(V)
    X <- cbind(1, Orthodont$age)
    b <- fixef(mod)

    # both lines give the ML log-likelihood; compare with logLik(mod) above
    dmvnorm(y, mean = X %*% b, sigma=V, log=TRUE)
    c(-n/2 * log(2*pi) - 1/2 * log(det(V)) - 1/2 * t(y - X %*% b) %*% W %*% (y - X %*% b))

    # same calculation for the REML fit
    mod <- lmer(distance ~ age + (1+age|Subject), data=Orthodont, REML=TRUE)
    logLik(mod)

    G <- diag(attr(VarCorr(mod)$Subject, "stddev")) %*% attr(VarCorr(mod)$Subject, "correlation") %*% diag(attr(VarCorr(mod)$Subject, "stddev"))
    V <- lapply(split(Orthodont, Orthodont$Subject), function(x) cbind(1, x$age) %*% G %*% rbind(1, x$age) + diag(rep(sigma(mod)^2, nrow(x))))
    V <- as.matrix(bdiag(V))
    W <- solve(V)
    X <- cbind(1, Orthodont$age)
    b <- fixef(mod)
    p <- length(b)

    # REML adds the -1/2 * log(det(t(X) %*% W %*% X)) term and uses (n - p) in the constant
    c(-(n-p)/2 * log(2*pi) - 1/2 * log(det(V)) - 1/2 * log(det(t(X) %*% W %*% X)) - 1/2 * t(y - X %*% b) %*% W %*% (y - X %*% b))

To calculate the likelihood of a new data point under ML (or, more accurately, the density of that new data point), given X and y for the new subjects, compute V from G and sigma(mod) in the same way as above (and W = solve(V)), and then calculate:

    # here n, y, X, V, W all refer to the new subjects' data; b and G come from the fitted model
    dmvnorm(y, mean = X %*% b, sigma=V, log=TRUE)
    c(-n/2 * log(2*pi) - 1/2 * log(det(V)) - 1/2 * t(y - X %*% b) %*% W %*% (y - X %*% b))
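For concreteness, here is a minimal sketch of that out-of-sample calculation, treating one subject from Orthodont as if it were a new, held-out subject. This is purely to illustrate the mechanics (the model was in fact fit to this subject, so the number is not a genuine out-of-sample density), and G, b, and sigma(mod) should come from the ML fit above:

    # pretend the first subject is new, held-out data (illustration only)
    new_dat <- subset(Orthodont, Subject == levels(Subject)[1])
    y_new <- new_dat$distance
    X_new <- cbind(1, new_dat$age)   # fixed-effects design matrix for the new subject
    Z_new <- cbind(1, new_dat$age)   # random-effects design matrix (intercept and age)
    V_new <- Z_new %*% G %*% t(Z_new) + diag(rep(sigma(mod)^2, nrow(new_dat)))
    W_new <- solve(V_new)
    n_new <- nrow(new_dat)

    dmvnorm(y_new, mean = X_new %*% b, sigma = V_new, log = TRUE)
    c(-n_new/2 * log(2*pi) - 1/2 * log(det(V_new)) - 1/2 * t(y_new - X_new %*% b) %*% W_new %*% (y_new - X_new %*% b))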

It's not clear this can be done using REML, since the - 1/2 * log(det(t(X) %*% W %*% X)) term cannot be decomposed into the contribution of each individual subject.

Many thanks to several experts who answered questions via email.
