Solved – Likelihood and estimates for mixed-effects logistic regression

generalized linear model, glmm, maximum likelihood, mixed model, numerical integration

First, let's simulate some data for a logistic regression with fixed and random parts:

set.seed(1)
n    <- 100
x    <- runif(n)                         # continuous covariate
z    <- sample(c(0,1), n, replace=TRUE)  # group indicator (two clusters)
b    <- rnorm(2)                         # random effects, one per cluster
beta <- c(0.4, 0.8)                      # fixed-effect coefficients
X    <- model.matrix(~x)                 # fixed-effects design matrix
Z    <- cbind(z, 1-z)                    # random-effects design matrix
eta  <- X%*%beta + Z%*%b                 # linear predictor
pr   <- 1/(1+exp(-eta))                  # inverse-logit link
y    <- rbinom(n, 1, pr)                 # Bernoulli responses

If we just wanted to fit a logistic regression with no random part, we could use the glm function:

glm(y~x, family="binomial")$coefficients
# (Intercept)           x 
#  -0.2992785   2.1429825 

Alternatively, we can construct our own log-likelihood function

$$ \log L(\boldsymbol{\beta}) = \sum_{i=1}^n \left[ y_i \log \Lambda(\eta_i) + (1-y_i)\log\big(1-\Lambda(\eta_i)\big) \right] $$

where $\Lambda(\eta_i)=\frac{1}{1+\exp(-\eta_i)}$ and $\eta_i=\boldsymbol{x}_i'\boldsymbol{\beta}$,
and use optim() to estimate the parameters that maximize it, as in the following example code:

ll.no.random <- function(theta, X, y){
  beta <- theta[1:ncol(X)]                 # coefficients
  eta  <- X%*%beta                         # linear predictor
  p    <- 1/(1+exp(-eta))                  # Lambda(eta)
  ll   <- sum( y*log(p) + (1-y)*log(1-p) ) # log-likelihood
  -ll                                      # optim() minimizes, so return the negative
}

optim(c(0,1), ll.no.random, X=X, y=y)$par
# -0.2992456  2.1427484
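The estimates match; to check that the maximized log-likelihoods agree as well (optim() minimizes the negative log-likelihood, so its value is negated):

logLik(glm(y~x, family="binomial"))           # maximized log-likelihood from glm
-optim(c(0,1), ll.no.random, X=X, y=y)$value  # same value, up to optimizer tolerance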

So constructing the log-likelihood by hand gives the same estimates and the same maximized value as glm(). For mixed effects, we would want something like

library(lme4)
glmer(y~x + (1|z), family="binomial")

But how can we do the same with our own function? Since the likelihood is

$$ L = \prod_{j=1}^J \int \Pr(y_{1j},\ldots,y_{n_jj} \mid \boldsymbol{x}, b_j)\, f(b_j)\, db_j $$

where $f$ is the $N(0,\sigma_b^2)$ density of the random effect and, conditional on $b_j$, the responses in cluster $j$ are independent Bernoulli so that $\Pr(y_{1j},\ldots,y_{n_jj} \mid \boldsymbol{x}, b_j) = \prod_{i=1}^{n_j} \Lambda(\eta_{ij})^{y_{ij}} \big(1-\Lambda(\eta_{ij})\big)^{1-y_{ij}}$, and since this integral has no closed-form expression, we need numerical integration such as Gauss–Hermite quadrature. We can use the statmod package to get the quadrature nodes and weights, say $R=10$ of them:

library(statmod)
gq <- gauss.quad.prob(10, dist="normal")  # nodes/weights for integrating against N(0,1)
w  <- gq$weights
g  <- gq$nodes
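As a quick sanity check, these weights and nodes reproduce moments of the standard normal (a 10-point rule is exact for polynomials up to degree 19):

sum(w)        # 1, the total probability mass
sum(w * g^2)  # 1, the variance of N(0,1)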

UPDATE: Using these quadrature nodes $g_r$ and weights $w_r$, $r=1,\ldots,R$ ($R=10$ here), we can approximate the integral over $b_j$ by a sum of $R$ terms in which $g_r$ is substituted for $b_j$ and each term is multiplied by its weight $w_r$. Our likelihood function should now be

$$ L \approx \prod_{j=1}^J \sum_{r=1}^{R} \Pr\left(y_{1j},\ldots,y_{n_jj} \mid \boldsymbol{x}, b_j=g_r\right) w_r $$

Also, we need to account for the variance of the random part. I read that this can be achieved by rewriting the $b_j \sim N(0,\sigma_b^2)$ in our $\eta$ function as $\sigma_b \theta_j$ with $\theta_j \sim N(0,1)$, so in the likelihood function above it is actually the $\theta_j$'s, not the $b_j$'s, that are replaced by the nodes $g_r$.

One computational issue I don't get is how to substitute the terms, since the vectors won't be of the same length. But probably I'm missing something crucial here, or have misunderstood how this method works.

Best Answer

I do not see how "the vectors won't be of the same length"; please clarify your question.
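If the worry is that the vector of nodes $g$ (length $R$) and the linear predictor of cluster $j$ (length $n_j$) have different lengths: each node enters one at a time, as a scalar, inside the per-cluster product. A minimal sketch of such a likelihood function for your simulated data (the name ll.mixed, the use of z as the grouping variable, and the log-scale parameterization of $\sigma_b$ are my own illustration, not code from any package):

library(statmod)
ll.mixed <- function(theta, X, y, group, R=10){
  beta  <- theta[1:ncol(X)]
  sigma <- exp(theta[ncol(X)+1])              # sigma_b > 0 via a log-scale parameter
  gq    <- gauss.quad.prob(R, dist="normal")  # nodes/weights for N(0,1)
  ll    <- 0
  for(j in unique(group)){                    # one integral per cluster j
    idx <- group == j
    # Pr(y_j | x, b_j = sigma * g_r), evaluated at each node g_r
    Lj  <- sapply(1:R, function(r){
      eta <- X[idx, , drop=FALSE] %*% beta + sigma*gq$nodes[r]
      p   <- 1/(1+exp(-eta))
      prod( p^y[idx] * (1-p)^(1-y[idx]) )
    })
    ll  <- ll + log(sum(gq$weights * Lj))     # weighted sum approximates the integral
  }
  -ll
}

optim(c(0, 1, 0), ll.mixed, X=X, y=y, group=z)$par  # third element is log(sigma_b)

For a serious implementation one would compute the per-cluster product on the log scale to avoid underflow, but with roughly 50 observations per cluster this direct version is fine.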

First of all, for integrals of dimension less than four, direct numerical methods like quadrature are more efficient than MCMC. I have studied these questions for a while, and I would be happy to discuss this problem with you.

For mixed-effects logistic regression, the only explicit R code I have found is from Prof. Demidenko's book, Mixed Models: Theory and Applications; you can download the code via the "SOFTWARE AND DATA" column on the book's webpage. The function logMLEgh() can be found in \mixed_models_data.zip\MixedModels\Chapter07. He did not use the statmod package to obtain the quadratures but wrote his own function, gauher(). There are some minor errors in the code, which I have discussed with the author, but it is still very helpful to start from his code and book. I can provide the corrected version if needed.

Another issue is that, if you want accurate estimates, optim() is not enough; you may need methods like Fisher scoring, as used in glm().
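For reference, Fisher scoring for the fixed-effects-only logistic model amounts to iteratively reweighted least squares; a minimal sketch (the function name fisher.logit is my own illustration):

fisher.logit <- function(X, y, tol=1e-8, maxit=25){
  beta <- rep(0, ncol(X))
  for(it in 1:maxit){
    eta  <- X %*% beta
    p    <- 1/(1+exp(-eta))
    W    <- as.vector(p*(1-p))                      # Fisher information weights
    # scoring update: beta <- beta + (X'WX)^{-1} X'(y - p)
    step <- solve(crossprod(X, W*X), crossprod(X, y - p))
    beta <- beta + step
    if(max(abs(step)) < tol) break
  }
  drop(beta)
}

fisher.logit(X, y)  # should match glm(y~x, family="binomial")$coefficients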
