Generalized Linear Model – Estimating Multilevel Logistic Regression Models

Tags: generalized-linear-model, logistic, multilevel-analysis, r, simulation

Consider the following multilevel logistic model with
one explanatory variable at level 1 (individual level) and
one explanatory variable at level 2 (group level):

$$\text{logit}(p_{ij})=\pi_{0j}+\pi_{1j}x_{ij}\ldots (1)$$
$$\pi_{0j}=\gamma_{00}+\gamma_{01}z_j+u_{0j}\ldots (2)$$
$$\pi_{1j}=\gamma_{10}+\gamma_{11}z_j+u_{1j}\ldots (3)$$

where the group-level residuals $u_{0j}$ and $u_{1j}$ are assumed to follow a multivariate normal distribution with expectation zero. The variance of the residuals $u_{0j}$ is specified as $\sigma^2_0$, and the variance of the residuals $u_{1j}$ is specified as $\sigma^2_1$.
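Written out, with the covariance denoted $\sigma_{01}$ (taken to be zero in the simulation below), this assumption is

$$\begin{pmatrix}u_{0j}\\u_{1j}\end{pmatrix}\sim N\!\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}\sigma^2_0&\sigma_{01}\\\sigma_{01}&\sigma^2_1\end{pmatrix}\right)$$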

I want to estimate the parameters of this model, and I would like to use the R function glmmPQL.

Substituting equations (2) and (3) into equation (1) yields

$$\text{logit}(p_{ij})=\gamma_{00}+\gamma_{10}x_{ij}+\gamma_{01}z_j+\gamma_{11}x_{ij}z_j+u_{0j}+u_{1j}x_{ij}\ldots (4)$$

There are 30 groups $(j=1,\ldots,30)$ and 5 individuals in each group.

R code:

    ## Simulate data from the multilevel logistic model
    library(mvtnorm)
    set.seed(1234)

    J   <- 30            ## number of groups
    n_j <- rep(5, J)     ## number of individuals in the jth group
    N   <- sum(n_j)      ## total number of individuals

    ## fixed effects
    g_00 <- -1
    g_01 <- 0.3
    g_10 <- 0.3
    g_11 <- 0.3

    s2_0 <- 0.13   ## random-intercept variance (corresponding to a specific ICC)
    s2_1 <- 1      ## random-slope variance, standardized to 1
    s01  <- 0      ## covariance, assumed zero

    z <- rnorm(J)  ## group-level covariate
    x <- rnorm(N)  ## individual-level covariate

    ## generate (u_0j, u_1j) from a bivariate normal
    mu  <- c(0, 0)
    sig <- matrix(c(s2_0, s01, s01, s2_1), ncol = 2)
    u   <- rmvnorm(J, mean = mu, sigma = sig, method = "chol")

    ## group-specific intercepts and slopes, expanded to individuals
    pi_0 <- g_00 + g_01 * z + u[, 1]
    pi_1 <- g_10 + g_11 * z + u[, 2]
    eta  <- rep(pi_0, n_j) + rep(pi_1, n_j) * x
    p    <- exp(eta) / (1 + exp(eta))

    y <- rbinom(N, 1, p)
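As a quick side note on the ICC comment above: under the usual latent-variable view of the logistic model, the level-1 residual variance is $\pi^2/3$, so for a model with only a random intercept $\sigma^2_0 = 0.13$ corresponds to an ICC of roughly $0.038$ (the random slope complicates this, so it is only a rough check):

    ## latent-scale ICC implied by s2_0 (random-intercept-only formula)
    s2_0 / (s2_0 + pi^2 / 3)   ## 0.13 / (0.13 + 3.29), about 0.038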

Now the parameter estimation.

    ## estimate the parameters with glmmPQL
    library(MASS)
    library(nlme)

    sim_data_mat <- matrix(c(y, x, rep(z, n_j), rep(1:30, n_j)), ncol = 4)
    sim_data <- data.frame(sim_data_mat)
    colnames(sim_data) <- c("Y", "X", "Z", "cluster")
    summary(glmmPQL(Y ~ X * Z, random = ~1 | cluster, family = binomial,
                    data = sim_data, niter = 200))

Output:

      iteration 1
      Linear mixed-effects model fit by maximum likelihood
      Data: sim_data 

      Random effects:
      Formula: ~1 | cluster
              (Intercept)  Residual
      StdDev: 0.0001541031 0.9982503

      Variance function:
      Structure: fixed weights
      Formula: ~invwt 
      Fixed effects: Y ~ X * Z 
                      Value Std.Error  DF   t-value p-value
      (Intercept) -0.8968692 0.2018882 118 -4.442404  0.0000
      X            0.5803201 0.2216070 118  2.618691  0.0100
      Z            0.2535626 0.2258860  28  1.122525  0.2712
      X:Z          0.3375088 0.2691334 118  1.254057  0.2123
      Correlation: 
           (Intr) X      Z     
      X   -0.072              
      Z    0.315  0.157       
      X:Z  0.095  0.489  0.269

      Number of Observations: 150
      Number of Groups: 30 

  • Why does it take only one iteration when I specified 200 iterations via the argument niter=200 in glmmPQL?

  • Also, the p-values for the group-level variable $(Z)$ and the cross-level interaction $(X:Z)$ show that they are not significant. Why, then, do the authors of this article keep the group-level variable $(Z)$ and the cross-level interaction $(X:Z)$ for further analysis?

  • Also, how are the degrees of freedom (DF) calculated?

  • The relative bias of the various estimates doesn't match the table in the article. I calculated the relative bias as follows (a sketch for extracting these numbers programmatically appears after this list):

     ## estimated fixed-effect parameters
     hat_g_00 <- -0.8968692  ## overall intercept
     hat_g_10 <-  0.5803201  ## X
     hat_g_01 <-  0.2535626  ## Z
     hat_g_11 <-  0.3375088  ## X:Z

     fixed     <- c(g_00, g_10, g_01, g_11)
     hat_fixed <- c(hat_g_00, hat_g_10, hat_g_01, hat_g_11)

     ## estimated random-effect parameters (standard deviations)
     hat_s_0 <- 0.0001541031  ## estimated SD of the random intercept
     hat_s_1 <- 0.9982503     ## estimated residual SD

     std     <- c(sqrt(0.13), 1)
     hat_std <- c(hat_s_0, hat_s_1)

     ## relative bias of the fixed effects (in percent)
     rel_bias_fixed <- ((hat_fixed - fixed) / fixed) * 100
     ## [1] -10.31308  93.44003 -15.47913  12.50293

     ## relative bias of the random effects (in percent)
     rel_bias_random <- ((hat_std - std) / std) * 100
     ## [1] -99.95726  -0.17497
    
  • Why doesn't the relative bias match the table?
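As an aside, the same numbers can be pulled from the fitted object rather than retyped by hand (a sketch, assuming the glmmPQL fit above is saved as fit):

    ## extract estimates directly from the fitted model
    fit <- glmmPQL(Y ~ X * Z, random = ~1 | cluster,
                   family = binomial, data = sim_data, niter = 200)
    hat_fixed <- fixef(fit)                            ## (Intercept), X, Z, X:Z
    hat_std   <- as.numeric(VarCorr(fit)[, "StdDev"])  ## intercept SD, residual SD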

Best Answer

There are perhaps too many questions here. Some comments:

  • you might consider using glmer from the lme4 package (glmer(Y~X*Z+(1|cluster),family=binomial,data=sim_data)); it uses Laplace approximation or Gauss-Hermite quadrature, which are generally more accurate than PQL (although the answers are very similar in this case).
  • The niter argument specifies the maximum number of iterations; only one iteration was actually necessary.
  • I'm not sure what your question is about the interaction term. Whether you should drop non-significant interaction terms is a bit of a can of worms; it depends both on your statistical philosophy and on the goals of your analysis (e.g. see this question).
  • the denominator degrees of freedom are calculated according to a simple 'inner-outer' rule described on page 91 of Pinheiro and Bates (2000), which is available on Google Books. Here Z is constant within clusters, so its DF is 30 − (1 + 1) = 28, while X and X:Z vary within clusters (the intercept is counted at that innermost level too), giving 150 − (30 + 2) = 118. The rule is generally a reasonable approximation, but the computation of degrees of freedom is complex, especially for GLMMs.
  • if you're trying to replicate "A simulation study of sample size for multilevel logistic regression models" by Moineddin et al. (DOI: 10.1186/1471-2288-7-34), you need to run a large number of simulations and compute averages, not just compare a single run; a minimal sketch of such a loop is given after this list. Furthermore, you should probably try to get closer to their methods: coming back to my first point, they state that they use SAS PROC NLMIXED with adaptive Gauss-Hermite quadrature, so you'd be better off with e.g. glmer(...,nAGQ=10). It still won't match exactly, but it'll probably be closer than glmmPQL.
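For concreteness, here is a minimal sketch of such a replication loop. The helper simulate_once and nsim = 1000 are illustrative choices, not from the article; the data-generating steps mirror the question's code, and the fitted model keeps the random intercept only, since nAGQ > 1 in glmer requires a single scalar random effect:

    ## sketch: repeat simulate -> fit -> collect, then average
    library(lme4)
    library(mvtnorm)
    set.seed(1234)

    simulate_once <- function() {
      J <- 30; n_j <- rep(5, J); N <- sum(n_j)
      z <- rnorm(J); x <- rnorm(N)
      u <- rmvnorm(J, mean = c(0, 0),
                   sigma = matrix(c(0.13, 0, 0, 1), ncol = 2))
      pi_0 <- -1  + 0.3 * z + u[, 1]
      pi_1 <- 0.3 + 0.3 * z + u[, 2]
      eta  <- rep(pi_0, n_j) + rep(pi_1, n_j) * x
      data.frame(Y = rbinom(N, 1, plogis(eta)), X = x,
                 Z = rep(z, n_j), cluster = rep(1:J, n_j))
    }

    nsim <- 1000   ## number of replications (illustrative)
    est <- replicate(nsim, {
      fit <- glmer(Y ~ X * Z + (1 | cluster), family = binomial,
                   data = simulate_once(), nAGQ = 10)
      ## fixed effects in the order (Intercept), X, Z, X:Z,
      ## plus the random-intercept SD
      c(fixef(fit), sd0 = as.data.frame(VarCorr(fit))$sdcor[1])
    })

    ## average over replications, then compute relative bias (in percent)
    true <- c(-1, 0.3, 0.3, 0.3, sqrt(0.13))
    rel_bias <- (rowMeans(est) - true) / true * 100

Comparing rel_bias computed this way with the table should be far more meaningful than comparing a single simulated data set.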