Multilevel Analysis – Conducting a Multilevel Logistic Regression: Simulation Study with R

logisticmultilevel-analysisrself-studysimulation

I have written R codes for simulating data from Multilevel logistic regression model .

I focus on the following multilevel logistic model with
one explanatory variable at level 1 (individual level) and
one explanatory variable at level 2 (group level) :

$$\text{logit}(p_{ij})=\pi_{0j}+\pi_{1j}x_{ij}\ldots (1)$$
$$\pi_{0j}=\gamma_{00}+\gamma_{01}z_j+u_{0j}\ldots (2)$$
$$\pi_{1j}=\gamma_{10}+\gamma_{11}z_j+u_{1j}\ldots (3)$$

where , $u_{0j}\sim N(0,\sigma_0^2)$ , $u_{1j}\sim N(0,\sigma_1^2)$ , $\text{cov}(u_{0j},u_{1j})=\sigma_{01}$

In this paper in equation (2) , they assumed $\text{cov}(u_{0j},u_{1j})=\sigma_{01}$ , that is not independent . But also they mentioned in the methodology section that :

The group random components $u_{0j}$ and $u_{1j}$
are "independent" normal variables with mean zero and
standard deviations $σ_0$ and $σ_1$.

So I assumed $\text{cov}(u_{0j},u_{1j})=0$ .

R code :

## Simulating data from multilevel logistic regression 

set.seed(1234)
x <- rnorm(1000) ### individual level variable
z <- rnorm(1000) ### group level variable

##fixed effect parameter
g_00 <- -1
g_01 <- 0.3
g_10 <- 0.3
g_11 <- 0.3

g <- matrix(c(g_00,g_01,g_10,g_11),ncol=1)

require(mvtnorm)

##need variance values as input 
s2_0 <- 0.36
s2_1 <- 1
s01 <- 0

##generate bi-variate normal rv for u0, u1

avg <- c(0,0) ##mean
sigma <- matrix(c(s2_0,s01,s01,s2_1),ncol=2)

u <- rmvnorm(1000,mean=avg,sigma=sigma,method="chol")

pi_0j <- g_00 +g_01*z + as.vector(u[,1])
pi_1j <- g_10 +g_11*z + as.vector(u[,2])
p <- exp(pi_0j+pi_1j*x)/(1+exp(pi_0j+pi_1j*x))

y <- rbinom(1000,1,p)

But i am not understanding where is to consider the group ? If i select number of groups to be $100$ $(j=1,2,\ldots, 100)$, then will I assign the groups randomly against each $y_{i,j}$ ?

  • Have i correctly simulated data from Multilevel Logistic Distribution ?

Best Answer

Have I correctly simulated data from Multilevel Logistic Distribution?

Well, maybe, but I don't think so. Think of it this way: You want to simulate the case where you have a sample of individuals, and each individual belongs to a group. The individuals have covariate values and the groups have covariate values.

The article is a little confusing at first. If you check out Maas and Joop (2004), you will see that Moineddin et al (2007) are following a very similar line of development. That might clarify some of their decisions.

Maas and Joop (2004) examine the situation in the multilevel regression situation. The model is quite similar, with the same exact structure for the $u_{0j}$ and $u_{1j}$. And there on page 131 we find the following statement:

To simplify the simulation model, without loss of generality, the covariance between the two u-terms is assumed equal to zero.

So, Moineddin et al (2007) are just following along. There is no reason you have to assume that $\sigma_{01} = 0$. Of course, to reproduce the results in the paper, you would.

They also are following the same line of development with respect to evaluating the effect of the intraclass correlation coefficient (ICC) on estimation. In the logistic model, the ICC is computed a little differently than in the linear regression situation. But, the bottom line is the same --- specifying an ICC equates to specifying a variance for $\sigma_0^2$.

Here is a revision of your code:

## Simulating data from multilevel logistic regression

require(mvtnorm)

set.seed(1234)

# Set up various parameters.

J   <- 30          ### number of groups
n_j <- rep(5, J)   ### number of individuals in jth group

# What is the sample size in this case?

N <- sum(n_j)

# Set the fixed effect parameters.

g_00 <- -1
g_01 <- 0.3
g_10 <- 0.3
g_11 <- 0.3

# Set the variance values for the group level random effect.

s2_0 <- 0.13    ### variance corresponding to specific ICC
s2_1 <- 1       ### variance standardized to 1
s01  <- 0       ### covariance assumed 0

# Simulate the covariate values for this sample size.

z <- rnorm(J)   ### group-level covariate values
x <- rnorm(N)   ### individual-level covariate values

At this point, you are all set to simulate. Notice that your original variance value was actually the standard deviation ($0.36^2 \approx 0.13$). That variance value was driven by the desire to have the ICC $=0.04$.

# Generate (u_0j, u_1j) from a bivariate normal.

mu    <- c(0, 0)
sigma <- matrix(c(s2_0, s01, s01, s2_1), ncol=2)

u <- rmvnorm(J, mean=mu, sigma=sigma, method="chol")

# Now form the linear predictor.  First, the group-level
# predictor.  There are J of these.

pi_0 <- g_00 + g_01 * z + as.vector(u[, 1])
pi_1 <- g_10 + g_11 * z + as.vector(u[, 2])

# Next, the individual level predictor.  There are
# N of these.

eta <- rep(pi_0, n_j) + rep(pi_1, n_j) * x

# Transform back to the probability scale.

p <- exp(eta)/( 1 + exp(eta) )

# Simulate a Bernoulli from each individual distribution.

y <- rbinom(N, 1, p)

Maas, Cora JM, and Joop J. Hox (2004) Robustness issues in multilevel regression analysis. Statistica Neerlandica 58.2: 127-137.

Moineddin, Rahim, Flora I. Matheson, and Richard H. Glazier (2007) A simulation study of sample size for multilevel logistic regression models. BMC Medical Research Methodology 7.1: 34.

Related Question