Solved – Multilevel logistic regression with a random slope(s)

interpretationlogisticmultilevel-analysisregression

I would like to specify a two-level logistic regression model with random intercept and random slope.

Dependent variable: hospitalization (1) or no-hospitalization (0).

Independent variables: age, number of drugs used, comorbidity, others…

Multilevel structure: patients clustered within hospitals. Hospitalization rate varies across different hospitals.

The goal is to identify variables in the model that explain the largest part of variation in hospitalization rate.

I have two questions:

Can I run several models each time allowing for random intercept + random slope including all other variables as fixed. I would like to have a look at the size of variance that is explained by each patients' characteristic, but when i add several random slopes model does not converge.
When I specify the random slope then the intercept variance is inflated. How can I interpret that?

Best Answer

Do you mean that you are turning age, number of drugs, etc. into random effects? When you do that, you are assuming that the impact of age, drugs, comorbities, and so on, differs from hospital to hospital. And in some way, this impact is distinct from the basic "hospital" effect that you have already included. It doesn't surprise me that the model does not converge, since you are going to have a lot of parameters here, not all of which may be needed by the model. Useless parameters do not change the value of the likelihood much, so the optimizer could end up roaming around in a trough somewhere, not sure where to go next.

And furthermore if a particular effect is not random --- or the true variance is extremely small --- then the optimizer will be directed towards the boundary of the parameter space (since the variance can't be negative) - this would cause the optimizer to stop where the gradient was non-zero and the optimizer would throw a non-convergence error.

This is not strange. If slopes are needed in the model, but you omit them, the intercept parameter will have to cover the differences between hospitals by itself. This will force the intercept parameter to cover a wider range of values than it would otherwise need to, and hence it will have a larger variance.

enter image description here

Pretend that your data look like the figure. The true model is shown by the green lines: common intercept, but different slopes. However, if you fit a model with different intercepts and 0 slope, you will get something like the red lines, as the intercept struggles to carry the full burden of the variation in the data all by itself. That's why the variance increases when the slopes are omitted.

And by the way ... welcome to the site and good luck with your analysis.

Related Solutions

Solved – Multilevel models including random slopes: how to calculate variance

Let me start with simpler models (detailed answers below), and let me use a slightly different notation (NB: random effecs at different levels assumed to be uncorrelated).

Model 1 (Two levels, random intercept, fixed slope). There are $N$ observations of $J$ students: $$\begin{align}y_{ij}&=\pi_{0j}+\beta_1 X_{ij}+\varepsilon_{ij},\quad&\varepsilon_{ij}\sim\mathcal{N}(0,\sigma^2_\varepsilon)\\ \pi_{0j}&=\beta_0+b_{0j},\quad& b_{0j}\sim\mathcal{N}(0,\sigma^2_0)\end{align}$$. If we substitute for $\pi_0$ and $\pi_1$ we get: $$y_{ij}=\beta_0+\beta_1X_{ij}+(b_{0j}+\varepsilon_{ij})$$ so: $$\begin{align}\text{Var}(y_{ij})&=\text{Var}(b_{0j}+\varepsilon_i)=\sigma^2_0+\sigma^2_\varepsilon \\ \text{Cov}(y_{ij},y_{i'j})&=\sigma^2_0 \quad& i\ne i' \\ \text{Cov}(y_{ij},y_{i'j'})&=0 \quad& i\ne i',\;j\ne j' \end{align}$$ The intraclass correlation coefficient: $$\rho=\frac{\sigma^2_0}{\sigma^2_0+\sigma^2_\varepsilon}$$ measures the proportion of the variance in the outcome that is between students. As to the overall covariance matrix, it is (Goldstein, Multilevel Statistical Models, pg. 20): $$ \begin{bmatrix} \sigma^2_0\mathbf{J}_{N_1}+\sigma^2_\varepsilon\mathbf{I}_{N_1} & 0 & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots \\ \dots & \dots & \sigma^2_0\mathbf{J}_{N_j}+\sigma^2_\varepsilon\mathbf{I}_{N_j} & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \sigma^2_0\mathbf{J}_{N_J}+\sigma^2_\varepsilon\mathbf{I}_{N_J} \end{bmatrix}$$ where $N_j$ is the number of observations for student $j$, $\mathbf{J}_{N_j}$ is the $(N_j\times N_j)$ matrix of ones and $\mathbf{I}_{N_j}$ is the $(N_j\times N_j)$ identity matrix. For example, if there are $J=2$ students and $N_1=3,N_1=2$ observations: $$\begin{bmatrix} \sigma^2_0+\sigma^2_\varepsilon & \sigma^2_0 & \sigma^2_0 & 0 & 0 \\ \sigma^2_0 & \sigma^2_0+\sigma^2_\varepsilon & \sigma^2_0 & 0 & 0 \\ \sigma^2_0 & \sigma^2_0 & \sigma^2_0+\sigma^2_\varepsilon & 0 & 0 \\ 0 & 0 & 0 & \sigma^2_0+\sigma^2_\varepsilon & \sigma^2_0 \\ 0 & 0 & 0 & \sigma^2_0 & \sigma^2_0+\sigma^2_\varepsilon \end{bmatrix}=(\sigma^2_0+\sigma^2_\varepsilon) \begin{bmatrix} 1 & \rho & \rho & 0 & 0 \\ \rho & 1 & \rho & 0 & 0 \\ \rho & \rho & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & \rho \\ 0 & 0 & 0 & \rho & 1 \end{bmatrix}$$

Model 2 (Three levels, random intercept, fixed slope). There are $N$ observations, $J$ students, and $K$ schools: $$\begin{align} y_{ijk}&=\pi_{0jk}+\beta_1 X_{ijk}+\varepsilon_{ijk},\quad&\varepsilon_{iij}\sim\mathcal{N}(0,\sigma^2_\varepsilon) \\ \pi_{0jk} &= \gamma_{0k}+b_{0jk},\quad& b_{0jk}\sim\mathcal{N}(0,\sigma^2_0) \\ \gamma_{0k} &= \beta_0 + b_{0k},\quad & b_{0k}\sim\mathcal{N}(0,\tau^2_0) \end{align}$$ The composite model is: $$y_{ijk}=\beta_0+\beta_1 X_{ijk}+ (b_{0k}+b_{0jk}+\varepsilon_{ijk})$$ and: $$\text{Var}(y_{ijk})=\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon$$ Instead of one intraclass correlation coefficient, several correlation coefficients may be defined:

$(\tau^2_0+\sigma^2_0)/(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)$: the proportion of variance among students;
$\tau^2_0/(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)$: the proportion of variance among schools;
$\sigma^2_0/(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)$: the proportion of variance among students within schools;
$\sigma^2_\varepsilon/(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)$: the proportion of variance within students.

The covariance matrix is complex. If there are $K=3$ schools, a covariance matrix looks like (Rodriguez): $$(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)\begin{bmatrix} 1 & \rho_1 & \rho_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \rho_1 & 1 & \rho_2 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ \rho_2 & \rho_2 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & \rho_1 & \rho_2 & \rho_2 & 0 & 0 & 0\\ 0 & 0 & 0 & \rho_1 & 1 & \rho_2 & \rho_2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \rho_2 & \rho_2 & 1 & \rho_1 & 0 & 0 & 0 \\ 0 & 0 & 0 & \rho_2 & \rho_2 & \rho_1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & \rho_1 & \rho_2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \rho_1 & 1 & \rho_2 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & \rho_2 & \rho_2 & 1 \end{bmatrix}$$ where $\rho_1=(\tau^2_0+\sigma^2_0)/(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)$ and $\rho_2=\tau^2_0/(\tau^2_0+\sigma^2_0+\sigma^2_\varepsilon)$.

Model 3 (Two levels, random intercept and slope). We let $\pi_1$ vary across the students in Model 1: $$\begin{align}y_{ij}&=\pi_{0j}+\pi_1 X_{ij}+\varepsilon_{ij},\quad&\varepsilon_i\sim\mathcal{N}(0,\sigma^2_\varepsilon)\\ \pi_{0j}&=\beta_0+b_{0j},\quad& b_{0j}\sim\mathcal{N}(0,\sigma^2_0)\\ \pi_1 &= \beta_1+b_{1j},\quad& b_{1j}\sim\mathcal{N}(0,\sigma^2_1) \end{align}$$ but now we must represent the dispersion of the level-2 random effects as a covariance matrix: $$\text{Var}(\mathbf{b}_j)=\text{Var}\begin{bmatrix} b_{0j} \\ b_{1j} \end{bmatrix} =\begin{bmatrix} \sigma^2_0 & \sigma_{01} \\ \sigma_{01} & \sigma^2_1 \end{bmatrix}$$ The combined model is: $$y_{ij}=\beta_0+\beta_1X_{ij}+(b_{0j}+b_{1j}X_{ij}+\varepsilon_{ij})$$ so (Goldstein, Multilevel Statistical Models, pg. 22): $$\begin{align}\text{Var}(y_{ij})&=\begin{bmatrix} \mathbf{1} & \mathbf{X} \end{bmatrix}\begin{bmatrix} \sigma^2_0 & \sigma_{01} \\ \sigma_{01} & \sigma^2_1 \end{bmatrix}\begin{bmatrix} \mathbf{1} & \mathbf{X} \end{bmatrix}'+\sigma^2_\varepsilon \\ \text{Cov}(y_{ij},y_{i'j})&=\sigma^2_0+2\sigma_{01}(X_{ij}+X_{i'j})+\sigma^2_1 X_{ij} X_{i'j} \\ \text{Cov}(y_{ij},y_{i'j'})&=0 \end{align}$$

The overall covariance matrix is still block diagonal but the blocks are: $\left(\begin{bmatrix} \mathbf{1} & \mathbf{X} \end{bmatrix}\text{Var}(\mathbf{b}_j)\begin{bmatrix} \mathbf{1} & \mathbf{X} \end{bmatrix}'\right)\mathbf{J}_{N_j}+\sigma^2_\varepsilon\mathbf{I}_{N_j}$.

We could define: $$\rho=\frac{\sigma^2_0+2\sigma_{01}X_{ij}+\sigma^2_1 X_{ij}^2}{\sigma^2_0+2\sigma_{01}X_{ij}+\sigma^2_1 X_{ij}^2+\sigma^2_\varepsilon}$$ but it is not a correlation coefficient like the previous ones. The correlation between two observations is given by: $$\frac{\sigma^2_0+2\sigma_{01}(X_{ij}+X_{i'j})+\sigma^2_1 X_{ij} X_{i'j}} {\sqrt{(\sigma^2_0+2\sigma_{01}X_{ij}+\sigma^2_1 X_{ij}^2+\sigma^2_\varepsilon)(\sigma^2_0+2\sigma_{01}X_{i'j}+\sigma^2_1 X_{i'j}^2+\sigma^2_\varepsilon)}}$$ Its value is a function of $X$, so there is not a unique correlation coefficient, but several coefficients that can be calculated for any value of the predictor variable(s) (Goldstein et al., Partitioning variation in multilevel models).

Model 4 (Three levels, random intercept and slope). We let $\pi_1$ vary across the students in Model 2: $$\begin{align} y_{ijk}&=\pi_{0jk}+\pi_{1jk} X_{ijk}+\varepsilon_{ijk},\quad&\varepsilon_{iij}\sim\mathcal{N}(0,\sigma^2_\varepsilon) \\ \pi_{0jk} &= \gamma_{0k}+b_{0jk},\quad& b_{0jk}\sim\mathcal{N}(0,\sigma^2_0) \\ \pi_{1jk} &= \gamma_{1k}+b_{1jk},\quad& b_{1jk}\sim\mathcal{N}(0,\sigma^2_1) \\ \gamma_{0k} &= \beta_0 + b_{0k},\quad & b_{0k}\sim\mathcal{N}(0,\tau^2_0) \\ \gamma_{1k} &= \beta_1 + b_{1k},\quad & b_{1k}\sim\mathcal{N}(0,\tau^2_1) \end{align}$$

The covariance matrices for level-2 and level-3 are:

$$\text{Var}(\mathbf{b}_j)=\begin{bmatrix} \sigma^2_0 & \sigma_{01} \\ \sigma_{01} & \sigma^2_1 \end{bmatrix}\qquad\qquad \text{Var}(\mathbf{b}_k)=\begin{bmatrix} \tau^2_0 & \tau_{01} \\ \tau_{01} & \tau^2_1 \end{bmatrix}$$

The composite model is:

$$y_{ijk}=\beta_0 + \beta_1 X_{ijk} + (b_{0jk}+b_{1jk} X_{ijk})+(b_{0k}+b_{1k} X_{ijk}) + \varepsilon_{ijk}$$

and the variance at level-1 is:

$$\text{Var}(y_{ijk})=\text{Var}(b_{0jk}+b_{1jk} X_{ijk})+\text{Var}(b_{0k}+b_{1k} X_{ijk})+\text{Var}(\varepsilon_{ijk})$$

where $\text{Var}(b_{0jk}+b_{1jk} X_{ijk})$ is the variance at level-2 and $\text{Var}(b_{0k}+b_{1k} X_{ijk})$ is the variance at level-3. They depend on the value of $X$ as in Model 3.

Now:

Multilevel models including random slopes: how to calculate variance

I hope it is somewhat clear now.

$Y_{ijk}=\beta_0+\beta_1*Time+b_k+b_{jk}+\epsilon_{ijk}$

I'write: $Y_{ijk}=\beta_0+\beta_1\text{Time}+b_{0k}+b_{0jk}$ (Model 2) to highlight that $b_{0k}$ and $b_{0jk}$ randomize $\beta_0$.

I add a random slope per school:
$Y_{ijk}=\beta_0+\beta_1* Time+b_{0k}+b_{1k}*Time+b_{jk}+\epsilon_{ijk}$
Is in this model the variation of $b_{0k}$ still meaningful?

I'd write: $Y_{ijk}=\beta_0+\beta_1\text{Time}+b_{0k}+b_{1k}\text{Time}+b_{0jk}+\epsilon_{ijk}$, i.e.: $$Y_{ijk}=(\beta_0+b_{0jk}+b_{0k})+(\beta_1+b_{1k})\text{Time}+\epsilon_{ijk}$$ where $(\beta_0+b_{0jk}+b_{0k})$ is the random intercept per student within a school and per school, and $(\beta_1+b_{1k})$ is the random slope per school (Model 4 with $b_{1jk}=E(b_{1jk})=0$, $\sigma^2_1=0$.)

As to $b_{0k}$, the random intercept per school, it is even more meaningful than $b_{1k}$! "Given that among multilevel modelers random slopes tend to be more popular than heteroscedasticity, unrecognized heteroscedasticity may show up in the form of a fitted model with a random slope [...] which then may disappear if the heteroscedasticity is modeled" (Snijders and Berkhof, Diagnostic Checks for Multilevel Models, §3.2.1).

Is my intuition correct that the variation between different schools increases over time, as the random slopes drive the schools further apart?

In all the above models I have assumed that all random effects are uncorrelated and that their variances are constant. Is these assumptions hold, then covariances increase quadratically with time (see conjectures' answer). However, if the $\tau$ variances and covariances decrease with time you could observe a somewhat different pattern.

BTW, your plot does show a different pattern and suggests a bit of heteroscedasticity more than a random slope.

the covariance between students within the same school is given by:
$\text{Cov}_{school}=\frac{\sigma^2_3}{\sigma^2_1+\sigma^2_2+\sigma^2_3}$ What is this covariance if a random intercept is included?

That is not a covariance, it is the $\rho_2$ correlation coefficient in Model 2. The covariance between two schools depends on Time.

Solved – Multilevel: Can I include two dumthe variables of a 7-dumthe-set into a random slope

Yes, you can. A dummy variable is no different, mathematically, from any other fixed effect you might choose to include. Presumably, if some of the categories all have the same impact on the response, it would make sense to zero them out and push their effect into the intercept term.

That said, using Wald statistics to cull variables is risky. You may get the right set of variables, but this isn't necessarily the case.

When you say "random slope", are you talking about the coefficient of the categorical variable? If so, I would do some model checking. Look at the estimated random effects and see if they are trying to cover small but real differences in your categories.

To clarify that last point: suppose I have 4 categories: A,B,C and D. I decide to omit the dummies associated with C and D. The intercept in the model now corresponds to the case where categories C or D occur. It's like I'm recoding to A, B and Other. But let's suppose that C and D really are real, but just fairly small.

Now fit the random effects model. You will get random intercepts for individuals coded "Other" ... but if you plot these against the true categories (C and D), you might find that the C effects are large and the D effects are small (say), or vice versa. When you add random effects, you are giving a bunch of extra parameters to your model, which it could use to cover up for defects of the model itself.

Best Answer

Related Solutions

Solved – Multilevel models including random slopes: how to calculate variance

Solved – Multilevel: Can I include two dumthe variables of a 7-dumthe-set into a random slope

Related Question