Solved – Interpreting the mathematical formula of a mixed effect model

mixed model, multilevel-analysis

I am a bit confused about the function of a parameter in setting up a linear mixed effect model (hierarchical/multilevel model). This is how I understand a (random intercept and slope) multilevel model is set up:

Level 1:
$$
y_{ij}=\beta_{0j}+\beta_{1j}x_{ij}+e_{ij}
$$
(The $i$-th observation in group $j$ is estimated by an intercept for group $j$, a slope for group $j$ of the (here first and only) predictor and an error term for observation $i$ in group $j$.)

Level 2:

intercept per group: $\beta_{0j}=\beta_0+u_{0j}$

(The intercept for group $j$ is estimated by a global intercept and a random variable for group $j$.)

slope per group: $\beta_{1j}=\beta_1+u_{1j}$

(The slope for group $j$ is estimated by a global slope for the first predictor and a random variable for the $j$-th group of the first predictor.)
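To make the two levels concrete, here is a minimal simulation sketch of the random intercept and slope model above. All numeric values and variable names are made up for illustration, and $u_{0j}$ and $u_{1j}$ are drawn independently, which is a simplifying assumption (in practice they are usually allowed to correlate).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Illustrative (made-up) population values
beta0, beta1 = 2.0, 0.5             # global intercept and global slope
sd_u0, sd_u1, sd_e = 1.0, 0.3, 0.2  # standard deviations of u_0j, u_1j, e_ij
n_groups, n_per_group = 4, 5

rows = []         # one (j, i, x_ij, y_ij) tuple per observation
group_coefs = []  # one (beta_0j, beta_1j) tuple per group

for j in range(n_groups):
    # Level 2: u_0j and u_1j are drawn once per group
    u0j = random.gauss(0.0, sd_u0)
    u1j = random.gauss(0.0, sd_u1)
    beta0j = beta0 + u0j
    beta1j = beta1 + u1j
    group_coefs.append((beta0j, beta1j))

    for i in range(n_per_group):
        # Level 1: e_ij is drawn once per observation
        x_ij = random.uniform(0.0, 10.0)
        e_ij = random.gauss(0.0, sd_e)
        y_ij = beta0j + beta1j * x_ij + e_ij
        rows.append((j, i, x_ij, y_ij))

for j, (b0j, b1j) in enumerate(group_coefs):
    print(f"group {j}: intercept {b0j:.3f}, slope {b1j:.3f}")
```

Each group gets its own intercept and slope, while every observation within a group shares them and differs only through $x_{ij}$ and $e_{ij}$.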

But I have encountered another way to write the intercept per group:

intercept for group: $\beta_{0j}=\beta_0+\beta_{0ij}x_{0ij}+u_{0j}$

My interpretation of this would be: the intercept for group $j$ is estimated by a global intercept, plus a slope for the $i$-th observation in group $j$ multiplied by a predictor $x$ for the intercept, plus a random variable for group $j$. My first question is whether I am interpreting the subscripts of the slope and predictor of the group-specific intercept correctly. Here is what I would guess:

$\beta_{0ij}$: $0:=$ just there for consistency; $i:=$ observation; $j:=$ group.

(1) Why is the variable $x_{0ij}$ and the parameter/coefficient $\beta_{0ij}$ included?

(2) Do they actually function as a fixed predictor in estimating the slope?

(3) Are really three subscripts needed?

(4) I read the claim that $x_{0ij}$ and $\beta_{0ij}$ are included in order to "make the model symmetric". What does this mean?

Best Answer

Are you sure that you have encountered $\beta_{0j}=\beta_0+\beta_{0ij}x_{0ij}+u_{0j}$? Exactly? Maybe, but the $i$'s can't denote individuals, because level-2 predictors are individual-invariant.

In general, in a random coefficients model you have:

$$\begin{align} \text{Level-1:}&\quad &y_{ij}&=\beta_{0j}+\beta_{1j}x_{ij}+e_{ij}\\ \text{Level-2:}&\quad&\beta_{0j}&=\beta_0+\cdots+u_{0j}\\ &&\beta_{1j}&=\beta_1+\cdots+u_{1j}\end{align}$$

where "$\cdots$" may be nothing or a set of level-2 predictors. If you add two level-2 predictors to both the intercept and the slope, you could write:

$$\begin{align} \beta_{0j}&=\beta_0+\beta_{01}x_{01}+\beta_{02}x_{02}+u_{0j}\\ \beta_{1j}&=\beta_1+\beta_{11}x_{11}+\beta_{12}x_{12}+u_{1j}\end{align}$$

but that would be a mess, because $\beta_{1j}=\beta_{11}$ when $j=1$, etc.: the same symbol would denote both the slope of group 1 and a level-2 coefficient. Thus you can:

  • add a third subscript (confusing, see below)
  • use different symbols for different levels: $$\begin{align} \beta_{0j}&=\gamma_{00}+\gamma_{01}z_{01}+\gamma_{02}z_{02}+u_{0j}\\ \beta_{1j}&=\gamma_{10}+\gamma_{11}z_{11}+\gamma_{12}z_{12}+u_{1j}\end{align}$$

The second notation is better, because you can see that:

  1. terms such as $\gamma_{kp}z_{kp}$ (the $p$th term for the $k$th level-1 coefficient, not for the $i$th individual!) are included to explain the variability of the intercept/slope (just as in plain-vanilla regression they explain the variability of the dependent variable)
  2. level-2 coefficients are 'fixed', meaning that they are population parameters: they are neither individual-variant nor group-variant (this is why a $j$ subscript would be misleading).
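A small numerical sketch may help show how the fixed $\gamma$ coefficients act once the level-2 equations are substituted into level 1. It assumes, for brevity, a single level-2 predictor `z_j` shared by both equations (rather than two per equation as above), and all coefficient values and function names are hypothetical. It checks that the two-stage form and the combined form, $y_{ij}=\gamma_{00}+\gamma_{01}z_j+\gamma_{10}x_{ij}+\gamma_{11}z_jx_{ij}+u_{0j}+u_{1j}x_{ij}+e_{ij}$, give the same value.

```python
import random

random.seed(1)

# Hypothetical fixed (population-level) coefficients
g00, g01 = 1.0, 0.8    # intercept equation: gamma_00, gamma_01
g10, g11 = 0.5, -0.2   # slope equation:     gamma_10, gamma_11


def two_stage(x_ij, z_j, u0j, u1j, e_ij):
    """Compute the level-2 coefficients first, then plug them into level 1."""
    beta0j = g00 + g01 * z_j + u0j
    beta1j = g10 + g11 * z_j + u1j
    return beta0j + beta1j * x_ij + e_ij


def combined(x_ij, z_j, u0j, u1j, e_ij):
    """Level 2 substituted into level 1: fixed part plus random part."""
    fixed_part = g00 + g01 * z_j + g10 * x_ij + g11 * z_j * x_ij
    random_part = u0j + u1j * x_ij + e_ij
    return fixed_part + random_part


max_gap = 0.0
for j in range(5):
    z_j = random.uniform(-1.0, 1.0)  # level-2 predictor: one value per group
    u0j = random.gauss(0.0, 1.0)     # group-level deviations
    u1j = random.gauss(0.0, 0.3)
    for i in range(10):
        x_ij = random.uniform(0.0, 10.0)
        e_ij = random.gauss(0.0, 0.2)
        gap = abs(two_stage(x_ij, z_j, u0j, u1j, e_ij)
                  - combined(x_ij, z_j, u0j, u1j, e_ij))
        max_gap = max(max_gap, gap)

# The two forms should agree up to floating-point rounding
print(max_gap < 1e-9)
```

In the combined form the $\gamma$'s carry no $i$ or $j$ subscript: they are the same for every observation and every group, and $\gamma_{11}$ appears as the coefficient of the cross-level interaction $z_jx_{ij}$.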

In general, there are as many subscripts as levels, i.e. you use three subscripts in three-level models (say, students, classrooms, schools):

$$\begin{align} \text{Level-1:}&\quad &y_{ijk}&=\pi_{0jk}+\pi_{1jk}x_{ijk}+e_{ijk}\\ \text{Level-2:}&\quad&\pi_{0jk}&=\beta_{00k}+\beta_{01k}z_{01k}+\beta_{02k}z_{02k}+r_{0jk}\\ &&\pi_{1jk}&=\beta_{10k}+\beta_{11k}z_{11k}+\beta_{12k}z_{12k}+r_{1jk}\\ \text{Level-3:}&\quad &\beta_{00k}&=\gamma_{000}+\gamma_{001}w_{001}+\gamma_{002}w_{002}+u_{00k}\\ &&\dots\\ &&\beta_{12k}&=\gamma_{120}+\gamma_{121}w_{121}+\gamma_{122}w_{122}+u_{12k} \end{align}$$

In three-level models, level-3 coefficients are fixed (population parameters), lower-level coefficients are random.
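The nesting can be sketched as three nested loops. This is a deliberately stripped-down illustration: it assumes no level-2 or level-3 predictors (no $z$ or $w$ terms), independent normal deviations, and made-up values throughout, so only $\gamma_{000}$ and $\gamma_{100}$ are fixed.

```python
import random

random.seed(2)

# Fixed population parameters (the only ones in this simplified sketch)
gamma_000, gamma_100 = 10.0, 1.5
n_schools, n_classes, n_students = 3, 2, 4

data = []  # one (k, j, i, x_ijk, y_ijk) tuple per student

for k in range(n_schools):
    # Level 3: school-specific coefficients vary around the fixed gammas
    beta_00k = gamma_000 + random.gauss(0.0, 1.0)   # + u_00k
    beta_10k = gamma_100 + random.gauss(0.0, 0.2)   # + u_10k
    for j in range(n_classes):
        # Level 2: classroom-specific coefficients vary around the school's
        pi_0jk = beta_00k + random.gauss(0.0, 0.5)  # + r_0jk
        pi_1jk = beta_10k + random.gauss(0.0, 0.1)  # + r_1jk
        for i in range(n_students):
            # Level 1: student outcome within classroom j of school k
            x_ijk = random.uniform(0.0, 5.0)
            e_ijk = random.gauss(0.0, 0.3)
            y_ijk = pi_0jk + pi_1jk * x_ijk + e_ijk
            data.append((k, j, i, x_ijk, y_ijk))

print(len(data))
```

The $\gamma$'s are set once outside all loops, whereas the $\beta$'s are redrawn per school and the $\pi$'s per classroom, which is the sense in which only the top-level coefficients are fixed.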
