Mathematical Statistics – Understanding the Mathematical Expression of Interaction Models

lme4-nlmemathematical-statisticsmixed model

I am trying to understand how to write mathematically a mixed effect model.

The model is (I want to check the growth rate of rats in a different countries across the variable day adding the interaction of the 2 factor level Country to the growth rate):

lmer(Volume ~ Country * Day + (1 | Rat.ID))

In R I extracted the equation by using the function extract_eq() for broom.mixed package, obtaining:

$$
\begin{aligned}
\operatorname{lVolume}_{i} &\sim N \left(\alpha_{j[i]} + \beta_{1}(\operatorname{Day}), \sigma^2 \right) \\
\alpha_{j} &\sim N \left(\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{Country}_{\operatorname{England}}) + \gamma_{2}^{\alpha}(\operatorname{Day} \times \operatorname{Country}_{\operatorname{England}}), \sigma^2_{\alpha_{j}} \right)
\text{, for Rat.ID j = 1,} \dots \text{,J}
\end{aligned}
$$

But I don't think it is correct since it's a bit confusing the $i$ and $j$ and the interaction coefficient and the meaning of the $\gamma$ variable.

So my first attempt was I considering the effect cofficents and the interaction term :

$$ Y = \alpha + \beta_1*Day + \beta_2*Day*Country$$

Update

I tried to understand the model again, and arrived to this:

$$ y_{ij} = \beta_0 + \beta_{0CountryA} + 1I_{Country_i = CountryA}+
\beta_{1CountryA} \times Day_{ij} \times 1I_{Country_i = CountryA}+
\beta_{1CountryB} \times Day_{ij} \times 1I_{Country_i = CountryB}+
\eta_{i0} +e_{ij}$$

where every rat i and day j with observations y (Volume).
Is this correct?

But still, here I don't have a representation of the model in terms of expected values and variances.

Could I have some help/hints to understand this, please?

Thanks

Best Answer

The confusion in interpreting the coefficients reported by the extract_eq() function of the equatiomatic package probably comes from the need of the package to handle a large number of model structures. The symbols reported by the extract_eq() function can be seem unnecessarily complicated in this simple case, but the function needs to handle a wide variety of models.

The main trick here is to recognize that only the rats, represented by $j$ in the extract_eq() output, have random effects in your case. Any term that omits $j$ is fundamentally part of the fixed-effect model, even if extract_eq() happens to represent it as part of a random effect. Any term independent of $j$ in the mean value presented by extract_eq() as part of a random effect can be moved out of the random effect and into the fixed-effect part of the model.

Let's start with your own "first attempt" model but write out all the terms and adjust the symbols of some coefficients to show how they correspond to the output from extract_eq():

$$ Y_i = (\gamma_0^{\alpha} +\alpha_{i}) + \beta_1 D_i + \gamma_1^{\alpha} C_i + \gamma_2^{\alpha} D_i C_i + \epsilon_i$$

where $i$ is an index for the individual observations , $\gamma_0^{\alpha}$ is an overall intercept, $\alpha_{i}$ is the extra intercept to be modeled as a random effect (depending on which rat $j$ is being observed in observation $i$, or $j[i]$), $D$ represents Day, $C$ is Country (0 for the reference country and 1 for the other choice, England), and $\epsilon$ is random error. The $\epsilon$ values are assumed to be distributed normally with the variance $\sigma^2$ reported in the first term by extract_eq().

For now, ignore the superscript $\alpha$ on some of the coefficients. That's just for identification of where the coefficients come from (explained below).

In your mixed-effects model, the random effects will all be incorporated into $\alpha_i$. The value for observation $i$ depends on the rat $j$ being observed; extract_eq() represents that as $\alpha_{j[i]}$.

The mean value in the first term reported by extract_eq()

$$N \left(\alpha_{j[i]} + \beta_{1}(\operatorname{Day}), \sigma^2 \right) $$

includes a term, $\beta_{1}(\operatorname{Day})$, that is independent of the (rat) random effect indexed by $j$ and can thus be removed into the fixed-effect part of the model, as I did above to give the $\beta_1 D_i$ term. Furthermore, in your case, $\sigma^2$ is independent of $j$ and thus is part of the model for the fixed effects.

Although the coefficients in the output from extract_eq() look complicated, that's because they include information about which higher-level coefficients are being modeled (allowing the function to handle more complicated models). For example, the $\alpha$ superscript in the $\gamma_1^{\alpha}$ coefficient for Country just means that this $\gamma$ coefficient is part of the model for the overall model intercept $\alpha$; similar interpretation holds for the superscript $\alpha$ in $\gamma_0^{\alpha}$ and $\gamma_2^{\alpha}$.

In your particular case, the $\gamma_0^{\alpha}$, $\gamma_1^{\alpha}$ and $\gamma_2^{\alpha}$ coefficients are constants independent of rat $j$, so you can remove their associated terms from the mean value $$\gamma_{0}^{\alpha} + \gamma_{1}^{\alpha}(\operatorname{Country}_{\operatorname{England}}) + \gamma_{2}^{\alpha}(\operatorname{Day} \times \operatorname{Country}_{\operatorname{England}})$$ reported by extract_eq() for $\alpha_{j}$ and put them into the fixed-effect part of the model. I did that above to provide the $\gamma_0^{\alpha}$, $\gamma_1^{\alpha} C_i$ and $\gamma_2 D_iC_i$ terms in my proposed model.

That leaves only the following for modeling the random effect $\alpha_{j}$:

$$\alpha_j \sim N(0,\sigma_{\alpha_j}^2).$$

That's just a normal distribution with mean 0 and variance $\sigma_{\alpha_j}^2$. So in the model for $Y_i$ that I proposed initially, the only random effect is $\alpha_{i}$. For the rat $j$ evaluated in observation $i$ ($j[i]$), that has the specific value $\alpha_{j[i]}$.

So in the model I initially wrote, the fixed effects are: $\gamma_0^{\alpha}$ for the expected value when $D=0$ (Day of 0) and $C=0$ (the reference Country); $\beta_1$ is the extra change per Day when $C=0$; $\gamma_1^{\alpha}$ is the extra contribution of Country England over the reference Country at Day= 0; $\gamma_2^{\alpha}$ is the extra contribution of $D$ and $C$ when Day isn't 0 and Country is England. The random intercept effects $\alpha_j$ specific to individual rats $j$ have a variance $\sigma_{\alpha_j}^2$. Those random effects are thus for hypothetical differences about the overall intercept $\gamma_0^{\alpha}$ among rats at Day 0 in the reference Country. Finally, there is additional variance $\sigma^2$ about all of those individual contributions to the model.