Solved – Moderation in repeated-measures design

Tags: interaction, panel data, regression, repeated measures

Context: Both the dependent variables $(Y_1,~Y_2,~Y_3)$ and the independent variables $(X_1,~X_2,~X_3)$ were measured repeatedly at three time points, $\text{Time}_1$, $\text{Time}_2$, and $\text{Time}_3$. The moderator $M$ is a continuous variable measured on a single occasion and hypothesized to remain unchanged across all three time points.

Preliminary analyses: I ran a moderated regression cross-sectionally at each time point, and the interaction between $X$ and $M$ is significant only at $\text{Time}_2$. This fits the theory: $\text{Time}_2$ has a unique characteristic that sets it apart from $\text{Time}_1$ and $\text{Time}_3$, and we believe this characteristic determines whether $M$ plays a moderating role in the relationship between $X$ and $Y$.

Question: It appears that the relationship between $X$ and $Y$ depends not only on $M$ but also on the measurement occasion/condition. I suspect there is another level of interaction, but how do I test for it when this additional interaction occurs within a repeated-measures design?

Best Answer

Let's imagine a bit of data for three participants. In a "wide" format, it might look like this:

$$ \begin{bmatrix} y_1 & y_2 & y_3 & x_1 & x_2 & x_3 & m & id \\ 1 & 2 & 3 & 5 & 5 & 2 & 7 & 1 \\ 3 & 2 & 2 & 4 & 5 & 6 & 6 & 2 \\ 2 & 2 & 3 & 4 & 5 & 2 & 8 & 3 \\ \end{bmatrix} $$
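For concreteness, that toy data set could be built as a pandas DataFrame like this (Python here is just one illustrative choice; any statistics package works the same way):

```python
import pandas as pd

# Toy wide-format data: one row per participant, matching the matrix above
df_wide = pd.DataFrame({
    "y1": [1, 3, 2], "y2": [2, 2, 2], "y3": [3, 2, 3],
    "x1": [5, 4, 4], "x2": [5, 5, 5], "x3": [2, 6, 2],
    "m":  [7, 6, 8],
    "id": [1, 2, 3],
})
```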

From here, it is easy to do a linear regression at any particular time point. For example:

$$ \begin{eqnarray} y_{1i} &=& b_0 + b_1 * x_{1i} + e_{1i} \\ y_{2i} &=& b_0 + b_1 * x_{2i} + e_{2i} \\ y_{3i} &=& b_0 + b_1 * x_{3i} + e_{3i} \\ \end{eqnarray} $$

and to look at whether the effect is moderated by $m$:

$$ \begin{eqnarray} y_{1i} &=& b_0 + b_1 * x_{1i} + b_2 * m_i + b_3 * (x_{1i} * m_i) + e_{1i} \\ y_{2i} &=& b_0 + b_1 * x_{2i} + b_2 * m_i + b_3 * (x_{2i} * m_i) + e_{2i} \\ y_{3i} &=& b_0 + b_1 * x_{3i} + b_2 * m_i + b_3 * (x_{3i} * m_i) + e_{3i} \\ \end{eqnarray} $$
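As a sketch, those per-time-point fits might look like this with statsmodels (the three-row toy data are far too small to actually estimate four coefficients, so this only shows the pattern; in patsy formulas, `x * m` expands to `x + m + x:m`):

```python
import statsmodels.formula.api as smf

# One separate moderated regression per time point (df_wide from above)
for t in (1, 2, 3):
    fit = smf.ols(f"y{t} ~ x{t} * m", data=df_wide).fit()
    print(f"Time {t}:", fit.params.to_dict())
```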

This works and allows unique effects at each time point, and the parameters are easy to count. Each model has four coefficients plus its own residual variance (estimated as $\hat{\sigma}^2 = \mathbf{e}^{\top}\mathbf{e} / (n - p)$), and there are three models, so $(4 + 1) * 3 = 15$ unique parameters in total (the $b$ symbols repeat across the equations, but each model is estimated separately, so the coefficients are distinct). That allows unique effects everywhere, but it does not take advantage of modeling everything at once. To do that, let's reshape the data into so-called "long" form.

$$ \begin{bmatrix} y & x & m & time & id \\ 1 & 5 & 7 & 1 & 1 \\ 2 & 5 & 7 & 2 & 1 \\ 3 & 2 & 7 & 3 & 1 \\ 3 & 4 & 6 & 1 & 2 \\ 2 & 5 & 6 & 2 & 2 \\ 2 & 6 & 6 & 3 & 2 \\ 2 & 4 & 8 & 1 & 3 \\ 2 & 5 & 8 & 2 & 3 \\ 3 & 2 & 8 & 3 & 3 \\ \end{bmatrix} $$
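Continuing the pandas sketch, the reshape might be done like this (`pd.wide_to_long` picks up the numeric time suffix on the stub names and carries the extra columns along unchanged):

```python
# Reshape wide -> long: "y" and "x" are stubs whose numeric suffix
# becomes the new "time" column; m is carried along untouched
df_long = (
    pd.wide_to_long(df_wide, stubnames=["y", "x"], i="id", j="time")
      .reset_index()
      .sort_values(["id", "time"])
)
```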

So we have essentially the same data, but now all the $y$s are in one column and all the $x$s are in one column, plus we gained another variable indicating which time point each $y$ and $x$ belongs to. We can estimate this using a multilevel (also called mixed-effects) model, but let's start off with a regular linear model.

$$ y_{ij} = b_{0} + b_{1}*time_{ij} + b_{2} * x_{ij} + b_{3} * m_i + b_{4} * (x_{ij} * m_i) + e_{ij} $$
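A sketch of this pooled model in statsmodels (again, pattern only, given the tiny toy data):

```python
# Linear time trend plus constant-over-time x, m, and x:m effects:
# 5 coefficients + 1 residual variance = 6 parameters
fit_pooled = smf.ols("y ~ time + x * m", data=df_long).fit()
```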

In the model above, $i$ indexes the participant (id) and $j$ the time point. How many parameters do we have now? We are down to 6. Where did the rest go? We are now assuming a constant-over-time effect of $x$, so instead of 3 unique parameters we have only 1; likewise for $m$ and for their interaction, and we assume a single residual variance. That saves us 8 parameters so far. We also assume one intercept, but allow a linear time effect, which uses two parameters where before we had three intercepts, gaining 1 additional degree of freedom. What if we do not know the form of the time effect and are concerned it is not linear? We could do many things, but one option with only three time points is to dummy code time.

$$ \begin{bmatrix} y & x & m & time & td_1 & td_2 & td_3 & id \\ 1 & 5 & 7 & 1 & 1 & 0 & 0 & 1 \\ 2 & 5 & 7 & 2 & 0 & 1 & 0 & 1 \\ 3 & 2 & 7 & 3 & 0 & 0 & 1 & 1 \\ 3 & 4 & 6 & 1 & 1 & 0 & 0 & 2 \\ 2 & 5 & 6 & 2 & 0 & 1 & 0 & 2 \\ 2 & 6 & 6 & 3 & 0 & 0 & 1 & 2 \\ 2 & 4 & 8 & 1 & 1 & 0 & 0 & 3 \\ 2 & 5 & 8 & 2 & 0 & 1 & 0 & 3 \\ 3 & 2 & 8 & 3 & 0 & 0 & 1 & 3 \\ \end{bmatrix} $$

Now let's see what we could do in terms of a model. We will keep the intercept, and only use $k - 1$ of the dummy variables, so $3 - 1 = 2$ dummy time variables will be included.

$$ \begin{eqnarray} y_{ij} &=& b_{0} + b_{1}*td_2 + b_{2}*td_3 \\ &+& b_{3} * x_{ij} + b_{4} * (x_{ij} * td_2) + b_{5} * (x_{ij} * td_3) \\ &+& b_{6} * m_{i} + b_{7} * (m_{i} * td_2) + b_{8} * (m_{i} * td_3) \\ &+& b_{9} * (x_{ij} * m_i) + b_{10} * (x_{ij} * m_i * td_2) + b_{11} * (x_{ij} * m_i * td_3) \\ &+& e_{ij} \end{eqnarray} $$
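In formula notation this fully interacted model is compact: wrapping time in `C()` tells patsy to treat it as categorical, which generates the $td_2$ and $td_3$ dummies and all their interactions (12 coefficients, $b_0$ through $b_{11}$):

```python
# Fully interacted model: unique x, m, and x:m effects at every time point
fit_full = smf.ols("y ~ x * m * C(time)", data=df_long).fit()
```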

Finally, instead of assuming a constant residual variance, which looks like:

$$ \boldsymbol{\Sigma} = \sigma^2 \mathbf{I} = \sigma^2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{bmatrix} $$

we can assume a heterogeneous structure with a separate variance for each time point:

$$ \boldsymbol{\Sigma} = \begin{bmatrix} \sigma^2_1 & 0 & 0 \\ 0 & \sigma^2_2 & 0 \\ 0 & 0 & \sigma^2_3 \\ \end{bmatrix} $$

and now we are all the way back to 15 parameters, as in the series of separate regressions. These are the two extremes (and for now I am still ignoring the lack of independence in the observations due to repeated measures). If you leave any of the interactions out, it is like imposing a constraint that those parameters be equal across time points. Hopefully these two ends show how you could construct a focused test. For example, a model that assumes everyone starts at the same point, that the effect of $x$ is constant across time and is moderated by $m$ (which also has a constant effect across time), but that their interaction is different at $time_2$, could look like:

$$ \begin{eqnarray} y_{ij} &=& b_{0} + b_{1}*td_2 + b_{2}*td_3 \\ &+& b_{3} * x_{ij} + b_{4} * m_{i} \\ &+& b_{5} * (x_{ij} * m_i) + b_{6} * (x_{ij} * m_i * td_2) \\ &+& e_{ij} \end{eqnarray} $$
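Translated to a formula with explicit dummies (a sketch; the lone `x:m:td2` term adds only the single three-way coefficient, $b_6$ above):

```python
# Explicit time dummies (time 1 is the reference category)
df_long["td2"] = (df_long["time"] == 2).astype(int)
df_long["td3"] = (df_long["time"] == 3).astype(int)

# Constant x, m, and x:m effects; the x:m interaction is allowed to
# shift only at time 2
fit_focused = smf.ols("y ~ td2 + td3 + x + m + x:m + x:m:td2",
                      data=df_long).fit()
```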

Now some people might complain about that model because it does not really include all the time components, nor all the lower-order interactions beneath the three-way term. Perhaps slightly better would be to assume times 1 & 3 are similar and use a dummy code only for the hypothesized aberrant time point in the interaction terms.

$$ \begin{eqnarray} y_{ij} &=& b_{0} + b_{1}*td_2 + b_{2}*td_3 + b_{3} * x_{ij} + b_{4} * m_{i} \\ &+& b_{5} * (x_{ij} * m_i) + b_{6} * (x_{ij} * td_2) + b_{7} * (m_i * td_2)\\ &+& b_{8} * (x_{ij} * m_i * td_2) \\ &+& e_{ij} \end{eqnarray} $$

I would still tend to favor leaving all the dummy codes in as "main effects" to allow separate intercepts by time. Still, by making some assumptions about the structure, three degrees of freedom are gained and the model is somewhat simplified. In this case, the test of whether $b_8 \overset{?}{=} 0$ encodes whether the interaction of $x$ and $m$ differs between times 1 & 3 combined versus time 2.
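In code, this model and the focal test might be sketched as follows (patsy labels the three-way column `x:m:td2`, so the t-test below targets exactly that coefficient):

```python
# Times 1 & 3 pooled as the reference; every term involving x or m
# is allowed to shift at time 2 (the hypothesized aberrant occasion)
fit_ab = smf.ols(
    "y ~ td2 + td3 + x + m + x:m + x:td2 + m:td2 + x:m:td2",
    data=df_long,
).fit()

# Test b8 = 0: does the x-by-m interaction differ at time 2
# versus times 1 & 3 combined?
print(fit_ab.t_test("x:m:td2 = 0"))
```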

A final note: with repeated measures, you want to account for the nonindependence in the data. This is done using random effects in mixed-effects models, the simplest of which is a random intercept.

$$ \begin{eqnarray} y_{ij} &=& (b_{0} + u_i) + b_{1}*td_2 + b_{2}*td_3 + b_{3} * x_{ij} + b_{4} * m_{i} \\ &+& b_{5} * (x_{ij} * m_i) + b_{6} * (x_{ij} * td_2) + b_{7} * (m_i * td_2)\\ &+& b_{8} * (x_{ij} * m_i * td_2) \\ &+& e_{ij} \end{eqnarray} $$

The new term is $u_i$, which is each participant's deviation from the overall intercept. Typically, because there are many subjects, we assume these deviations follow some distribution, such as:

$$u_i \sim \mathcal{N}(0, \tau^2)$$

saying that the individual subject deviations from the overall intercept have mean zero (because they are deviations from the grand intercept) and come from a normal distribution whose variance is estimated as $\tau^2$ (note that this is separate from the residual variance: $\tau^2$ is the between-subject variance in starting points, under the normality assumption).
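A random-intercept version of the model could be sketched with statsmodels' MixedLM, whose default random-effects structure is exactly a per-group intercept (again only a pattern; you would need real data with many more than three subjects):

```python
# groups=... supplies the id clustering; by default MixedLM adds a
# random intercept u_i ~ N(0, tau^2) for each group
fit_mixed = smf.mixedlm(
    "y ~ td2 + td3 + x + m + x:m + x:td2 + m:td2 + x:m:td2",
    data=df_long,
    groups=df_long["id"],
).fit()
print(fit_mixed.summary())
```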
