Solved – Accounting for both within subjects and between subjects mixed model

lme4-nlme, mixed model, repeated measures

I have an experiment where I have several subjects that I am analyzing a response for (call this RESPONSE). I am interested in the overall effect of temperature on RESPONSE. RESPONSE is measured once daily for each subject over the course of several weeks. Each subject also belongs to one of two levels of a factor (call this FACTOR). I want to know if the relationship between temperature and RESPONSE differs by factor.

This is longitudinal data measuring a response repeatedly through time on each individual subject. Therefore, I analyze this with a mixed model using lmer in lme4. The model specification looks like this…

Model <- lmer(RESPONSE ~ Temperature + FACTOR + doy + Temperature*FACTOR + FACTOR*doy + (1 + doy | subject), data = dat, REML=TRUE)

In this model, doy is the day of the year to account for the fact that the effect is likely to vary through time due to processes occurring within the subject environment.

I am interested in the overall effect of temperature on the response. The way this model is set up, I believe it is looking at temperature within each subject only. The image above shows the relationship between temperature and response for one level of FACTOR. You can see that the overall relationship is positive, and a linear regression indicates a highly significant relationship. However, if you look within subjects (graph is color coded by subject, total of four subjects), the relationship is actually slightly negative. It is this slightly negative relationship that the model picks up on, reporting a negative coefficient for this level of FACTOR. I do understand these are not independent observations, so a linear regression is not appropriate. However, it still seems like this should be an overall positive relationship. Is there any way to specify the model so that it accounts for both the within subjects effect of temperature and the between subjects effect of temperature?

Best Answer

Your original model:

$Y_{si} = \beta_0 + S_{0s} + (\beta_1 + S_{1s})X_{1si} + \beta_2 X_{2si} + \beta_3 X_{3si} + \beta_4 X_{1si}X_{2si} + \beta_5 X_{2si}X_{3si} + \epsilon_{si}$, where $s = 1, \dots, S$ indexes the subject, $i = 1, \dots, I_s$ indexes the measurement, $X_{1si}$ is day of year, $X_{2si}$ is factor and $X_{3si}$ is temperature, $\epsilon_{si} \sim N(0, \sigma^2)$ and $(S_{0s}, S_{1s})' \sim N\left((0,0)', \begin{pmatrix}\sigma_1^2 & \sigma_{12}\\ \sigma_{12} & \sigma_2^2\end{pmatrix}\right)$. $\beta_0, \dots, \beta_5$ are fixed effects.

Is $X_{1si}$ equal to 1 on Jan 1 and 365 (or 366) on Dec 31 of each year? If so, you may need a periodic function of day of year, or you may need to drop it: the model implies that the mean of $Y_{si}$ differs between Dec 31, 2015 and Jan 1, 2016 by $364\beta_1$ (at the reference level of the factor), which may not be realistic for two consecutive days.
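One common way to make day of year periodic is a single annual sine/cosine pair. The sketch below is only an illustration: the helper name `doy_harmonics` and the period of 365.25 days are assumptions, not something from the question.

```r
# Hypothetical helper: encode day of year as one annual sine/cosine pair.
# period = 365.25 is an assumed average year length.
doy_harmonics <- function(doy, period = 365.25) {
  angle <- 2 * pi * doy / period
  cbind(doy_sin = sin(angle), doy_cos = cos(angle))
}

# Compare Dec 31 (doy = 365) with the following Jan 1 (doy = 1).
h <- doy_harmonics(c(365, 1))
jump_raw      <- abs(365 - 1)                  # jump in the raw doy covariate
jump_harmonic <- max(abs(h[1, ] - h[2, ]))     # jump in the harmonic covariates
```

The raw covariate jumps by 364 across the year boundary, while the harmonic columns change only slightly, so consecutive days get nearly identical fitted seasonal effects. The two columns could then replace `doy` in the model formula.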

I think your random slope should be on $X_{3si}$ (temperature) instead of on $X_{1si}$ (day of year). You could fit a model like this: $Y_{si} = \beta_0 + S_{0s} + \beta_1 X_{1si} + \beta_2 X_{2si} + (\beta_3 + S_{3s})X_{3si} + \beta_4 X_{1si}X_{2si} + \beta_5 X_{2si}X_{3si} + \epsilon_{si}$
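In lmer syntax, the model with the random slope moved to temperature might look like the sketch below. The variable names follow the question, but the data are simulated stand-ins (every number is made up), and the fit only runs if lme4 is installed.

```r
# Simulated stand-in for the asker's data; all values below are made up.
set.seed(1)
n_subj <- 8
n_days <- 30
dat <- data.frame(
  subject = factor(rep(seq_len(n_subj), each = n_days)),
  doy     = rep(seq(150, length.out = n_days), times = n_subj),
  FACTOR  = factor(rep(c("a", "b"), each = n_days, length.out = n_subj * n_days))
)
dat$Temperature <- 15 + 0.05 * (dat$doy - 150) + rnorm(nrow(dat), sd = 3)

# Subject-specific intercepts and temperature slopes (assumed values).
b0 <- rnorm(n_subj, mean = 10, sd = 2)
b3 <- rnorm(n_subj, mean = -0.5, sd = 0.3)
idx <- as.integer(dat$subject)
dat$RESPONSE <- b0[idx] + b3[idx] * dat$Temperature + rnorm(nrow(dat), sd = 1)

if (requireNamespace("lme4", quietly = TRUE)) {
  # Random intercept and random Temperature slope per subject; doy is fixed only.
  fit <- lme4::lmer(
    RESPONSE ~ doy + FACTOR + Temperature + doy:FACTOR + FACTOR:Temperature +
      (1 + Temperature | subject),
    data = dat, REML = TRUE
  )
  print(lme4::fixef(fit))
}
```

With real data, centering or rescaling `doy` and `Temperature` before fitting may help the optimizer, since lmer can warn when predictors are on very different scales.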

Obviously, this is an exploratory analysis, and you need to find a model that fits the data. In my experience, it helps to start by fitting several fixed-effects-only models (ordinary linear models) with temperature alone, then with the other covariates and even their interactions. If none of these models shows what you expect, your theory may be incorrect. If one does, add the random effects to it so that the final model is more realistic.

In matrix form, the mixed model is

$Y = X\beta + Z\gamma + \epsilon$, where $\gamma \sim N(0, G)$ and $\epsilon \sim N(0, R)$. For a given $X$, the variance-covariance matrix of $Y$ is

$Var(Y) = ZGZ'+R$
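To see what $ZGZ' + R$ looks like in a small case, here is a sketch for one hypothetical subject with a random intercept and a random slope on a single covariate. All numbers are illustrative assumptions.

```r
# One hypothetical subject measured at four covariate values, with a random
# intercept and a random slope on x. All numbers are illustrative.
x <- c(0, 1, 2, 3)
Z <- cbind(1, x)                                   # 4 x 2 random-effects design
G <- matrix(c(4.0, 0.5,
              0.5, 1.0), nrow = 2, byrow = TRUE)   # Var of (intercept, slope)
R <- diag(2.0, nrow = length(x))                   # independent residuals, sigma^2 = 2
V <- Z %*% G %*% t(Z) + R                          # marginal Var(Y) for this subject
print(V)
```

Here the entry for measurements $i$ and $j$ is $4 + 0.5(x_i + x_j) + x_i x_j$, plus 2 on the diagonal, so the implied variance grows from 6 at $x = 0$ to 18 at $x = 3$. Even one continuous variable in $Z$ makes the marginal variance depend on the covariate.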

Generally, we are not interested in the random effects themselves; we want to estimate the fixed effects $\beta$. The purpose of including random effects is to make the model better reflect the real situation when the responses are correlated. If $Z$ has many columns with a complicated structure, it is difficult to work out what $ZGZ'$ looks like, which means you do not really know what model you are fitting. In theory you can have many continuous variables in $Z$, but in practice the model becomes difficult to interpret once $Z$ contains two or more continuous variables.

Another approach is to drop the random effects and specify the variance-covariance matrix directly through $R$. When the variance-covariance structure is clear, this approach is better than using random effects.

In your case, if you think that temperature affects the correlation, for example that two measurements from the same subject are more highly correlated when their temperatures are close, you can specify $R$ through the temperature difference, such as $\rho^{|t_i-t_j|}$, where $t_i$ and $t_j$ are the temperatures at measurements $i$ and $j$.
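As a small illustration of that structure, the sketch below builds the within-subject matrix $R$ for one hypothetical subject. The temperatures, $\rho = 0.8$ and $\sigma^2 = 2$ are assumed values chosen only to show the shape of the matrix.

```r
# Hypothetical temperatures for four measurements on one subject.
temp   <- c(12.0, 12.5, 15.0, 18.0)
rho    <- 0.8     # assumed correlation per one-degree temperature difference
sigma2 <- 2.0     # assumed residual variance

# Correlation decays with the absolute temperature difference: rho^|t_i - t_j|.
R_corr <- rho^abs(outer(temp, temp, "-"))
R      <- sigma2 * R_corr
print(round(R_corr, 3))
```

Measurements taken at 12.0 and 12.5 degrees get a correlation of $0.8^{0.5}$, while those at 12.0 and 18.0 degrees get $0.8^{6}$. In nlme, an exponential correlation structure such as `corExp` with a form like `~ Temperature | subject` has this shape, with $\rho = e^{-1/\text{range}}$. I have not checked how it behaves on this dataset, and repeated temperatures within a subject (zero distances) may need handling.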

Related Solutions

Solved – Combining between and within-subjects designs

Don't try to come up with a complicated analysis to deal with this, just do within and between analyses separately on the relevant subsets of your data. That way you will not compromise the interpretability of the outcome and you will be able to make sure that the result is both sensible and robust.

Depending on the nature of your data you could potentially combine the outcomes of the two arms of the experiment. For example, if you can generate likelihood functions for each arm then the combined function is simply their product.

Solved – Maximal model for linear mixed-effects model for repeated measures design

The maximal structure would need to include also a random effect for the interaction between color and shape, that is:

Y ~ color * shape + (color + shape + color:shape | subject)

This will result in all your predictors (color, shape and their interaction) having a fixed effect (constant for all subjects), and a random effect (individual fluctuations around the estimated fixed effect). In this sense the model is the maximal one. Note that it might not be fully equivalent to a repeated-measures ANOVA as it doesn't make equally strict assumptions on the correlational structure (see Tom's answer).

If you don't include the interaction in the random-effects part of the formula, individual variation in the interaction effect will not be treated as "random", and the model will not be equivalent to a repeated-measures ANOVA. Of course, the variance of the random deviates for the interaction (or for any other random effect) might be so small that including it in the model does not improve the fit much. You can check this not only with the AIC but also with a likelihood ratio test, since the models with and without a given random effect are nested. In principle, if the likelihood ratio test is not significant, you can safely remove that random effect. Simplifying the random-effects structure by removing negligible components is an example of what the article you linked calls the data-driven approach.
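A likelihood ratio test of a random-effect term can be sketched with lme4's `anova()` method. The example uses the `sleepstudy` data shipped with lme4 as a stand-in for the color/shape design, which is an assumption made so the code can run. It tests a random slope, not a random interaction.

```r
if (requireNamespace("lme4", quietly = TRUE)) {
  data("sleepstudy", package = "lme4")

  # Full model: random intercept and random slope for Days.
  m_full <- lme4::lmer(Reaction ~ Days + (1 + Days | Subject),
                       data = sleepstudy, REML = TRUE)
  # Reduced model: random intercept only.
  m_reduced <- lme4::lmer(Reaction ~ Days + (1 | Subject),
                          data = sleepstudy, REML = TRUE)

  # Both fits share the same fixed effects, so their REML likelihoods are
  # comparable; refit = FALSE stops anova() from refitting with ML.
  lrt <- anova(m_reduced, m_full, refit = FALSE)
  print(lrt)
}
```

Dropping the slope removes two parameters here (the slope variance and its covariance with the intercept). Because a variance is tested on the boundary of its parameter space, the reported p-value tends to be conservative.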

You can simplify the model in this way, and it would still be equivalent to a repeated-measures ANOVA:

Y ~ color*shape + (1|subject) + (0+color|subject) + (0+shape|subject) + (0+color:shape|subject)

This syntax tells lmer to not estimate the correlations of random deviates across subjects. The drawback here is that, for example, you won't be able to tell whether subjects that have a large effect of color tend to have also a larger effect of shape (or smaller effect, in case of negative correlation).

You can easily include a between-subjects predictor; the only difference is that you can't add a random effect for it. "gender", for example, cannot have a random effect grouped by subject, but it can interact with the other fixed effects, e.g.:

Y ~ color * shape * gender + (color + shape + color:shape | subject)
