Solved – Multilevel covariance structure and more

covariance-matrix, lme4-nlme, multilevel-analysis, r

I have some doubts about the covariance structure in a multilevel model fitted in R (using the nlme package). I'm not an expert (just starting to learn statistics…), so I apologize if some of my questions seem obvious. I've checked the previous posts and haven't found an answer.

I have data from an experiment in which we recorded physiological data from 30 subjects in 2 conditions (with 30 trials in each condition). These 30 trials are close in time, and we expected a higher correlation between closer trials, with the correlation decreasing as trials get further apart. We are interested in the effect of condition, not in time effects. I think that the right way to analyze these data is to fit a multilevel model in which TRIAL is a level 1 variable, SUBJECT is a level 2 variable, CONDITION is a fixed factor, and the DV is the physiological response (FR). The R command I'm using is:

library(nlme)
lme(fixed = FR ~ CONDITION, data = mydata, random = ~ TIME | SUBJECT)

My (many) questions are both theoretical and practical:

  • Which covariance structure does lme use by default? Is it a problem not to use the most appropriate covariance structure?

  • I've read that an autoregressive covariance structure (AR1) refers to a constant variance at each time point and a weaker correlation as time points get further apart. My data only meets the second criterion. How can I know which covariance structure is right for my data? How important is it to the validity of the results?

  • I'm only interested in the CONDITION effect, which is significant when I don't include TIME in the model. I'm not interested in TIME effects, nor do I want to use the model to make predictions; I only want to check the significance of the CONDITION effect. Is it correct to drop TIME and fit a model with CONDITION only?

Thank you for your help!

Best Answer

I'm not sure I can provide the kind of answer I'd like to, but I will try to throw out some pieces of information regarding your questions.

First, both @Seth and @gui11aume (+1 to each) have noted that lme() defaults to no within-group correlations. The question is why, and whether that's likely to be a problem. I believe the thinking is that a properly specified multilevel model will account for the covariance among your observations, so that the residuals are independent. That's why the function was coded to expect no correlations. That is, you may be OK.
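One way to check whether the default is adequate is to fit the model with and without a within-group correlation structure and compare the two fits. What follows is only a minimal sketch on simulated data: the data frame, the effect sizes, and the AR(1) parameter of 0.6 are all made up for illustration, and I use a random intercept rather than your random TIME slope to keep the fit simple.

```r
library(nlme)

set.seed(1)
n_subj  <- 30
n_trial <- 30

# Hypothetical data laid out like the design in the question:
# 30 subjects x 2 conditions x 30 trials (TIME varies fastest).
mydata <- expand.grid(TIME      = 1:n_trial,
                      CONDITION = factor(c("A", "B")),
                      SUBJECT   = factor(1:n_subj))

# Subject-level random intercepts (assumed SD of 1).
subj_int <- rnorm(n_subj, sd = 1)

# AR(1) noise generated separately for each subject-by-condition series.
series   <- interaction(mydata$SUBJECT, mydata$CONDITION, drop = TRUE)
ar_noise <- ave(numeric(nrow(mydata)), series,
                FUN = function(z) as.numeric(arima.sim(list(ar = 0.6), n = length(z))))

mydata$FR <- 10 + 0.5 * (mydata$CONDITION == "B") +
  subj_int[as.integer(mydata$SUBJECT)] + ar_noise

# Default: residuals assumed independent within groups.
m_indep <- lme(FR ~ CONDITION, random = ~ 1 | SUBJECT, data = mydata)

# Same model, but with AR(1) residual correlation within each
# subject-by-condition series of trials.
m_ar1 <- lme(FR ~ CONDITION, random = ~ 1 | SUBJECT, data = mydata,
             correlation = corAR1(form = ~ TIME | SUBJECT/CONDITION))

# Both fits use REML with identical fixed effects, so they can be compared.
print(anova(m_indep, m_ar1))

# Estimated AR(1) parameter on its natural scale.
phi_hat <- coef(m_ar1$modelStruct$corStruct, unconstrained = FALSE)
print(phi_hat)
```

If the model with the correlation structure does not fit noticeably better (by AIC or the likelihood ratio test), that is some evidence that the default of independent residuals is not doing much harm for your data.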

Several of your questions concern the effect of having a misspecified variance/covariance structure (bearing in mind that this may not actually apply to you). The estimation of your betas should be unaffected by this; that is, they should be unbiased. However, the estimated variances of their sampling distributions will be inaccurate, so your standard errors and p-values will be inaccurate too. Moreover, I believe that you cannot say a priori whether the p-values will be too high or too low. If you are really concerned about these issues, you can always use robust (a.k.a. 'sandwich') standard errors. These are typically thought about in the context of generalized linear models, but they can be used elsewhere. Check out the R package sandwich. Note that if robust standard errors are not necessary, using them could put you at risk of increased type II errors.

The standard AR(1) variance/covariance structure does assume homoskedasticity, so far as I know. More restrictive, however, is that it assumes every observation was made at the appropriate time, and that all measurements are equally spaced in time. These assumptions usually don't hold, even in the most fortuitous situations, and as such, the AR(1) variance/covariance structure is dangerous to assume.
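To make those two assumptions concrete, here is a small sketch of the correlation matrix that AR(1) implies. The value of 0.6 for the AR(1) parameter and the observation times are arbitrary choices for illustration, not anything estimated from your data.

```r
library(nlme)

phi <- 0.6  # illustrative AR(1) parameter, not an estimate

# AR(1) correlation matrix for four equally spaced time points,
# as nlme's corAR1 structure defines it.
R_ar1 <- corMatrix(corAR1(phi), covariate = 1:4)

# The same matrix by hand: the correlation between points i and j is
# phi^|i - j|, and the diagonal is 1 at every time point, which is the
# constant-variance part of the assumption.
R_manual <- phi^abs(outer(1:4, 1:4, "-"))

print(round(R_ar1, 3))
print(round(R_manual, 3))

# With unequal spacing, the natural analogue uses the actual time gaps,
# phi^|t_i - t_j|; this is the continuous-time AR(1) form.
t_obs <- c(1, 2, 4, 7)  # hypothetical observation times
R_car1 <- phi^abs(outer(t_obs, t_obs, "-"))
print(round(R_car1, 3))
```

In nlme, corCAR1() is the continuous-time version intended for unequally spaced measurements, and a variance function passed through the weights argument (for example varIdent() or varExp()) lets the residual variance change across strata or with a covariate. Whether either of those is warranted for your data is something to check by comparing fits, not to assume.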

Remember that the proper specification of the model for the means is crucial. It is remotely possible that time is not relevant to the appropriate model of the mean, but it isn't very likely at all. Leaving TIME out of the model risks omitted-variable bias. Thus, dropping TIME is likely to yield both biased estimates of the means and invalid inferences. This is just not worth gambling on.
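If you want to see what TIME contributes, you can fit the mean model both ways and compare. This is again only a sketch on simulated data: the downward trend of 0.05 per trial, the condition effect, and the variable layout are assumptions of mine, and the random part is a plain random intercept for simplicity.

```r
library(nlme)

set.seed(2)
n_subj  <- 30
n_trial <- 30

# Hypothetical data with a linear drift over trials (e.g. habituation).
d <- expand.grid(TIME      = 1:n_trial,
                 CONDITION = factor(c("A", "B")),
                 SUBJECT   = factor(1:n_subj))
subj_int <- rnorm(n_subj, sd = 1)
d$FR <- 10 + 0.5 * (d$CONDITION == "B") - 0.05 * d$TIME +
  subj_int[as.integer(d$SUBJECT)] + rnorm(nrow(d), sd = 1)

# Models that differ in their fixed effects must be compared with ML, not REML.
m_no_time <- lme(FR ~ CONDITION,        random = ~ 1 | SUBJECT, data = d, method = "ML")
m_time    <- lme(FR ~ CONDITION + TIME, random = ~ 1 | SUBJECT, data = d, method = "ML")

# Likelihood ratio test for the TIME term.
cmp <- anova(m_no_time, m_time)
print(cmp)

# CONDITION estimate and its standard error under each mean model.
print(summary(m_no_time)$tTable["CONDITIONB", ])
print(summary(m_time)$tTable["CONDITIONB", ])
```

In a perfectly balanced layout like this simulated one, TIME is unrelated to CONDITION, so the CONDITION estimate itself barely moves and mainly its standard error changes. With missing trials, or with conditions that are not spread evenly over time, the estimate can shift as well, which is where the omitted-variable concern bites.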