Solved – Is a balanced-data assumption necessary for a linear mixed-effects model for repeated measurements?

missing data, mixed model, repeated measures, unbalanced-classes

I am using a linear mixed-effects model to analyze repeated measurements on different subjects in a longitudinal study. In the articles and papers I have read, I did not find any discussion of whether some measurements can be missing for some subjects. The lmer() function can use all the observations, even when some subjects have missing measurements. But someone told me that if any measurement is missing for a subject, that subject should be removed from the analysis. Is this true? (When I look at the formula for the linear mixed-effects model, I do not see a reason to remove subjects with missing measurements.) If I can keep all the subjects with missing measurements, does the fitted model have any disadvantage compared with one from a fully balanced study, i.e., one in which every subject is measured at the same times?

Best Answer

There are two separate topics here: first, what one should do about missing longitudinal data; second, what the particular model does and what assumptions lie behind it.

Let's start with the second one. Random effects models (whether fitted using lmer, PROC MIXED in SAS or some other software) can automatically handle missing data for you when there is a random subject effect, because that random subject effect together with any fixed model effects allows you to predict the missing values. This is something the model implicitly does and takes into account. It happens under the assumption that the missing data are missing at random (MAR). This means that, given the data you have observed, there are no unobserved data that explain the missingness (e.g. it would be fine if subjects decide to quit the study based on the data you have observed, but not if they do it because of the value you would have observed, but did not). A standard random effects model for repeated measures also assumes that the future values would occur under the same conditions (e.g. if subjects were taking some treatment, then for the missing values that get implicitly imputed it is assumed that they continue on this treatment).
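To make the "implicit prediction" concrete, here is a minimal sketch for the simplest case, a random-intercept model y_ij = mu + b_i + e_ij. It is not what lmer does internally: the variance components `tau2` (between subjects) and `sigma2` (residual) are assumed known, whereas lmer would estimate them, and the data and function name are made up for illustration. The point is only that unbalanced subjects are weighted rather than dropped, and that a subject with few observations is predicted closer to the overall mean.

```python
def fit_random_intercept(data, tau2, sigma2):
    """Closed-form fit of y_ij = mu + b_i + e_ij with KNOWN variances.

    data   : dict mapping subject -> list of observed values (lengths may differ)
    tau2   : assumed variance of the random subject intercepts b_i
    sigma2 : assumed residual variance of e_ij
    """
    means = {s: sum(y) / len(y) for s, y in data.items()}
    # A subject mean has variance tau2 + sigma2 / n_i, so subjects with
    # fewer observations get less weight, but nobody is removed.
    weights = {s: 1.0 / (tau2 + sigma2 / len(y)) for s, y in data.items()}
    mu = sum(weights[s] * means[s] for s in data) / sum(weights.values())

    predicted = {}
    for s, y in data.items():
        # Shrinkage factor: close to 1 with many observations, smaller with few.
        shrink = len(y) * tau2 / (len(y) * tau2 + sigma2)
        # Predicted level for this subject at any visit, observed or not.
        predicted[s] = mu + shrink * (means[s] - mu)
    return mu, predicted


# Hypothetical unbalanced data: subject B was measured only once.
data = {"A": [10.0, 12.0, 11.0, 13.0], "B": [20.0], "C": [14.0, 16.0]}
mu, predicted = fit_random_intercept(data, tau2=4.0, sigma2=4.0)
print(f"overall mean: {mu:.3f}")
for s in data:
    print(f"subject {s}: n = {len(data[s])}, predicted level = {predicted[s]:.3f}")
```

Subject B's single observation is pulled halfway towards the overall mean under these assumed variances, while subject A's four observations are shrunk far less. That is the sense in which the model "fills in" what was not observed for B.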

Regarding the first topic:

  • Deleting all subjects with some missing data ("listwise deletion") is only valid under an assumption of missing completely at random (MCAR), which is an incredibly strong assumption that is hardly ever applicable (usually only when data are missing because some records got lost, not when the missingness is due to something that happened to the subject under observation). In short, it is hardly ever the right thing to do.
  • Doing something that is valid under MAR (e.g. mixed models for repeated measures, multiple imputation of some form etc.): It is more often plausible that the missing data mechanism results in data being MAR, but as explained above, you may or may not be interested in the question that will be answered (the estimand). One major difficulty is that while you can check from the data whether MCAR may apply or not, you cannot really tell from the data whether MAR holds once MCAR has been ruled out. Any analysis approach that is valid under MAR is also valid under MCAR, but analyses that are valid under MCAR (such as listwise deletion) are not typically valid under MAR.
  • Making some missing not at random (MNAR) assumptions: There is a huge number of missing data mechanisms that fall into this category (everything that is not MAR or MCAR).
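The difference between the three mechanisms can be seen in a small simulation. This is only a sketch under made-up assumptions: two visits per subject, a true visit-2 mean of 0, and dropout rules I chose for illustration. Instead of a full mixed model, the "MAR-valid" estimate here is a simple regression of visit 2 on visit 1 among completers, evaluated at the visit-1 mean of all subjects. That is a stand-in for what a likelihood-based analysis does in this two-visit setting, not a reimplementation of lmer.

```python
import random
from statistics import fmean


def simulate(n=20000, seed=1):
    """Two correlated visits per subject; the true mean at visit 2 is 0."""
    rng = random.Random(seed)
    y1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y2 = [0.8 * a + rng.gauss(0.0, 0.6) for a in y1]
    u = [rng.random() for _ in range(n)]  # used only by the MCAR rule
    return y1, y2, u


def listwise_deletion_mean(y2, seen):
    """Mean of visit 2 using only subjects whose visit 2 was observed."""
    return fmean(b for b, s in zip(y2, seen) if s)


def mar_valid_mean(y1, y2, seen):
    """Regress visit 2 on visit 1 among completers, then predict at the
    visit-1 mean of ALL subjects (visit 1 is always observed)."""
    xs = [a for a, s in zip(y1, seen) if s]
    ys = [b for b, s in zip(y2, seen) if s]
    xbar, ybar = fmean(xs), fmean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    return ybar + slope * (fmean(y1) - xbar)


y1, y2, u = simulate()
mechanisms = {
    "MCAR": [r < 0.7 for r in u],     # dropout unrelated to any data
    "MAR": [a <= 0.5 for a in y1],    # dropout depends on observed visit 1
    "MNAR": [b <= 0.5 for b in y2],   # dropout depends on the unseen visit 2
}
results = {
    name: (listwise_deletion_mean(y2, seen), mar_valid_mean(y1, y2, seen))
    for name, seen in mechanisms.items()
}
for name, (cc, reg) in results.items():
    print(f"{name}: listwise deletion {cc:+.3f}, MAR-valid estimate {reg:+.3f}")
```

Under these assumptions both estimates should land near the true value of 0 for MCAR, only the regression-based one should for MAR, and neither should for MNAR, which mirrors the three bullets above.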

Whether you assume MAR or MNAR also depends in part on what question you are trying to answer. This is currently being discussed in great detail for drug trials, with a planned addendum to the guideline on statistical principles for clinical trials (ICH E9) and with lots of journal discussions e.g. here, here and here. One useful webpage is http://www.missingdata.org.uk/.