Missing data in repeated measures design

I have data from a simple 2 (voice system, manual system) x 2 (easy, hard) within-subjects experiment. Multiple DVs were collected.

Because of problems with both of the systems tested, the data for one system condition or the other are corrupt (badly enough that they can be treated as missing) for many subjects (ss). Of the 34 ss, only 14 have data for both system conditions. The remaining 20 have data for only the voice system (9) or only the manual system (11).

To list it out:

  • 14 x data for both
  • 11 x data for manual system condition, but missing voice
  • 09 x data for voice system condition, but missing manual
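
To put numbers on how much is missing, here is a small Python sketch that tallies the pattern listed above. It assumes each subject contributes two system-level cells (voice and manual, each covering both difficulty levels); the dictionary keys are hypothetical names, not anything from the original data files.

```python
# Hypothetical encoding of the pattern listed above:
# number of subjects by which system condition has usable data.
pattern = {"both": 14, "manual_only": 11, "voice_only": 9}

n_subjects = sum(pattern.values())                               # 34
n_incomplete = pattern["manual_only"] + pattern["voice_only"]    # 20

# Assumption: each subject contributes two system-level cells (voice, manual),
# and each incomplete subject is missing exactly one of them.
n_cells = 2 * n_subjects                                         # 68
n_missing_cells = n_incomplete                                   # 20

print(f"complete cases: {pattern['both']} of {n_subjects} ({pattern['both'] / n_subjects:.1%})")
print(f"subjects missing a system condition: {n_incomplete / n_subjects:.1%}")
print(f"missing system-level cells: {n_missing_cells}/{n_cells} ({n_missing_cells / n_cells:.1%})")
```

So roughly 59% of subjects are incomplete, even though only about 29% of the system-level cells are missing.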

I am considering my options:

  • I can use a repeated measures analysis with listwise deletion of missing data and run my analysis on only the 14 with full data.

  • I can use multiple imputation to fill in the 20 missing sets of data. However, that is a lot of data to impute, and I'm not sure it is wise.

  • I can redesign the experiment as a mixed between/within-ss design: for each of the 14 ss who have data for both systems, randomly discard one system's data, arriving at a balanced n = 17 in each of the newly created 'between' groups. Difficulty would still be analyzed as a within-ss factor.

  • I can start collecting data again. Unfortunately, when the facility was repaired it was also 'upgraded', with enough changes that I would have to start from scratch.
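
The arithmetic behind the "redesign as between/within" option can be sketched in Python. This is a hypothetical helper, not an established procedure: it randomly decides, for each subject with both systems' data, which system's data to keep, so that the two 'between' groups come out the same size. With 34 subjects in total, each balanced group can hold at most 34 / 2 = 17.

```python
import random

def balanced_split(n_both, n_manual_only, n_voice_only, seed=None):
    """Hypothetical helper: keep only one system condition for each subject
    who has both, chosen at random, so the two 'between' groups are equal."""
    total = n_both + n_manual_only + n_voice_only
    if total % 2:
        raise ValueError("odd total: groups cannot be exactly balanced")
    target = total // 2
    to_manual = target - n_manual_only       # complete subjects assigned to the manual group
    if not 0 <= to_manual <= n_both:
        raise ValueError("cannot balance the groups with these counts")
    rng = random.Random(seed)
    subjects = list(range(n_both))           # hypothetical IDs for the complete subjects
    rng.shuffle(subjects)
    manual_keep = sorted(subjects[:to_manual])   # keep manual data, discard voice data
    voice_keep = sorted(subjects[to_manual:])    # keep voice data, discard manual data
    return target, manual_keep, voice_keep

group_size, manual_ids, voice_ids = balanced_split(14, 11, 9, seed=0)
print(group_size, len(manual_ids), len(voice_ids))   # 17 6 8
```

Six of the complete subjects join the 11 manual-only subjects and eight join the 9 voice-only subjects, giving 17 per group.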

Any opinions on which is the best path forward? Any paths I have not considered?

Best Answer

With missing data, deciding what to do is (almost) all about deciding what assumptions you can validly make about the missingness.

There are three main possibilities:

  1. Missing Completely at Random (MCAR): whether a value is missing is unrelated to any variable, observed or unobserved, i.e. there are no possible predictors of missingness. In this case you could validly drop the individuals with missing data and work with the complete observations only (this is called a complete case analysis); you lose power, but the results are not biased. However, in my experience this is generally an unrealistic assumption. If you do a complete case analysis and MCAR does not hold, your conclusions are valid only for the specific set of people who have complete data; it is invalid to use them to make inferences about the whole population (or even the whole sample).

  2. Missing at Random (MAR): you know why some individuals have missing data and others do not (some observed variables predict missingness), and within strata of those variables the data are truly missing at random. In this case you can use multiple imputation or other methods such as inverse probability weighting (IPW), in which each complete observation is weighted by the inverse of its probability of being observed. Under MAR, these methods let you make inferences about the entire sample, and therefore about the population from which it was drawn. You are right that you have a lot of data to impute, which may cause a problem; I don't have enough experience with multiple imputation to advise how best to proceed.

  3. Not Missing at Random (NMAR): missingness depends on the unobserved values themselves, so there is a definite pattern to which data are missing and no variables can be used to create strata within which the data are missing at random. If this is your situation, I believe you will need to redo the experiment, as I am not aware of any valid way to analyze the data as they are (assuming that you want to make inferences about some larger population).
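
The practical difference between the three cases can be illustrated with a small simulation. It assumes a made-up data-generating model, not the asker's data: `x` is always observed (for instance a subject's score on the other system), `y = x + noise` is an outcome that may go missing, and the true mean of `y` is 0. The missingness probabilities (0.8 / 0.2) are arbitrary choices for illustration. The IPW weights use P(observed | x), the most an analyst can model without seeing `y`; here those probabilities are known exactly, whereas in practice they would have to be estimated.

```python
import math
import random

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_observed(mechanism, x, y):
    """Hypothetical probability that y is observed under each mechanism."""
    if mechanism == "MCAR":
        return 0.5                       # unrelated to x or y
    if mechanism == "MAR":
        return 0.8 if x > 0 else 0.2     # depends only on the always-observed x
    return 0.8 if y > 0 else 0.2         # MNAR: depends on the possibly missing y

def p_observed_given_x(mechanism, x):
    """P(observed | x), i.e. what an analyst could model from x alone."""
    if mechanism == "MCAR":
        return 0.5
    if mechanism == "MAR":
        return 0.8 if x > 0 else 0.2
    # MNAR: average over y | x ~ N(x, 1), where P(y > 0 | x) = Phi(x)
    return 0.2 + 0.6 * norm_cdf(x)

def simulate(mechanism, n=100_000, seed=1):
    """Return (complete-case mean, IPW mean) of y; the true mean of y is 0."""
    rng = random.Random(seed)
    cc_sum, cc_n = 0.0, 0
    ipw_num, ipw_den = 0.0, 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)          # always observed
        y = x + rng.gauss(0.0, 1.0)      # outcome that may be missing
        if rng.random() < p_observed(mechanism, x, y):
            cc_sum += y
            cc_n += 1
            w = 1.0 / p_observed_given_x(mechanism, x)   # inverse probability weight
            ipw_num += w * y
            ipw_den += w
    return cc_sum / cc_n, ipw_num / ipw_den

for mech in ("MCAR", "MAR", "MNAR"):
    cc, ipw = simulate(mech)
    print(f"{mech}: complete-case mean = {cc:+.3f}, IPW mean = {ipw:+.3f}")
```

Under MCAR both estimates sit near 0. Under MAR the complete-case mean is biased upward (by about 0.48 in this setup) while IPW recovers roughly 0. Under NMAR both are biased, because no function of `x` alone captures why `y` is missing.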

To be honest, given the very high proportion of missing data (20 of your 34 subjects are missing one of the two system conditions), I would probably suggest starting over if that is not too costly in terms of time and money. If you have variables that predict missingness, I would try multiple imputation or IPW, but with so few subjects and so much missingness you may not have much power. You will also need to think about whether missingness on one measure affects missingness on the other measure, which would make the analysis more complicated.
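
If multiple imputation is attempted, the results from the imputed datasets are usually combined with Rubin's rules. The Python sketch below shows only that pooling step; the imputation model itself (e.g. chained equations in a statistics package) is not shown, and the input numbers are made up, not results from this study. The `lam` value is the share of the total variance attributable to missing data, which is one way to see how much power the missingness costs.

```python
import math
import statistics

def pool_rubin(estimates, variances):
    """Combine m completed-data results with Rubin's rules.

    estimates: the parameter estimate (e.g. a condition difference) from each imputed dataset
    variances: the squared standard error of that estimate in each dataset
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations and matching variances")
    q_bar = statistics.fmean(estimates)          # pooled point estimate
    u_bar = statistics.fmean(variances)          # average within-imputation variance
    b = statistics.variance(estimates)           # between-imputation variance (m - 1 denominator)
    t = u_bar + (1 + 1 / m) * b                  # total variance
    lam = (1 + 1 / m) * b / t                    # share of variance due to missing data
    df = (m - 1) / lam ** 2 if lam > 0 else math.inf   # Rubin's degrees of freedom
    return q_bar, math.sqrt(t), lam, df

# Hypothetical results from m = 5 imputed datasets
est, se, lam, df = pool_rubin([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5)
print(f"estimate={est:.3f}  SE={se:.3f}  lambda={lam:.2f}  df={df:.1f}")
```

The larger the spread of estimates across imputations relative to their within-imputation variance, the larger `lam` and the wider the pooled standard error.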