I am evaluating pre- and post-test data for two groups using repeated measures ANOVA. Because a few data cells are missing, the final n in the analysis is reduced. To maximize the data analyzed, I'd like to know whether I can run a separate repeated measures ANOVA for each subtest (or for combinations of subtests that have the same number of complete data sets) rather than including them all in the same model. If I ran separate ANOVAs, are there implications for how I interpret the output? Thank you for any advice.
Solved – Repeated Measures ANOVA missing values – run separate models
anova, missing data, repeated measures
Related Solutions
It appears you're trying to run a factorial ANOVA in which the predictors are not all crossed with each other, which is impossible. You would need to provide much more information on the design of your study to get the best advice, but you typically have one of two options here.
First, you can analyze the factors that are crossed in separate ANOVAs, for example factors 1-3 in one and factors 3-4 in another. You might be able to reach good conclusions that way, and it's possible you don't even want to say anything about the interaction between, say, factors 1 and 4. Second, you could build a single variable that combines all the factors and analyze the data in a one-way ANOVA. That would give you one factor with levels like TTTT, FTTT, TFTT... FFFF. Without more details I wouldn't unequivocally recommend this. You would have to check whether there is something you need from the results that the first approach cannot address, and whether such a design makes sense.
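As a rough illustration of the single-variable option, here is a minimal Python sketch. The four true/false factors (`f1` to `f4`), the outcome `y`, and all values are hypothetical, made up only to show how the combined levels are built and which cells end up empty.

```python
from itertools import product

def combined_level(flags):
    """Turn a tuple of booleans into a label such as 'TFTT'."""
    return "".join("T" if flag else "F" for flag in flags)

# All 16 possible levels of the combined factor for four two-level factors.
all_levels = [combined_level(combo) for combo in product([True, False], repeat=4)]

# Hypothetical observations: four factor settings plus an outcome y.
observations = [
    {"f1": True,  "f2": True,  "f3": True,  "f4": True,  "y": 5.1},
    {"f1": True,  "f2": True,  "f3": True,  "f4": True,  "y": 4.8},
    {"f1": False, "f2": True,  "f3": True,  "f4": True,  "y": 6.0},
    {"f1": False, "f2": False, "f3": False, "f4": False, "y": 3.2},
]

# Group the outcomes by combined level; each key is one level of the
# single factor that a one-way ANOVA would use.
groups = {}
for obs in observations:
    level = combined_level((obs["f1"], obs["f2"], obs["f3"], obs["f4"]))
    groups.setdefault(level, []).append(obs["y"])

# Levels with no data are the cells that are not crossed in the design.
empty_levels = [level for level in all_levels if level not in groups]

print(sorted(groups))
print(len(empty_levels))
```

The lists in `groups` are what a one-way ANOVA routine (for example `scipy.stats.f_oneway`) would take as its samples. Whether comparing those levels answers your research question still depends on the design.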
With missing data, deciding what to do is (almost) all about deciding what assumptions you can validly make about the missingness.
The three main options are:
Missing Completely at Random (MCAR) - i.e. there are no possible predictors of missingness. In this case, you could validly drop those individuals with missing data and work with the complete observations only (this is called a complete case analysis). However, this is generally an unrealistic assumption in my experience. If you do a complete case analysis and the MCAR assumption does not hold, then any conclusions you want to draw from your results will only be valid for that specific set of people who have complete data. It is invalid to make inference about the whole population (or even whole sample) if MCAR does not hold, when using complete case analysis.
Missing at Random (MAR) - in this case, you know why some individuals have missing data and others do not (there are some variables which predict missingness) but within strata of those variables the data is truly randomly missing. In this case, you can use multiple imputation or other methods such as inverse probability weighting for non-missingness (IPW). Under MAR, these methods allow you to make inference about the entire sample, and therefore about the population from which your samples arose. You are right that you have a lot of data to impute, which may cause a problem. I don't have enough experience with multiple imputation to advise how best to proceed.
Not Missing at Random (NMAR) - there is a definite pattern to which data is missing and there are no variables which can be used to create strata within which the data is missing at random. If this is your situation, I believe you will need to redo the experiment as there is no valid solution I am aware of for analyzing the data as is (assuming that you want to make inference about some larger population).
To be honest, given the very high percentage of missing data, I would probably suggest starting over if that's not too costly in terms of time and money. I would try multiple imputation or IPW if you have variables that predict missingness, but you may not have much power with so few subjects and so much missingness. You will also need to think about whether missingness on one measure affects missingness on the other measure, as this would make the analysis more complicated.
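To make the IPW idea concrete, here is a minimal Python sketch under the MAR assumption. The single stratifying variable (`age_group`) and all values are hypothetical. Each observed value is weighted by the inverse of the proportion observed in its stratum, so strata that lost more data count for more.

```python
from collections import defaultdict

# Hypothetical data: (stratum, outcome); None marks a missing outcome.
# Missingness depends only on the stratum, which is the MAR assumption.
records = [
    ("young", 10.0), ("young", 10.0), ("young", 10.0), ("young", 10.0),
    ("old", 20.0), ("old", 20.0), ("old", None), ("old", None),
]

# Proportion of subjects with an observed outcome in each stratum.
totals = defaultdict(int)
observed = defaultdict(int)
for stratum, y in records:
    totals[stratum] += 1
    if y is not None:
        observed[stratum] += 1
p_observed = {s: observed[s] / totals[s] for s in totals}

# Complete case mean: ignores the fact that "old" subjects are under-represented.
complete = [y for _, y in records if y is not None]
complete_case_mean = sum(complete) / len(complete)

# IPW mean: weight each observed value by 1 / P(observed | stratum).
weighted_sum = 0.0
weight_total = 0.0
for stratum, y in records:
    if y is not None:
        w = 1.0 / p_observed[stratum]
        weighted_sum += w * y
        weight_total += w
ipw_mean = weighted_sum / weight_total

print(round(complete_case_mean, 3))  # 13.333
print(ipw_mean)                      # 15.0
```

In this toy example the complete case mean is pulled toward the "young" stratum, while the weighted mean gives each stratum its original share of the sample. With real data the probabilities would usually come from a model of missingness rather than raw stratum proportions.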
Best Answer
You're better off either using a method that gives you the equivalent analysis but tolerates missing data (a structural equation model or a multilevel model) or doing imputation first to fill in the missing values.
Perhaps tell us more about your model, your data, and what program you're using to analyze it.
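To show why the multilevel route keeps more data, here is a minimal Python sketch. The subjects, groups, and scores are hypothetical. It compares the listwise deletion that a repeated measures ANOVA on wide data applies with the long format (one row per subject per time point) that multilevel model software usually expects.

```python
# Hypothetical wide-format data: one row per subject, None marks a missing cell.
wide = [
    {"subject": 1, "group": "A", "pre": 12.0, "post": 15.0},
    {"subject": 2, "group": "A", "pre": 11.0, "post": None},
    {"subject": 3, "group": "B", "pre": None, "post": 14.0},
    {"subject": 4, "group": "B", "pre": 10.0, "post": 13.0},
]

# Listwise deletion: only subjects with both measurements survive.
complete_cases = [
    row for row in wide
    if row["pre"] is not None and row["post"] is not None
]

# Long format: one row per observed measurement, so a subject with a
# single time point still contributes that one row.
long_rows = [
    {"subject": row["subject"], "group": row["group"],
     "time": time, "score": row[time]}
    for row in wide
    for time in ("pre", "post")
    if row[time] is not None
]
subjects_in_long = {row["subject"] for row in long_rows}

print(len(complete_cases))    # 2 subjects kept by listwise deletion
print(len(subjects_in_long))  # 4 subjects kept in long format
print(len(long_rows))         # 6 observed measurements
```

A mixed model routine (for example `mixedlm` in statsmodels, with a random intercept per subject) could then be fitted to the long table. How much to trust the estimates still depends on why the data are missing.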