The first thing to say is that, for me, method 1 (sampling) seems to have little merit: it discards the benefits of multiple imputation and reduces to single imputation for each observation, as mentioned by Stas. I can't see any advantage in using it.
There is an excellent discussion of the issues surrounding propensity score analysis with missing data in Hill (2004):
Hill, J. "Reducing Bias in Treatment Effect Estimation in Observational Studies Suffering from Missing Data"
ISERP Working Papers, 2004.
It is downloadable from here.
The paper considers two approaches to combining multiple imputation (and also other methods of dealing with missing data) with propensity scores:

1. averaging the propensity scores across the multiple imputations, followed by causal inference (method 2 in your post above);
2. causal inference using each set of propensity scores from the multiple imputations, followed by averaging of the causal estimates.
Additionally, the paper considers whether the outcome should be included as a predictor in the imputation model.
Hill asserts that while multiple imputation is generally preferred to other methods of dealing with missing data, there is no a priori reason to prefer one of these two techniques over the other. However, there may be reasons to prefer averaging the propensity scores, particularly when using certain matching algorithms. Hill also ran a simulation study in the same paper: averaging the propensity scores prior to causal inference, with the outcome included in the imputation model, produced the best results in terms of mean squared error, while averaging the scores first, but without the outcome in the imputation model, produced the best results in terms of average bias (absolute difference between estimated and true treatment effect). Generally, it is advisable to include the outcome in the imputation model (for example see here).
So it would seem that your method 2 is the way to go.
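As a minimal sketch of what method 2 involves, the fragment below fits one propensity model per imputed data set and averages the fitted scores across imputations. All data here are toy placeholders (the variable names `t` and `v1` are assumptions for illustration); in practice each column of scores would come from a model fit to one mice-imputed data set.

```r
# Toy sketch of method 2: average propensity scores across imputations,
# then carry the averaged score forward into a single causal analysis.
set.seed(1)
n <- 100; m <- 5                       # observations, imputations
t <- rbinom(n, 1, 0.5)                 # treatment indicator
# pretend these are the m imputed versions of a partially missing covariate
v1 <- replicate(m, rnorm(n))

# fit one propensity model per imputed data set, keep the fitted scores
ps <- sapply(seq_len(m), function(i)
  fitted(glm(t ~ v1[, i], family = binomial)))

ps.avg <- rowMeans(ps)                 # one averaged score per observation
```

The averaged `ps.avg` would then feed a single matching step (for example, MatchIt's `matchit()` accepts a user-supplied numeric score through its `distance` argument).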
Your second method, using complete(imp, "long"), is what you need to do. You've probably read the Mitra & Reiter paper distinguishing between the "Within" and "Across" methods, since you mentioned wanting to average propensity scores. With imputed outcome data, you need to use the "Within" approach, and then combine and adjust your estimates using Rubin's rules. So, for each imputed data set, you will run matchit(), and then run your outcome regression model on the matched data set. Once you have done this for each imputed data set, you will recombine the model fits using as.mira() and then summarize your mira output. Here is some sample code:
library(mice)     # for complete(), as.mira(), pool()
library(MatchIt)  # for matchit(), match.data()

imp.data <- complete(imp, "long")
fit.list <- setNames(vector("list", length(unique(imp.data$.imp))),
                     unique(imp.data$.imp))
for (i in unique(imp.data$.imp)) {
  m.out <- matchit(t ~ v1 + v2 + v3, data = imp.data[imp.data$.imp == i, ])
  fit.list[[as.character(i)]] <- glm(y ~ t, data = match.data(m.out))
}
fit.list.mira <- as.mira(fit.list)  # combines into mira object for pool()
summary(pool(fit.list.mira))
Hope this helps!
Best Answer
My understanding is that you should generate individual propensity score models for each data set, then match, then estimate outcomes, then combine the estimates into one.
1) Match() in the Matching package accepts a user-supplied propensity score (include it as the X argument in the call to Match()); matchit() in MatchIt does the same. I also recommend you try propensity score weighting; the twang package allows users to enter their own propensity scores/weights and then assess balance. The twang vignette explains how to do this and how to estimate a treatment effect.

2) Typically for balance assessment reporting, you assess balance on each imputed data set individually and then report the maximum imbalance for each covariate across the imputed data sets. Do not average your imputed data sets. If, across the imputed data sets, the maximum imbalance of each covariate is within an acceptable range (e.g., ASMD < .1), that should be good enough evidence that you have achieved balance and can move forward.