The first thing to say is that, for me, method 1 (sampling) seems to have little merit: it discards the benefits of multiple imputation and reduces to single imputation for each observation, as mentioned by Stas. I can't see any advantage in using it.
There is an excellent discussion of the issues surrounding propensity score analysis with missing data in Hill (2004):
Hill, J. "Reducing Bias in Treatment Effect Estimation in Observational Studies Suffering from Missing Data"
ISERP Working Papers, 2004.
It is downloadable from here.
The paper considers two approaches to combining multiple imputation (and also other methods of dealing with missing data) with propensity scores:

1. averaging the propensity scores across the multiple imputations, followed by causal inference (method 2 in your post above);
2. causal inference using each set of propensity scores from the multiple imputations, followed by averaging of the causal estimates.
Additionally, the paper considers whether the outcome should be included as a predictor in the imputation model.
Hill asserts that while multiple imputation is generally preferred to other methods of dealing with missing data, there is no a priori reason to prefer one of these two techniques over the other. However, there may be reasons to prefer averaging the propensity scores, particularly when using certain matching algorithms. Hill also ran a simulation study in the same paper: averaging the propensity scores prior to causal inference, with the outcome included in the imputation model, produced the best results in terms of mean squared error, while averaging the scores first, but without the outcome in the imputation model, produced the best results in terms of average bias (absolute difference between estimated and true treatment effect). Generally, it is advisable to include the outcome in the imputation model (for example see here).
So it would seem that your method 2 is the way to go.
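As a minimal sketch of what method 2 involves, the fragment below fits one propensity model per imputed data set and averages the fitted scores across imputations. All data here are toy placeholders (the variable names `t` and `v1` are assumptions for illustration); in practice each column of scores would come from a model fit to one mice-imputed data set.

```r
# Toy sketch of method 2: average propensity scores across imputations,
# then carry the averaged score forward into a single causal analysis.
set.seed(1)
n <- 100; m <- 5                       # observations, imputations
t <- rbinom(n, 1, 0.5)                 # treatment indicator
# pretend these are the m imputed versions of a partially missing covariate
v1 <- replicate(m, rnorm(n))

# fit one propensity model per imputed data set, keep the fitted scores
ps <- sapply(seq_len(m), function(i)
  fitted(glm(t ~ v1[, i], family = binomial)))

ps.avg <- rowMeans(ps)                 # one averaged score per observation
```

The averaged `ps.avg` would then feed a single matching step (for example, MatchIt's `matchit()` accepts a user-supplied numeric score through its `distance` argument).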
Your second method, using complete(imp, "long"), is what you need to do. You've probably read the Mitra & Reiter paper distinguishing between the "Within" and "Across" methods, since you mentioned wanting to average propensity scores. With imputed outcome data, you need to use the "Within" approach, and then combine and adjust your estimates using Rubin's rules. So, for each imputed data set, you will run matchit(), and then run your outcome regression model on the matched data set. Once you have done this for each imputed data set, you will recombine the model fits using as.mira() and then summarize your mira output. Here is some sample code:
library(mice)     # for complete(), as.mira(), pool()
library(MatchIt)  # for matchit(), match.data()

imp.data <- complete(imp, "long")
fit.list <- setNames(vector("list", length(unique(imp.data$.imp))),
                     unique(imp.data$.imp))
for (i in unique(imp.data$.imp)) {
  m.out <- matchit(t ~ v1 + v2 + v3, data = imp.data[imp.data$.imp == i, ])
  fit.list[[as.character(i)]] <- glm(y ~ t, data = match.data(m.out))
}
fit.list.mira <- as.mira(fit.list)  # combines into mira object for pool()
summary(pool(fit.list.mira))
Hope this helps!
Best Answer
My understanding is that you should generate individual propensity score models for each data set, then match, then estimate outcomes, then combine the estimates into one.
1) Match() in the Matching package accepts a user-supplied propensity score (include it as the X argument in the call to Match()); matchit() in MatchIt does the same. I also recommend you try propensity score weighting; the twang package allows users to enter their own propensity scores/weights and then assess balance. The twang vignette explains how to do this and how to estimate a treatment effect.

2) Typically for balance assessment reporting, you assess balance on each imputed data set individually and then report the maximum imbalance for each covariate across the imputed data sets. Do not average your imputed data sets. If, across the imputed data sets, the maximum imbalance of each covariate is within an acceptable range (e.g., ASMD < .1), that should be good enough evidence that you have achieved balance and can move forward.