Propensity score matching after multiple imputation

missing-data, propensity-scores

I refer to this paper:
Hayes JR, Groner JI.
"Using multiple imputation and propensity scores to test the effect of car seats and seat belt usage on injury severity from trauma registry data."
J Pediatr Surg. 2008 May;43(5):924-7.

In this study, multiple imputation was performed to obtain 15 completed datasets. Propensity scores were then computed for each dataset. Then, for each observational unit, a record was chosen at random from one of the 15 completed datasets (including the associated propensity score), creating a single final dataset that was then analysed by propensity score matching.
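To make the procedure concrete, here is a minimal sketch of the record-sampling step as I understand it from the description above. It is not the authors' code: the function name, the list-of-dicts layout and the toy values are all assumptions made for illustration.

```python
import random

def sample_one_record_per_unit(imputed_datasets, seed=0):
    """For each unit, keep the record from one randomly chosen completed dataset.

    imputed_datasets: list of M completed datasets, each a list of records
    (hypothetical layout: one dict per unit, same unit order in every dataset).
    """
    rng = random.Random(seed)
    n_units = len(imputed_datasets[0])
    # Pick a dataset at random for unit i, then take unit i's record from it.
    return [rng.choice(imputed_datasets)[i] for i in range(n_units)]

# Toy example: 3 units, 2 completed datasets, each record carrying its own score.
imputed = [
    [{"id": 0, "x": 1.2, "ps": 0.61}, {"id": 1, "x": 0.4, "ps": 0.35}, {"id": 2, "x": 2.0, "ps": 0.80}],
    [{"id": 0, "x": 1.0, "ps": 0.57}, {"id": 1, "x": 0.4, "ps": 0.36}, {"id": 2, "x": 1.7, "ps": 0.74}],
]
final_dataset = sample_one_record_per_unit(imputed, seed=1)
print(final_dataset)
```

Each unit ends up with exactly one imputed record, so the final dataset carries no information about between-imputation variability.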

My questions are: Is this a valid way to perform propensity score matching following multiple imputation? Are there alternative ways to do it?

For context: In my new project, I aim to compare the effects of 2 treatment methods using propensity score matching. There is missing data, and I intend to use the mice package in R to impute missing values, then twang to do the propensity score matching, and then lme4 to analyse the matched data.

Update1:

I have found this paper which takes a different approach:
Mitra, Robin and Reiter, Jerome P. (2011) Propensity score matching with missing covariates via iterated, sequential multiple imputation
[Working Paper]

In this paper the authors compute propensity scores on all the imputed datasets and then pool them by averaging, which is in the spirit of multiple imputation using Rubin's rules for a point estimate. But is this really applicable to a propensity score?
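For reference, the two averages being compared can be written side by side. The notation here is my own assumption, not taken from either paper: $M$ is the number of imputations, $\hat\theta^{(m)}$ is a parameter estimate from completed dataset $m$, and $\hat e_i^{(m)}$ is the estimated propensity score of unit $i$ in completed dataset $m$.

$$
\bar\theta = \frac{1}{M}\sum_{m=1}^{M}\hat\theta^{(m)}
\qquad\text{versus}\qquad
\bar e_i = \frac{1}{M}\sum_{m=1}^{M}\hat e_i^{(m)},
\quad \hat e_i^{(m)} = \widehat{\Pr}\left(T_i = 1 \mid X_i^{(m)}\right)
$$

The first is Rubin's rule for pooling a point estimate of a model parameter; the second applies the same arithmetic to a unit-level predicted probability, which is what prompts the question.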

It would be really nice if anyone on CV could provide an answer with commentary on these 2 different approaches, and/or any others.

Best Answer

The first thing to say is that, for me, method 1 (sampling) seems to have little merit: it discards the benefits of multiple imputation and reduces to single imputation for each observation, as mentioned by Stas. I can't see any advantage in using it.

There is an excellent discussion of the issues surrounding propensity score analysis with missing data in Hill (2004): Hill, J. "Reducing Bias in Treatment Effect Estimation in Observational Studies Suffering from Missing Data" ISERP Working Papers, 2004. It is downloadable from here.

The paper considers two approaches to using multiple imputation (and also other methods of dealing with missing data) and propensity scores:

  • averaging of propensity scores after multiple imputation, followed by causal inference (method 2 in your post above)

  • causal inference using each set of propensity scores from the multiple imputations followed by averaging of the causal estimates.
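The difference between the two approaches can be sketched in a few lines. This is only an illustration under stated assumptions, not Hill's implementation: the propensity scores for each imputed dataset are taken as given, the outcome and treatment are assumed fully observed, 1:1 nearest-neighbour matching with replacement stands in for whatever matching algorithm is actually used, and all function names and toy numbers are hypothetical.

```python
from statistics import mean

def match_estimate(scores, treated, outcome):
    """1:1 nearest-neighbour matching on the score, with replacement.

    Returns the mean (treated minus matched control) outcome difference.
    """
    controls = [i for i, t in enumerate(treated) if not t]
    diffs = []
    for i, t in enumerate(treated):
        if t:
            # Closest control on the propensity score for treated unit i.
            j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
            diffs.append(outcome[i] - outcome[j])
    return mean(diffs)

def across_estimate(score_sets, treated, outcome):
    """Average each unit's score over the imputations, then match once."""
    averaged = [mean(unit_scores) for unit_scores in zip(*score_sets)]
    return match_estimate(averaged, treated, outcome)

def within_estimate(score_sets, treated, outcome):
    """Match separately within each imputed dataset, then average the estimates."""
    return mean(match_estimate(s, treated, outcome) for s in score_sets)

# Toy data: 5 units, 2 imputations (one list of scores per imputed dataset).
treated = [1, 1, 0, 0, 0]
outcome = [10.0, 12.0, 7.0, 9.0, 8.0]
score_sets = [
    [0.80, 0.60, 0.75, 0.55, 0.20],
    [0.70, 0.50, 0.30, 0.65, 0.45],
]
print(across_estimate(score_sets, treated, outcome))  # 3.0
print(within_estimate(score_sets, treated, outcome))  # 2.75
```

On this toy data the two approaches select different matches and so give different estimates. In the second approach a variance for the pooled estimate would normally also be combined across imputations, which is omitted here.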

Additionally, the paper considers whether the outcome should be included as a predictor in the imputation model.

Hill asserts that while multiple imputation is preferred to other methods of dealing with missing data, there is, in general, no a priori reason to prefer one of these two techniques over the other. However, there may be reasons to prefer averaging the propensity scores, particularly when using certain matching algorithms. Hill ran a simulation study in the same paper and found that averaging the propensity scores prior to causal inference, with the outcome included in the imputation model, produced the best results in terms of mean squared error. Averaging the scores first, but without the outcome in the imputation model, produced the best results in terms of average bias (absolute difference between estimated and true treatment effect). Generally, it is advisable to include the outcome in the imputation model (for example see here).

So it would seem that your method 2 is the way to go.