Propensity Score Matching – Different Results After Applying in R

k-nearest-neighbour, matching, propensity-scores

I conducted propensity score matching in R using the package "MatchIt", with the matching method "nearest neighbor". After matching, I compared the treatment and control groups on the outcome variable using a t-test. I discovered that the t-test results changed every time I reran the matching procedure. To test my assumption that this change was due to random selection among the propensity scores used for the nearest neighbor matching, I set the random number generator to a specific seed and ran the matching procedure several times. With the seed set, the results no longer differed.

  1. Given that I get different results after every matching procedure, how do I decide which matching solution to use for further analysis? Is it valid to run the matching procedure many times (say 10,000) and report the median of the p-values and t-values from the resulting t-tests?

Best Answer

This happens when you have (at least) two individuals that have the same propensity score. MatchIt randomly selects one to include in the matched set. My recommendation would be to select one matched set and carry out your analysis with it. I agree that trying other conditioning methods such as full matching and IPW would be a good idea. You could report results of various analyses in a sensitivity analysis section.
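One way to follow this recommendation is to fix the seed before matching and keep the resulting matched data. Below is a minimal sketch, assuming the `lalonde` example data shipped with MatchIt and an arbitrary covariate set and seed (none of which come from the question). `match_and_test` is a hypothetical helper, and `m.order = "random"` is set only so that a random element is present regardless of the installed version:

```r
library(MatchIt)

# Example data shipped with MatchIt (an assumption; substitute your own data)
data("lalonde", package = "MatchIt")

# Hypothetical helper: match with a fixed seed, then t-test the outcome
match_and_test <- function(seed) {
  set.seed(seed)  # fixes any random choices made during matching
  m <- matchit(treat ~ age + educ + re74, data = lalonde,
               method = "nearest", m.order = "random")
  matched <- match.data(m)  # matched sample as a data frame
  t.test(re78 ~ treat, data = matched)
}

res1 <- match_and_test(2020)
res2 <- match_and_test(2020)

# Same seed, same matched set, same test result
identical(res1$p.value, res2$p.value)
```

Saving the matched data frame (for example with `saveRDS()`) keeps every later analysis on the same matched set.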

Edit: This is probably the wrong answer. See Viktor's answer for what is likely the actual cause.

Edit 2020-12-07: In MatchIt versions before 4.0.0, the only random selection during nearest neighbor matching occurred when ties were present or when m.order = "random", which is not the default. Ties are possible if few variables were used in matching, especially if they were all categorical or took few values. As of version 4.0.0, there are no longer any random processes unless m.order = "random"; all ties are broken deterministically based on the order of the data.
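To see whether ties could be at play in a given dataset, one can count the control units that share a propensity score. This is a rough sketch, again assuming the `lalonde` data from MatchIt and two binary covariates chosen to make ties likely. The scores are rounded to guard against floating-point noise, and the logistic model is an assumption that may not match the one used in the question:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Propensity scores from a logistic regression on two binary covariates
ps_model <- glm(treat ~ married + nodegree, data = lalonde,
                family = binomial())
ps <- round(fitted(ps_model), 10)

# Control units whose score is shared with at least one other control
ps_control <- ps[lalonde$treat == 0]
tied <- duplicated(ps_control) | duplicated(ps_control, fromLast = TRUE)
n_tied <- sum(tied)
n_tied

# Default matching order; no randomness is requested here
m_default <- matchit(treat ~ married + nodegree, data = lalonde,
                     method = "nearest")

# Random matching order: set a seed so the result can be reproduced
set.seed(1)
m_random <- matchit(treat ~ married + nodegree, data = lalonde,
                    method = "nearest", m.order = "random")
```

A large `n_tied` suggests that the matched set may depend on how ties are broken, which is worth checking before comparing outcomes.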