Using propensity score AND exact matching for control group selection

matching, propensity-scores

I am working with a team of researchers who want to select a control group from a large population (~150,000) to compare with a relatively small treated group (~900). We plan to use a propensity score to match treated members to similar potential controls, but also to match exactly on other factors (e.g., gender). That is, of the criteria relevant for matching, some will be included in the propensity score and the rest will be matched exactly. So I have two questions:

  1. Should we estimate the propensity score (by logistic regression) in the full sample before exact matching, and use it as just one matching criterion alongside the exact criteria? OR should the scores be estimated separately within each group, only after exactly matching on those criteria?

  2. If the propensity score is estimated before exact matching, should the exact-matching criteria also be included in the propensity score model? This question arose when team members expressed concern that the exact-matching criteria might be more significant than the other criteria included in the propensity score. The SAS examples for the PSMATCH procedure also include the exactly matched criteria in the PS model (here: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_psmatch_examples04.htm&docsetVersion=14.3&locale=en).

Any further insight, readings, or suggestions would be helpful. Thanks.

Best Answer

Estimating the PS separately within each of the groups defined by the exact matching variables is equivalent (with standard maximum-likelihood logistic regression) to estimating a single propensity score model in the whole sample that includes the exact matching variables and their interactions with all other covariates, and then splitting your data based on the exact matching variables. Remember that the goal of PS matching is covariate balance; use whichever method yields good covariate balance without requiring you to discard too many treated units.
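
The equivalence claim can be checked numerically with the sketch below. It assumes unpenalized (maximum-likelihood) logistic regression fitted with a small hand-written Newton-Raphson routine, simulated data, and hypothetical covariate names (`gender` as the exact-matching variable, `age` and `income` as the other covariates). It compares (a) separate fits within each gender, (b) one pooled fit with gender and its interactions with every covariate, and (c) a pooled fit with gender as a main effect only.

```python
import numpy as np


def fit_logit(X, t, n_iter=50):
    """Unpenalized logistic regression fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (t - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict(X, beta):
    """Estimated propensity scores for design matrix X."""
    return 1.0 / (1.0 + np.exp(-X @ beta))


# Simulated data (purely illustrative; variable names are hypothetical)
rng = np.random.default_rng(0)
n = 5000
gender = rng.integers(0, 2, n).astype(float)   # exact-matching variable
age = rng.normal(0, 1, n)                       # standardized covariates
income = rng.normal(0, 1, n)
lin = -2.5 + 0.5 * gender + 0.4 * age + 0.3 * income + 0.6 * gender * income
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin))).astype(float)
ones = np.ones(n)

# (a) Separate PS model within each exact-matching stratum
ps_separate = np.empty(n)
for g in (0.0, 1.0):
    idx = gender == g
    Xg = np.column_stack([ones[idx], age[idx], income[idx]])
    ps_separate[idx] = predict(Xg, fit_logit(Xg, t[idx]))

# (b) Pooled PS model with gender and its interactions with every covariate
X_full = np.column_stack([ones, gender, age, income, gender * age, gender * income])
ps_interacted = predict(X_full, fit_logit(X_full, t))

# (c) Pooled PS model with gender as a main effect only (no interactions)
X_main = np.column_stack([ones, gender, age, income])
ps_main = predict(X_main, fit_logit(X_main, t))

print("max |separate - interacted|:", np.max(np.abs(ps_separate - ps_interacted)))
print("max |separate - main only| :", np.max(np.abs(ps_separate - ps_main)))
```

Fits (a) and (b) should agree up to floating-point error because the fully interacted likelihood factors into one independent piece per stratum. Fit (c) generally does not agree, since it forces `age` and `income` to have the same coefficients in both strata. A penalized logistic regression would generally not reproduce the exact equivalence either, because the penalty acts on different parameterizations in (a) and (b).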

As I mentioned in this post, omitting an important covariate from a propensity score model, regardless of whether you exactly match on that covariate, can yield poorly performing propensity scores that are far from the true propensity scores (because they are estimated with an incorrect, underspecified model). Although exact matching will give you perfect balance on the exact matching variables, poorly estimated propensity scores may fail to balance your other covariates. But this is an empirical question that depends on the dataset at hand; you should try various ways of estimating the propensity score and matching, and pick the one that yields the best balance and remaining sample size.
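
The loop of estimating a PS, matching exactly on some variables and on the PS within those strata, and then judging the result by covariate balance can be sketched as follows. This is a minimal illustration under stated assumptions, not the answer's method or PROC PSMATCH's algorithm: it uses greedy 1:1 nearest-neighbour matching without replacement and without a caliper on the logit of the PS, restricted to units with the same value of the hypothetical exact-matching variable `gender`, and it summarizes balance with standardized mean differences (SMDs) using a pooled SD. The data are simulated, and the helper names (`fit_logit`, `match_within_strata`, `smd`) are hypothetical.

```python
import numpy as np


def fit_logit(X, t, n_iter=50):
    """Unpenalized logistic regression fitted by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, X.T @ (t - p))
    return beta


def match_within_strata(lp, treated, strata):
    """Hypothetical helper: greedy 1:1 nearest-neighbour matching on the linear
    predictor (logit of the PS), without replacement, only between units that
    share the same exact-matching stratum. Returns (treated, control) index pairs."""
    pairs = []
    for s in np.unique(strata):
        t_idx = np.flatnonzero((strata == s) & (treated == 1))
        c_idx = np.flatnonzero((strata == s) & (treated == 0))
        available = np.ones(len(c_idx), dtype=bool)
        for i in t_idx:
            if not available.any():
                break  # stratum ran out of controls; remaining treated stay unmatched
            dist = np.where(available, np.abs(lp[c_idx] - lp[i]), np.inf)
            j = int(np.argmin(dist))
            available[j] = False   # match without replacement
            pairs.append((i, c_idx[j]))
    return np.array(pairs)


def smd(x, treated):
    """Standardized mean difference (treated minus control) with a pooled SD."""
    xt, xc = x[treated == 1], x[treated == 0]
    return (xt.mean() - xc.mean()) / np.sqrt((xt.var(ddof=1) + xc.var(ddof=1)) / 2)


# Simulated data (illustrative only; variable names are hypothetical)
rng = np.random.default_rng(1)
n = 5000
gender = rng.integers(0, 2, n).astype(float)   # exact-matching variable
age = rng.normal(0, 1, n)
income = rng.normal(0, 1, n)
lin = -2.5 + 0.5 * gender + 0.4 * age + 0.3 * income + 0.6 * gender * income
t = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin))).astype(float)

# PS model that includes the exact-matching variable and its interactions
X = np.column_stack([np.ones(n), gender, age, income, gender * age, gender * income])
lp = X @ fit_logit(X, t)   # linear predictor = logit of the estimated PS

pairs = match_within_strata(lp, t, gender)
m_idx = np.concatenate([pairs[:, 0], pairs[:, 1]])   # matched sample indices

print(f"matched {len(pairs)} of {int(t.sum())} treated units")
for name, x in [("gender", gender), ("age", age), ("income", income)]:
    print(f"{name:7s} SMD before: {smd(x, t):+.3f}   after: {smd(x[m_idx], t[m_idx]):+.3f}")
```

Exact matching guarantees an SMD of zero on `gender`; the SMDs on `age` and `income` after matching are what indicate whether the PS model is adequate. To compare specifications in the spirit of the answer, one could swap in a different PS model (for example, dropping the interaction terms) and compare the resulting SMDs and the number of matched treated units.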