Propensity Scores – Setting a Limit on Number of Reuses in Propensity Score Matching with Replacement

propensity-scores

We are using a propensity score model to build a control group in an observational study. We have tried several options, and 1:5 matching with replacement gives the best covariate balance.

Matching with replacement leads to a high level of reuse. The treated group has 2,200 individuals, and the most reused individual in the matched control group is used 42 times. Here is the distribution of the number of uses of each individual in the control group:

[Figure: distribution of the number of times each control individual is reused]
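For readers who want to reproduce this kind of tabulation, here is a minimal Python sketch. It assumes the matches are available as one record per (treated, control) pair and that only the control ID is needed; the IDs and the helper name `reuse_distribution` are hypothetical, not part of any matching package.

```python
from collections import Counter

def reuse_distribution(matched_control_ids):
    """Map 'times used' -> 'number of controls used that many times'."""
    # First count how often each control appears among the matched pairs.
    uses_per_control = Counter(matched_control_ids)
    # Then count how many controls share each usage count.
    return dict(sorted(Counter(uses_per_control.values()).items()))

# Hypothetical data: the control ID from each matched (treated, control) pair.
matched_control_ids = ["c1", "c2", "c1", "c3", "c1", "c2"]

distribution = reuse_distribution(matched_control_ids)
most_reused_id, max_uses = Counter(matched_control_ids).most_common(1)[0]

print(distribution)              # {1: 1, 2: 1, 3: 1}
print(most_reused_id, max_uses)  # c1 3
```

The same counts also identify which individuals drive the reuse, which is useful when checking whether the most reused controls look like plausible misreports.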

I am concerned about this rate because these are survey data in which the treatment variable is prone to measurement error. Although the question was quite clear, it is possible that the highly reused individuals (people with high propensity scores who are nonetheless recorded as untreated) simply did not understand the question, were tired and wanted to finish the survey, and so on.

We have tried a reuse limit of 5, but it noticeably worsens covariate balance.

We failed to find relevant advice in the literature, where covariate balance and common support are the two main concerns.

  1. Should we be worried about the high reuse rate? We know it will increase the variance, but does it also make the matching less reliable?
  2. Are there any guidelines on how to trade off the number of reuses against covariate support, i.e., how to choose a ceiling for the number of reuses?

Best Answer

Sadly I don't have good news. (Also, I'm assuming you're using the reuse.max option in MatchIt, in which case I'm glad that feature came in handy!) The number of reuses has not been discussed in the matching literature at all. Most applications either consider matching with replacement and no limit on reuses or matching without replacement. There are no other guidelines on this matter except generally to balance the precision of the remaining sample and its covariate balance.

Balance and variance are the main issues to be concerned about. You can examine the effective sample size (ESS) of the resulting matched sample as a guide to how precise the resulting estimate will be. Ideally, you would do a power analysis to determine how small the ESS can be before your sample is too small to reliably detect the effect you believe to be there. If it is impossible to achieve balance with such an ESS, then your sample just doesn't contain enough information to validly estimate the causal effect without additional assumptions (e.g., about the functional form of the outcome model).
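To make the link between reuse and precision concrete, here is a minimal Python sketch of the commonly used Kish approximation, ESS = (sum of weights)^2 / (sum of squared weights). It assumes each control's weight is proportional to the number of times it is reused, which is reasonable when every treated unit receives the same number of matches. Your software's matching weights may be scaled differently, but this formula is unchanged by rescaling. The reuse counts below are illustrative, not the poster's data.

```python
def effective_sample_size(weights):
    """Kish's approximate effective sample size: (sum w)^2 / sum(w^2)."""
    total = sum(weights)
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / sum_sq

# Hypothetical reuse counts for 61 distinct matched controls:
# 50 used once, 10 used five times, and one used 42 times.
skewed_counts = [1] * 50 + [5] * 10 + [42]

# The same number of distinct controls, each used equally often.
even_counts = [1] * 61

ess_skewed = effective_sample_size(skewed_counts)
ess_even = effective_sample_size(even_counts)

print(round(ess_skewed, 1))  # 9.8
print(round(ess_even, 1))    # 61.0
```

In this toy example, a single heavily reused control pulls the ESS of 61 distinct controls down to under 10, which is the precision cost that has to be weighed against the balance gained by allowing unlimited reuse.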

You might also consider using different matching methods. Template matching (method = "cardinality", ratio = NA in matchit()) can be a nice alternative if you are intent on retaining the ATT as your estimand. Full matching (method = "full") can be another good method that retains many of the advantages of matching with replacement but doesn't require manually figuring out this tradeoff.