Propensity Scores – Solving the Propensity Score Paradox and Applying Propensity Score Matching


Propensity score paradox and propensity score matching

I went over the papers by King and Nielsen (2017) and Ripollone et al. (2018) to figure out what the propensity score paradox is in propensity score matching. I am confused about how they demonstrate the propensity score paradox with pruning:
They write: “For each fully matched data set, matched pairs were ranked in order of decreasing absolute propensity score distance or Mahalanobis distance, and the matched pair with the largest distance was pruned from the data set. Covariate balance was assessed for the remaining data set, then the matched pair with the largest distance in the remaining data set was pruned, and covariate balance was assessed again. This process was repeated until only a single matched pair was left in the data set.”

It sounds like we start with a matched data set. From there, we drop the matched pairs with the largest distances (e.g., remove the 30 observations with the largest distances each time) to observe the paradox. Why don't we instead use different caliper sizes in the matching itself to prune the data? E.g., set the caliper to 1/2 × the standard deviation of the PS, then 1/4, 1/8, 1/16, …

Best Answer

Removing the farthest pair and second farthest pair, etc., one at a time, is essentially the same as applying a tighter and tighter caliper. Instead of describing their analysis as using a caliper, they consider the number of units dropped, since the specific size of the caliper is arbitrary, and for a given range of calipers, there may be no change in the given sample. For example, if the distance between the members of the farthest pair is .5, and the distance between the members of the next farthest pair is .4, you don't need to check every single caliper between .5 and .4; you can just discard the next farthest pair.
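
To see that equivalence concretely, here is a minimal R sketch. It uses MatchIt's bundled lalonde data with a purely illustrative covariate formula, and relies on the distance and match.matrix components of the matchit output (the estimated propensity scores and the matched control for each treated unit, as I understand MatchIt's interface) to compute the within-pair distances. The sorted distances are the only caliper values at which the pruned sample actually changes.

    library(MatchIt)

    data("lalonde", package = "MatchIt")

    # 1:1 nearest-neighbor propensity score matching (illustrative formula)
    m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                     data = lalonde, method = "nearest", distance = "glm")

    # Absolute propensity score distance within each matched pair
    ps <- setNames(m.out$distance, rownames(lalonde))  # estimated propensity scores
    mm <- m.out$match.matrix                           # matched control for each treated unit
    pair.dist <- abs(ps[rownames(mm)] - ps[mm[, 1]])

    # Dropping the farthest pair, then the next farthest, etc., visits exactly the
    # same sequence of samples as tightening a caliper through these sorted values.
    head(sort(pair.dist, decreasing = TRUE), 10)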

The R package MatchingFrontier allows you to assess the propensity score paradox directly in a matched dataset; see the documentation for makeFrontier.matchit() and the accompanying vignette. The documentation shows an example of plotting the relationships between balance and sample size and between balance and the caliper width, which I copy below.

[Figure: covariate imbalance plotted against the number of matched units dropped]

[Figure: the same imbalance measure plotted against the caliper width, in standard deviations of the propensity score]

The plots contain exactly the same information, just with a different scaling of the x-axis. (Note that when the caliper is used, the units are standard deviations of the propensity score, just as in your example.) Both reveal the propensity score paradox, which does occur in this dataset, though only when dropping about 60 pairs or more, or, equivalently, when using a caliper smaller than .1 standard deviations of the propensity score.
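
To run this kind of frontier analysis on your own matched dataset, the workflow looks roughly like the sketch below. The lalonde data and formula are again just illustrative, and the exact arguments and plotting options of makeFrontier() may differ across versions of MatchingFrontier, so check the documentation and vignette mentioned above.

    library(MatchIt)
    library(MatchingFrontier)

    data("lalonde", package = "MatchIt")

    # Start from a fully matched dataset (1:1 nearest-neighbor PS matching)
    m.out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                     data = lalonde, method = "nearest", distance = "glm")

    # Build the frontier by pruning the farthest remaining pair one at a time
    # (default balance metric; see ?makeFrontier.matchit for the options)
    f.out <- makeFrontier(m.out)

    # Balance as a function of the number of matched units dropped
    plot(f.out)
    summary(f.out)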

Another reason not to focus so strongly on the caliper is that you can perform an analysis analogous to the one in King and Nielsen (2019) even when not matching on a propensity score, which is what they do in their analyses of the real datasets. When matching on the Mahalanobis distance, it doesn't make sense to talk about tightening a caliper on the propensity score; that is not the relevant distance measure, and a propensity score may not have been estimated at all. The relationship between balance and remaining sample size, which is the primary characteristic defining the propensity score paradox, can be assessed no matter how the distances between units were computed. A sketch of that analysis follows.
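
Here is a rough sketch of the same frontier analysis after Mahalanobis distance matching. Setting distance = "mahalanobis" tells matchit() not to estimate a propensity score; the lalonde formula remains illustrative, and the makeFrontier() call is hedged the same way as above.

    library(MatchIt)
    library(MatchingFrontier)

    data("lalonde", package = "MatchIt")

    # Matching on the Mahalanobis distance: no propensity score is estimated
    m.mahal <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                       data = lalonde, method = "nearest", distance = "mahalanobis")

    # The balance vs. remaining-sample-size analysis proceeds the same way;
    # there is simply no propensity score caliper to speak of.
    f.mahal <- makeFrontier(m.mahal)
    plot(f.mahal)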

So, to summarize: when a propensity score is used for matching, dropping the farthest pair, then the second farthest pair, etc., is the same as applying a tighter and tighter caliper. It is not described that way because calipers are usually applied to the propensity score, whereas the analysis can be performed with any distance measure, so it makes more sense to frame it in terms of the number of units remaining.

Note that redoing the matching with a tighter and tighter caliper will only trace the same path as dropping the farthest pair, then the second farthest pair, etc., when matching is done in ascending order of distance, which can be done in MatchIt by setting m.order = "closest".
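
As a rough sketch of what that looks like, the caliper values below follow your example and are interpreted in standard deviations of the propensity score (MatchIt's default); the dataset and formula are again just illustrative.

    library(MatchIt)

    data("lalonde", package = "MatchIt")

    # Re-match with progressively tighter calipers, forming pairs in ascending
    # order of distance so the matched samples follow the drop-the-farthest path.
    calipers <- c(1/2, 1/4, 1/8, 1/16)

    fits <- lapply(calipers, function(cal) {
      matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
              data = lalonde, method = "nearest", distance = "glm",
              m.order = "closest", caliper = cal)
    })

    # Number of matched treated units retained at each caliper
    sapply(fits, function(m) sum(m$weights[lalonde$treat == 1] > 0))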