Interpret the “standardized pair distances” (Std. Pair Dist.) in MatchIt output

matching, propensity-scores, python, r

I would like to know how to interpret the Std. Pair Dist. (standardized pair distances) column in the summary() output of R's MatchIt package.

The maintainer @Noha said here that "The average difference between pairs is in the Std. pair dist. column." I assume he is correct, so I must be misinterpreting it.

See the last column in the following example output, which was generated with method="nearest", distance="glm":

             Means Treated  Means Control  Std. Mean Diff.  Var. Ratio  eCDF Mean  eCDF Max  Std. Pair Dist.
distance          0.011647       0.011643         0.000624    1.009227   0.000007  0.004008         0.000678
geschlechtm       0.304609       0.312625        -0.017417         NaN   0.008016  0.008016         0.113211
geschlechtw       0.695391       0.687375         0.017417         NaN   0.008016  0.008016         0.113211
alter            69.082164      69.058116         0.002490    0.952987   0.005572  0.014028         0.164750
pflege            0.280561       0.262525         0.024109    1.077771   0.003674  0.010020         0.120543

OK, let's see. Isn't the average just the mean()?

But calling mean() on the matched data gives results that differ from the summary table (by the way, this is Python code because I use MatchIt via Python's rpy2 package):

>>> df.Vm001.distance.mean()
0.011644758037430517

>>> df.Vm001.alter.mean()
69.07014028056112

Questions

  1. How is this value computed?
  2. What does this value mean? Or how should it be interpreted?

Best Answer

This value is computed as the average of the distance between units within a pair on the given covariate. It's easiest to think about this with 1:1 matching. Consider a matched pair, and take the difference between the treated unit's value of the covariate and the control unit's value of the covariate in that pair. Then take the absolute value of this difference to make it a distance. Then take the average of these distances across all pairs. When standardize = TRUE (the default), the average is then standardized in the same way the standardized mean differences are.

It looks like you were just taking the mean of the covariate, which is not right. Taking the mean of the distance measure (i.e., the propensity score) is not right either. You have to compute the differences within pairs between the treated and control members of each pair.
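To make the difference between the two calculations concrete, here is a minimal sketch in plain Python. It is not MatchIt's own code. It assumes 1:1 matching, that each row carries a pair identifier (called `subclass` here, as in the output of match.data()) and a 0/1 treatment indicator (called `treat` here, a placeholder name), and that you supply the standardization factor `sd` yourself. The toy values and the `sd` of 10 are made up for illustration.

```python
from collections import defaultdict


def std_pair_dist(rows, covariate, sd):
    """Mean absolute within-pair difference on `covariate`, divided by `sd`.

    `rows` is an iterable of dicts with the keys "subclass" (pair id),
    "treat" (1 = treated, 0 = control) and the covariate name.
    Assumes exactly one treated and one control unit per pair.
    """
    pairs = defaultdict(dict)
    for row in rows:
        pairs[row["subclass"]][row["treat"]] = row[covariate]
    # Absolute treated-minus-control difference, one value per pair.
    distances = [abs(values[1] - values[0]) for values in pairs.values()]
    return sum(distances) / len(distances) / sd


# Toy matched data (invented): three 1:1 pairs.
matched = [
    {"subclass": 1, "treat": 1, "alter": 70},
    {"subclass": 1, "treat": 0, "alter": 68},
    {"subclass": 2, "treat": 1, "alter": 65},
    {"subclass": 2, "treat": 0, "alter": 66},
    {"subclass": 3, "treat": 1, "alter": 80},
    {"subclass": 3, "treat": 0, "alter": 80},
]
sd = 10.0  # assumed standardization factor, chosen only for the example

pair_dist = std_pair_dist(matched, "alter", sd)

# For contrast: the standardized difference in group means.
treated = [r["alter"] for r in matched if r["treat"] == 1]
control = [r["alter"] for r in matched if r["treat"] == 0]
mean_diff = (sum(treated) / len(treated) - sum(control) / len(control)) / sd

print(pair_dist, mean_diff)
```

In this toy example the within-pair distances are 2, 1 and 0, so the standardized pair distance is 0.1, while the standardized mean difference is only about 0.033 because the positive and negative differences partly cancel. Neither number is the plain mean of the covariate, which is why mean() on the matched data does not reproduce the table.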

So, for example, the value of 0.164750 for alter means that, on average, the treated and control units within a pair differ on alter by about 0.165 standard deviations. This is quite a low value and indicates the units were closely matched. Had they been exactly matched, this value would be 0. Placing a caliper on a variable is one way to make this value smaller, possibly by discarding units.

When comparing two matching specifications with similar levels of balance, the specification with lower pair distances should be preferred; this is the main thesis of King and Nielsen (2019).