I would like to know how to interpret the Std. Pair Dist. (standardized pair distances) column in the summary() output of R's MatchIt package.
The maintainer @Noha said here that "The average difference between pairs is in the Std. pair dist. column." I assume he is correct, so I must be misinterpreting it.
See the last column in the following example output, which was generated with method="nearest", distance="glm":
             Means Treated  Means Control  Std. Mean Diff.  Var. Ratio  eCDF Mean  eCDF Max  Std. Pair Dist.
distance          0.011647       0.011643         0.000624    1.009227   0.000007  0.004008         0.000678
geschlechtm       0.304609       0.312625        -0.017417         NaN   0.008016  0.008016         0.113211
geschlechtw       0.695391       0.687375         0.017417         NaN   0.008016  0.008016         0.113211
alter            69.082164      69.058116         0.002490    0.952987   0.005572  0.014028         0.164750
pflege            0.280561       0.262525         0.024109    1.077771   0.003674  0.010020         0.120543
OK, let's see. Isn't the average just the mean()?
But calculating the mean() on the matched data gives me results that differ from the summary table (by the way, this is Python code because I use MatchIt via Python's rpy2 package):
>>> df.Vm001.distance.mean()
0.011644758037430517
>>> df.Vm001.alter.mean()
69.07014028056112
Questions
- How is this value computed?
- What does this value mean? Or how should it be interpreted?
Best Answer
This value is computed as the average distance between the units within a pair on the given covariate. It's easiest to think about this with 1:1 matching. Consider a matched pair, and take the difference between the covariate value of the treated unit and the covariate value of the control unit in that pair. Then take the absolute value of this difference to make it a distance. Then take the average of these distances across all pairs. When standardize = TRUE (the default), the average is then standardized in the same way the standardized mean differences are.

It looks like you were just taking the mean of the covariate, which is not right. Taking the mean of the distance measure (i.e., the propensity score) is not right either. You have to compute the differences within pairs between the treated and control members of each pair.
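Since you are working from Python, the steps above can be sketched roughly as follows. This is only an illustration for 1:1 matching, not MatchIt's actual implementation. It assumes the matched data carry a pair identifier (called subclass here) and a 0/1 treatment indicator (called treat here); both names, the std_pair_dist helper, and the toy values are hypothetical. The standardization factor sd is passed in by the caller, because it has to be the same one used for the standardized mean differences.

```python
from collections import defaultdict
from statistics import mean, stdev


def std_pair_dist(rows, covariate, sd, treat="treat", pair_id="subclass"):
    """Mean absolute within-pair difference on `covariate`, divided by `sd`.

    Illustrative sketch for 1:1 matching. `rows` is a list of dicts, one per
    matched unit. The column names `treat` and `subclass` are assumptions.
    """
    # Collect the covariate values of each pair, split by treatment status.
    pairs = defaultdict(lambda: {0: [], 1: []})
    for row in rows:
        pairs[row[pair_id]][row[treat]].append(row[covariate])

    # One absolute treated-minus-control difference per matched pair.
    distances = [
        abs(t - c)
        for members in pairs.values()
        for t in members[1]
        for c in members[0]
    ]
    return mean(distances) / sd


# Made-up toy data: three matched pairs.
rows = [
    {"subclass": 1, "treat": 1, "alter": 70},
    {"subclass": 1, "treat": 0, "alter": 68},
    {"subclass": 2, "treat": 1, "alter": 65},
    {"subclass": 2, "treat": 0, "alter": 66},
    {"subclass": 3, "treat": 1, "alter": 80},
    {"subclass": 3, "treat": 0, "alter": 80},
]

# Assumption for this toy case only: standardize by the SD of the treated
# units shown here. Use whatever factor your standardized mean differences use.
sd_alter = stdev([r["alter"] for r in rows if r["treat"] == 1])

print(std_pair_dist(rows, "alter", sd=1.0))       # raw average pair distance
print(std_pair_dist(rows, "alter", sd=sd_alter))  # standardized pair distance
print(mean(r["alter"] for r in rows))             # plain covariate mean
```

The last line is the kind of quantity df.Vm001.alter.mean() returns: it averages the covariate over all units and ignores the pairing entirely, which is why it cannot reproduce the Std. Pair Dist. column.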
So, for example, the value of 0.164750 for alter means that, on average, treated and control units within a pair differ on alter by .165 standard deviations. This is quite a low value and indicates the units were closely matched. Had they been exactly matched, this value would be 0. Placing a caliper on a variable is one way to make this value smaller, possibly by discarding units.

When comparing two matching specifications with similar levels of balance, the specification with lower pair distances should be preferred; this is the main thesis of King and Nielsen (2019).