Interpret the “standardized pair distances” (Std. Pair Dist.) in MatchIt output

I would like to know how to interpret the Std. Pair Dist. (standardized pair distance) column in the summary() output of R's MatchIt package.

The maintainer, @Noah, said here that "The average difference between pairs is in the Std. pair dist. column." I assume he is correct, but I seem to be misinterpreting it.

See the last column in the following example output, which was generated with method="nearest", distance="glm":

             Means Treated  Means Control  Std. Mean Diff.  Var. Ratio  eCDF Mean  eCDF Max  Std. Pair Dist.
distance             0.011647       0.011643         0.000624    1.009227   0.000007  0.004008         0.000678
geschlechtm          0.304609       0.312625        -0.017417         NaN   0.008016  0.008016         0.113211
geschlechtw          0.695391       0.687375         0.017417         NaN   0.008016  0.008016         0.113211
alter               69.082164      69.058116         0.002490    0.952987   0.005572  0.014028         0.164750
pflege               0.280561       0.262525         0.024109    1.077771   0.003674  0.010020         0.120543

OK, let's see. Isn't the average just the mean()?

But calculating mean() on the matched data gives me results that differ from the summary table (by the way, the code is Python because I use MatchIt via Python's rpy2 package):

>>> df.Vm001.distance.mean()
0.011644758037430517

>>> df.Vm001.alter.mean()
69.07014028056112

Questions

How is this value computed?
What does this value mean, and how should it be interpreted?

Best Answer

This value is computed as the average of the distances between units within a pair on the given covariate. It's easiest to think about this with 1:1 matching. Consider a matched pair and take the difference between the treated unit's value of the covariate and the control unit's value of the covariate. Then take the absolute value of this difference to make it a distance, and average these distances across all pairs. When standardize = TRUE (the default), the average is then standardized in the same way as the standardized mean differences.

It looks like you were just taking the mean of the covariate in the matched sample, which is not the same thing. Taking the mean of the distance measure (i.e., the propensity score) is not right either. You have to compute the differences within pairs, between the treated and control members of each pair, first.
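
As a rough illustration of that computation, here is a minimal Python sketch (Python because that is what the question uses). It assumes 1:1 matching without replacement and matched data laid out like MatchIt's match.data() output, with a subclass column identifying each pair and a 0/1 treatment indicator. The helper name std_pair_dist and the parameter name treat are hypothetical, and the standardization factor is simply passed in rather than computed the way MatchIt does it. MatchIt itself also handles weights and k:1 matching, which this sketch ignores.

```python
from collections import defaultdict
from statistics import mean


def std_pair_dist(subclass, treat, x, sd_factor=1.0):
    """Average absolute within-pair difference of covariate x, divided by sd_factor.

    Hypothetical helper. Assumes 1:1 matching: every pair id in `subclass`
    has exactly one treated unit (treat == 1) and one control unit (treat == 0).
    """
    pairs = defaultdict(dict)
    for pair_id, t, value in zip(subclass, treat, x):
        pairs[pair_id][int(t)] = value  # key 1 = treated value, key 0 = control value
    # Absolute treated-minus-control difference within each pair, then the average.
    dists = [abs(p[1] - p[0]) for p in pairs.values()]
    return mean(dists) / sd_factor


# Made-up toy data: three pairs, ages of the treated and control members.
subclass = [1, 1, 2, 2, 3, 3]
treat = [1, 0, 1, 0, 1, 0]
age = [70, 68, 65, 66, 80, 80]

print(mean(age))                            # pooled mean of the covariate (what the question computed)
print(std_pair_dist(subclass, treat, age))  # (|70-68| + |65-66| + |80-80|) / 3
```

Applied to the question's df, the three sequences would be the subclass column, the treatment column, and a covariate such as alter. The toy numbers are made up; they only show that the pooled mean (71.5) and the unstandardized pair distance (1.0) answer different questions.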

So, for example, the value of 0.164750 for alter means that, on average, treated and control units within a pair differ on alter by 0.165 standard deviations. This is quite a low value and indicates that the units were closely matched; had they been exactly matched, this value would be 0. Placing a caliper on a variable is one way to make this value smaller, possibly at the cost of discarding units.
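
Because the pair distance is standardized in the same way as the standardized mean difference, the standardization factor can be backed out of the table itself as (Means Treated - Means Control) / Std. Mean Diff., and multiplying Std. Pair Dist. by it gives the average within-pair difference in the covariate's own units. A small Python sketch of that arithmetic, using the values printed in the question (they are rounded, so the results are approximate; raw_pair_dist is a hypothetical helper):

```python
# Rows copied from the summary() table in the question:
# (Means Treated, Means Control, Std. Mean Diff., Std. Pair Dist.)
rows = {
    "alter": (69.082164, 69.058116, 0.002490, 0.164750),
    "geschlechtm": (0.304609, 0.312625, -0.017417, 0.113211),
}


def raw_pair_dist(mean_treated, mean_control, std_mean_diff, std_pair_dist):
    """Undo the standardization, assuming both columns share one factor."""
    # Standardization factor implied by the SMD column (undefined if the SMD is 0).
    sd_factor = (mean_treated - mean_control) / std_mean_diff
    return std_pair_dist * sd_factor


for name, row in rows.items():
    print(f"{name}: {raw_pair_dist(*row):.3f}")
```

This gives roughly 1.59 for alter (about 1.6 years, if alter is age in years) and roughly 0.052 for geschlechtm. For a 0/1 variable the mean absolute within-pair difference is the share of pairs whose members differ, so under 1:1 matching about 5% of pairs would be mismatched on that indicator.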

When comparing two matching specifications with similar levels of balance, the specification with lower pair distances should be preferred; this is the main thesis of King and Nielsen (2019).

Related Solutions

Solved – How to specify a contrast matrix (in R) for the difference between one level and an average of the others

Comparing one level with the mean of all later levels is, aside from scale, called Helmert coding or Helmert contrasts. The one you give is the first contrast; the other would be a scaled version of $(0, 1, -1)^\top$.

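What R calls Helmert coding (contr.helmert(), which compares each level with the mean of the levels before it), some sources call 'reverse Helmert'. The two are equivalent up to reversing the order of the levels.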
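
To make the two conventions concrete, here is a small Python sketch that builds the contrast coefficients for k levels: forward Helmert (each level versus the mean of the later levels) and the pattern that R's contr.helmert() produces (each level versus the earlier ones). The helper names are hypothetical; each inner list is one column of the contrast matrix, with one entry per factor level.

```python
from fractions import Fraction


def forward_helmert(k):
    """Level j versus the mean of levels j+1..k; one list per contrast column."""
    cols = []
    for j in range(k - 1):
        n_later = k - j - 1
        cols.append([Fraction(0)] * j + [Fraction(1)] + [Fraction(-1, n_later)] * n_later)
    return cols


def r_style_helmert(k):
    """Pattern of R's contr.helmert(k): level j+1 versus the levels before it."""
    return [[-1] * j + [j] + [0] * (k - j - 1) for j in range(1, k)]


for col in forward_helmert(3):
    print([str(v) for v in col])
print(r_style_helmert(3))
```

For k = 3, forward_helmert gives the columns (1, -1/2, -1/2) and (0, 1, -1), the latter being the scaled contrast mentioned above, while r_style_helmert gives (-1, 1, 0) and (-1, -1, 2), which match R's contr.helmert(3).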

Solved – How to interpret output of Match() function in R (for propensity score matching)

So the output is:

Estimate... -0.349
AI SE... 0.124
T-stat... -2.827
p.val... 0.005

Presumably you did the matching because you want to interpret the difference in outcomes between treated and control units as a causal effect, i.e., as the change in the dependent variable caused by treatment, and you don't fully trust a large regression with controls to do that for you (though you do trust that all the causes of treatment assignment are captured in the propensity score model).

I'm guessing that the dependent variable in your case is a probability. If so, the matching analysis says that this probability is 0.35 lower because of treatment, and that is an absolute difference of 0.35 (not a relative one) because you're computing a difference. This difference is computed after your data set has been matched, pruned, etc. to balance the covariates across treatment and control cases as well as possible. You'd want to check that balance with other functions in the package, such as MatchBalance(), before trusting the summary output.

You have a lot of control over what 'good matching' means, though you've gone with the defaults, which are to estimate the average treatment effect on the treated (ATT), not to use calipers, etc. You can see the defaults on the Match() help page. So that's the Estimate here.

The AI SE is a matching-corrected standard error due to Abadie and Imbens (hence the name AI). The T-stat and p.val are interpreted as usual, except that they are computed with that standard error. You can find the details of the AI standard errors in Abadie and Imbens's original paper.
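
The relationship between the printed numbers can be checked by hand: the T-stat is the Estimate divided by the AI SE. A Python sketch, assuming a two-sided p-value from the standard normal distribution (my assumption about how the summary computes it; t_and_p is a hypothetical helper):

```python
import math


def t_and_p(estimate, se):
    """t statistic and two-sided p-value, assuming a standard normal reference."""
    t = estimate / se
    p = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * (1 - Phi(|t|))
    return t, p


t, p = t_and_p(-0.349, 0.124)
print(f"t = {t:.3f}, p = {p:.4f}")
```

This gives a t of about -2.81 and a p of about 0.005. The small gap from the printed -2.827 is expected, since the Estimate and SE shown in the output are rounded.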
