The documentation for Matching is sadly fairly incomplete, leaving what it does quite mysterious. What is clear is that it takes a different approach to estimating treatment effects and their standard errors from that of Stuart (2010) (and the Ho, Imai, King, and Stuart camp). Rather, it takes heavy inspiration from Abadie & Imbens (2006, 2011), who describe variance estimators and bias correction for matching estimators. While Stuart and colleagues consider matching a nonparametric preprocessing method that doesn't change the variance of the effect estimates, Abadie, Imbens, and Sekhon are careful to account for the variability that the matching itself induces in the effect estimate. Thus, the analysis that Matching performs is not described in Stuart (2010).
The philosophy of matching described by Ho, Imai, King, & Stuart (2007) (the authors of the MatchIt package) is that the analysis that would have been performed without matching should also be the one performed after matching; the benefit of matching is robustness to misspecification of the functional form of the outcome model. The most basic model is none at all, i.e., the difference in treatment group means, but regression models of the outcome on the treatment and covariates work too. This group argues that no adjustment to the standard error is required, so the standard error you get from the standard analysis on the matched sample is sufficient. This is why you can simply export the matched sample from the output of MatchIt and run a regression on it, forgetting that the matched sample came from a matching procedure. Austin has additionally argued that standard errors should account for the paired nature of the data, though the MatchIt camp argues that matching doesn't imply pairing and an unpaired standard error is sufficient. Cluster-robust standard errors with pair membership as the cluster accomplish this; they can be computed using the sandwich package after estimating the effect with glm(), or by using the jtools package.
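A minimal sketch of that workflow (the variables treat, y, x1, and x2 and the data frame mydata are placeholders, not from the original question):

```r
library(MatchIt)
library(sandwich)
library(lmtest)

# 1:1 nearest-neighbor matching; match.data() returns the matched
# sample with 'weights' and 'subclass' (pair membership) columns
m <- matchit(treat ~ x1 + x2, data = mydata, method = "nearest")
md <- match.data(m)

# The "standard analysis" on the matched sample
fit <- glm(y ~ treat, data = md, weights = weights)

# Cluster-robust standard errors with pair membership as the cluster
coeftest(fit, vcov. = vcovCL, cluster = ~subclass)
```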
The philosophy of matching used by Matching considers the act of matching to be part of the analysis, so the variability it induces in the effect estimate must be taken into account. Much of the theory used in Matching comes from a series of papers by Abadie and Imbens, who discuss the bias and variance of matching estimators. Although the documentation for Matching is not very descriptive, the Stata command teffects nnmatch is almost identical, uses all the same theory, and is documented very thoroughly. The effect estimator is the one described by Abadie & Imbens (2006); it is not a simple difference-in-means estimator because of the possibility of ties, k:1 matching, and matching with replacement, and its standard error is described in the same paper. There is an option to perform bias correction, which uses a technique described by Abadie & Imbens (2011). This is not the same as performing regression on the matched set: rather than using matching to provide robustness to a regression estimator, the bias-corrected matching estimator provides robustness to a matching estimator through parametric bias correction using the covariates.
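For illustration, a minimal sketch of this estimator (Y, Tr, and the covariate matrix X are placeholders for an outcome vector, treatment indicator, and covariates):

```r
library(Matching)

# Abadie-Imbens matching estimator with bias correction
m.out <- Match(Y = Y, Tr = Tr, X = X,
               M = 1,              # 1:1 matching (with replacement by default)
               BiasAdjust = TRUE)  # Abadie & Imbens (2011) bias correction
summary(m.out)  # effect estimate with the Abadie-Imbens standard error
```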
The only difference between genetic matching and standard "nearest neighbor" matching is the distance metric used to decide whether two units are near each other. In teffects nnmatch in Stata and Match() in Matching, the default is the Mahalanobis distance. The innovation of genetic matching is that the distance matrix is iteratively reweighted until good balance is found, rather than simply using the default distance matrix, so the theory for the matching estimators still applies.
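A rough sketch of how the pieces fit together (placeholder data again; pop.size is set arbitrarily here):

```r
library(Matching)

# GenMatch() evolves a set of covariate weights for a generalized
# Mahalanobis distance so as to optimize covariate balance...
gen.out <- GenMatch(Tr = Tr, X = X, BalanceMatrix = X, pop.size = 200)

# ...and Match() then uses that weight matrix as its distance metric,
# so the Abadie-Imbens estimator and standard error still apply
m.out <- Match(Y = Y, Tr = Tr, X = X,
               Weight.matrix = gen.out, BiasAdjust = TRUE)
summary(m.out)
```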
I think a clear way to write your methods section might be something like:

Matching was performed using a genetic matching algorithm (Diamond & Sekhon, 2013) as implemented in the Matching package (Sekhon, 2011). Treatment effects were estimated using the Match function in Matching, which implements the matching estimators and standard error estimators described by Abadie and Imbens (2006). To improve robustness, we performed bias correction on all continuous covariates as described by Abadie and Imbens (2011) and implemented using the BiasAdjust option in the Match function.
This makes your analysis reproducible, and curious readers can investigate the literature for themselves (although Matching is almost an industry standard and already well trusted).
Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267. https://doi.org/10.1111/j.1468-0262.2006.00655.x
Abadie, A., & Imbens, G. W. (2011). Bias-Corrected Matching Estimators for Average Treatment Effects. Journal of Business & Economic Statistics, 29(1), 1–11. https://doi.org/10.1198/jbes.2009.07333
Diamond, A., & Sekhon, J. S. (2013). Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. Review of Economics and Statistics, 95(3), 932–945.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013
Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21. https://doi.org/10.1214/09-STS313
This is explained in Stuart (2008) and in the cobalt vignette. The problem is that, when comparing balance before and after matching, the SMD is affected not only by changes in balance but also by changes in the standard deviation of the covariate if the standard deviation of the matched sample is used as the standardization factor after matching. This muddles together two things when we care about only one. Holding the standard deviation constant prevents this, isolating the effect of matching on balance alone.
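In cobalt, for example, the standardization factor is, to my understanding, computed in the unadjusted sample by default, so the denominator is held constant across the before/after comparison (m here stands for a matchit object, as an illustration):

```r
library(cobalt)

# SMDs before (un = TRUE) and after matching; s.d.denom chooses the
# group whose SD forms the denominator (here the pooled SD)
bal.tab(m, s.d.denom = "pooled", un = TRUE)
```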
Consider the following example. Say the mean of a covariate X (e.g., age) in the treated group is 44, the mean in the control group is 46, and the pooled standard deviation is 9. Say that after matching, the control group mean is now 45 and the pooled standard deviation is now 4. Was there better balance before matching or after matching?
It should be clear that the covariate means are closer together after matching, which indicates an improvement in balance and therefore a reduction in bias. Which method of computing the SMD reflects this?
Prior to matching, the SMD is (46 - 44)/9 = .22. By the standard criterion of SMDs less than .1, this would be considered imbalanced.
Using the formula for the SMD that uses the standard deviation in the unmatched sample, the matched SMD is (45 - 44)/9 = .11, indicating better balance.
Using the formula for the SMD that uses the standard deviation in the matched sample, the matched SMD is (45 - 44)/4 = .25, indicating that balance got worse after matching!
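The same three computations in R, for concreteness:

```r
mean_t <- 44; mean_c_pre <- 46; sd_pre <- 9   # before matching
mean_c_post <- 45; sd_post <- 4               # after matching

(mean_c_pre  - mean_t) / sd_pre    # 0.22: unmatched SMD (imbalanced)
(mean_c_post - mean_t) / sd_pre    # 0.11: matched SMD, unmatched SD
(mean_c_post - mean_t) / sd_post   # 0.25: matched SMD, matched SD
```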
Remember that the bias of the effect estimate is a function of the mean differences, and standardizing them to produce the SMD is just a way to simplify balance statistics for users. It's all arbitrary anyway, but at least using the unmatched standard deviation correctly isolates changes in balance from changes in variability, the latter of which is not related to bias.
Best Answer
I would say one of the main reasons is that for estimands that rely on mean potential outcomes (e.g., the difference in means, risk ratio, or odds ratio), the specific arrangement of the pairs has no relation to bias. That is, if you take a matched sample and randomly re-pair the treated and control units, the effect estimate after the random pairing will be identical to the effect estimate under the original pairing. The philosophy of matching as nonparametric preprocessing holds that the purpose of pairing in matching is subset selection, i.e., selecting a subset of the original sample in which balance is achieved and bias due to model misspecification is reduced. Pairing is a way to do this, but it is not the only way, and it itself does not affect bias. There are a number of matching methods that do not involve pairing but are highly effective at achieving balance.
That said, pairwise balance is relevant to the overall bias in a certain way. Rather than thinking about pairwise balance as a property of a given pairing, it is useful to think of the best pairwise balance that could be achieved by any possible pairing in a given matched sample. For example, imagine first that 1:1 matching was done with exact matching for age, so each pair contains units that have equal ages but may differ on other variables. After matching, the distributions of age will be identical between the two treatment groups. Say that you now randomly re-pair the units in the matched sample, breaking the original pairs so that age is no longer exactly matched. This does not change the distributions of age in the matched sample; they will still be identical. Similarly, if it were possible to exactly match on education without discarding any units from this matched sample, that would indicate that the distributions of education were identical, even if the units were not actually matched on education. Again, the pairwise balance of a given pairing is less important than the best possible pairwise balance a matched sample could have under a hypothetical pairing. The closer the best possible pairing is to exact matching, the better the distributional balance of the covariate, and the better the overall balance that has been attained, regardless of the pairing actually used to create the matched sample or estimate the treatment effect.
The idea of assessing pairwise balance has been discussed by some methodologists in the matching literature. For example, Rubin (1973) recommends the use of two balance statistics to evaluate the quality of a match: $$ \bar d^1=\bar x_1 - \bar x_0 $$ and $$ \bar d^2=\frac{1}{N}\sum_i (x_{1i} - x_{0i})^2 $$ where the former is the difference in means and the latter is the average of the squared pairwise differences. Similarly, the measure used as the criterion in optimal matching is $\sum_i d_i$, where $d_i$ is the distance between the two units in pair $i$, equal to $|x_{1i} - x_{0i}|$ when the distance variable $x$ is univariate (e.g., in propensity score matching). Though not strictly a balance statistic, a failure to achieve small pairwise differences in the distance measure indicates a failure of the matching to achieve balance. The MatchIt package in R produces this statistic for each covariate when any pair matching method is used.
A more complete way to assess balance would be to perform optimal matching within a matched sample, using a different variable or set of variables to compute the distance measure, and see how good the best achievable balance is, rather than relying on the pairwise differences of the specific matched specification used to subset the data. For example, after doing matching to subset the data, you can run optimal matching within the matched dataset, without discarding any additional units, using a different variable as the matching variable. If the average pair distance on that variable is 0, then the sample is exactly balanced on it, even if the original pairing did not yield such closely matched pairs. Similarly, if you take two variables and use them to compute a distance measure (e.g., the Mahalanobis distance) and then pair match on that measure in the matched sample, an average pairwise distance of 0 indicates that the groups are exactly matched on both variables and their interaction (i.e., on the joint distribution of those covariates), which is an even stronger form of balance, even if in the original sample they were not so closely paired. This is a bit of a laborious process, especially for many combinations of covariates, but it gives a far more complete picture of balance beyond mean differences and even beyond univariate distributional statistics like the Kolmogorov-Smirnov statistic. A sketch of this procedure appears below.
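One way this re-pairing check might look in code, as a rough sketch using MatchIt (placeholder variables; method = "optimal" requires the optmatch package):

```r
library(MatchIt)

# Step 1: initial matching to subset the data
m1 <- matchit(treat ~ x1 + x2 + x3, data = mydata, method = "nearest")
md <- match.data(m1)  # the matched sample

# Step 2: re-pair the matched sample with optimal Mahalanobis matching
# on a different covariate set; because the groups are already 1:1,
# every unit is retained and only the pairing changes
m2 <- matchit(treat ~ x1 + x2, data = md, method = "optimal",
              distance = "mahalanobis")

# summary() reports standardized pair distances; values near 0 suggest
# the matched sample could be near-exactly paired on these covariates
summary(m2)
```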
There are issues beyond bias worth considering. Having close pairs decreases the standard error estimate when pair membership is accounted for in estimating the treatment effect. It is also possible for close pairs to reduce sensitivity to unobserved confounding, but only when using somewhat arcane methods to estimate the treatment effect, as described in Zubizarreta et al. (2014). In these cases, it makes sense to achieve pairwise distances as low as possible on covariates highly predictive of the outcome.
Rubin, D. B. (1973). Matching to Remove Bias in Observational Studies. Biometrics, 29(1), 159–183. https://doi.org/10.2307/2529684
Zubizarreta, J. R., Paredes, R. D., & Rosenbaum, P. R. (2014). Matching for balance, pairing for heterogeneity in an observational study of the effectiveness of for-profit and not-for-profit high schools in Chile. The Annals of Applied Statistics, 8(1), 204–231. https://doi.org/10.1214/13-AOAS713