Treatment Effect – How to Evaluate Treatment Effect after Matching in Observational Studies

Tags: matching, observational-study, regression, treatment-effect

In Elizabeth Stuart's 2010 paper "Matching methods for causal inference: A review and a look forward", she states the following:

"Section 5: Analysis of the Outcome:
… After the matching has
created treated and control groups with adequate balance (and the
observational study thus “designed”), researchers can move to the
outcome analysis stage. This stage will generally involve regression
adjustments using the matched samples, with the details of the
analysis depending on the structure of the matching."

"Section 6.2: Guidance for practice: … 5) Examine the balance on
covariates resulting from that matching method.
If adequate, move forward with treatment effect estimation, using
regression adjustment on the matched samples."

The specifics of how to use regression after matching, however, are not mentioned. I can think of two options:

1) Use simple regression with:

  • X= Treatment group (1/0)

  • Y= variable/outcome of interest for evaluating treatment effect

2) Use multiple regression with:

  • X= Treatment group (1/0) + all other matching covariates where balance has been achieved

  • Y= variable/outcome of interest for evaluating treatment effect

In R's Matching Package, the documentation doesn't specify what kind of regression it uses (I am assuming it is using regression).

I read the paper on the Matching package ("Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R", Jasjeet S. Sekhon), looked thoroughly at the R documentation, and even spent close to an hour today trying to understand the Matching code on GitHub, but to no avail: I am still not sure what exactly is being done.

I need to understand exactly which method is used to evaluate the treatment effect, and to justify its use, for an academic paper I am working on that uses genetic matching. If anyone can point me to an explanation of exactly which statistical method should be used, or is being used by R, to estimate the treatment effect, that would be really helpful.

Best Answer

The documentation for Matching is sadly fairly incomplete, leaving what it does quite mysterious. What is clear is that it takes a different approach from Stuart (2010) (and the Ho, Imai, King, and Stuart camp) in estimating treatment effects and their standard errors. Rather, it takes heavy inspiration from Abadie & Imbens (2006, 2011), who describe variance estimators and bias-correction for matching estimators. While Stuart and colleagues consider matching a nonparametric pre-processing method that doesn't change the variance of the effect estimates, Abadie, Imbens, and Sekhon are careful to consider the variability in the effect estimate induced by the matching. Thus, the analysis that Matching performs is not described in Stuart (2010).

The philosophy of matching described by Ho, Imai, King, & Stuart (2007) (the authors of the MatchIt package) is that the analysis you would have performed without matching is the one you should perform after matching; the benefit of matching is robustness to misspecification of the functional form of the model used. The most basic model is none at all, i.e., the difference in treatment group means, but regression models on the treatment and covariates work too. This group argues that no adjustment to the standard error is required, so the standard error you get from the standard analysis on the matched sample is sufficient. This is why you can simply export the matched sample from the output of MatchIt and run a regression on it, forgetting that the matched sample came from a matching procedure. Austin has additionally argued that standard errors should account for the paired nature of the data, though the MatchIt camp argues that matching doesn't imply pairing and that an unpaired standard error is sufficient. If you do want to account for pairing, cluster-robust standard errors with pair membership as the cluster should accomplish it. These can be computed with the sandwich package after estimating the effect using glm(), or with the jtools package.
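To make the difference between the unpaired and pair-aware standard errors concrete, here is a minimal Python sketch for the simplest case: 1:1 matching without replacement and a plain difference in means. The function name and the numbers are made up for illustration; this is not what MatchIt, sandwich, or jtools run internally. With equal group sizes the unpaired formula below coincides with the usual OLS standard error of the treatment coefficient, and the "cluster" formula is the cluster-robust sandwich estimate for that coefficient with pairs as clusters and no small-sample correction (an assumption; software defaults usually apply a correction).

```python
import math
from statistics import mean, variance

def matched_pair_summary(treated, control):
    """Effect estimate and three standard errors for 1:1 matched pairs.

    treated[i] and control[i] are the outcomes of the two units in pair i.
    """
    n = len(treated)
    diffs = [t - c for t, c in zip(treated, control)]
    # Equals mean(treated) - mean(control) up to floating-point rounding.
    estimate = mean(diffs)
    # Ignores pairing: treats the two groups as independent samples.
    se_unpaired = math.sqrt(variance(treated) / n + variance(control) / n)
    # Paired t-test standard error: based on within-pair differences.
    se_paired = math.sqrt(variance(diffs) / n)
    # Cluster-robust (pairs as clusters), no small-sample correction.
    se_cluster = math.sqrt(sum((d - estimate) ** 2 for d in diffs)) / n
    return estimate, se_unpaired, se_paired, se_cluster

# Made-up outcomes for five matched pairs (illustration only).
treated = [12.0, 15.0, 11.0, 18.0, 14.0]
control = [10.0, 14.0, 9.0, 15.0, 13.0]
estimate, se_unpaired, se_paired, se_cluster = matched_pair_summary(treated, control)
print(estimate, se_unpaired, se_paired, se_cluster)
```

In this toy data the outcomes within a pair are positively correlated, so the pair-aware standard errors come out smaller than the unpaired one. The cluster-robust value differs from the paired one only by a factor of sqrt((n - 1) / n).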

The philosophy of matching used by Matching considers the act of matching to be part of the analysis, so the variability it induces in the effect estimate must be accounted for. Much of the theory used in Matching comes from a series of papers by Abadie and Imbens, who discuss the bias and variance of matching estimators. Although the documentation for Matching is not very descriptive, the Stata command teffects nnmatch is almost identical and uses the same theory, and its documentation is very descriptive. The effect estimator is that described by Abadie & Imbens (2006); it's not a simple difference in means estimator because of the possibility of ties, k:1 matching, and matching with replacement. Its standard error is described in the same paper. There is an option to perform bias correction, which uses a technique described by Abadie & Imbens (2011). This is not the same as performing regression on the matched set. Rather than using matching to make a regression estimator more robust, the bias-corrected matching estimator makes a matching estimator more robust by applying a parametric bias correction based on the covariates.
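The idea of a matching estimator with ties, matching with replacement, and bias correction can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the code of Match() or teffects nnmatch: it uses a single covariate, one nearest neighbor (all tied neighbors are averaged), targets the ATT, and fits the bias-correction regression on all control units, which is an assumption made for brevity. It also omits the Abadie-Imbens variance estimator entirely. All names and data are hypothetical.

```python
from statistics import mean

def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return sxy / sxx

def att_matching(x_t, y_t, x_c, y_c, bias_adjust=False):
    """Nearest-neighbor matching estimate of the ATT on one covariate.

    Controls may be reused (matching with replacement) and tied nearest
    neighbors are averaged. With bias_adjust=True, each matched control
    outcome is shifted by slope * (x_treated - x_control) before averaging.
    """
    slope = ols_slope(x_c, y_c) if bias_adjust else 0.0
    effects = []
    for xi, yi in zip(x_t, y_t):
        dmin = min(abs(xi - xj) for xj in x_c)
        matches = [j for j, xj in enumerate(x_c) if abs(xi - xj) == dmin]
        imputed = mean([y_c[j] + slope * (xi - x_c[j]) for j in matches])
        effects.append(yi - imputed)
    return mean(effects)

# Made-up data: control outcome is 2 * x, treated outcome is 2 * x + 3,
# so the true effect is 3 for every treated unit.
x_c = [1.0, 2.0, 4.0, 7.0]
y_c = [2.0, 4.0, 8.0, 14.0]
x_t = [1.5, 3.0, 6.0]
y_t = [6.0, 9.0, 15.0]

att_raw = att_matching(x_t, y_t, x_c, y_c)
att_adj = att_matching(x_t, y_t, x_c, y_c, bias_adjust=True)
print(att_raw, att_adj)
```

The unadjusted estimate is pulled away from 3 because the treated unit at x = 6 is matched to a control at x = 7; the bias-corrected version removes that discrepancy using the covariate. The matching step still determines which controls are compared with which treated units, which is why this differs from simply regressing the outcome on treatment and covariates in the matched set.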

The only difference between genetic matching and standard "nearest neighbor" matching is the distance metric used to decide whether two units are near each other. In teffects nnmatch in Stata, the default is the Mahalanobis distance; in Match() in Matching, the default weights each covariate by the inverse of its variance, with the Mahalanobis distance available through the Weight argument. The innovation of genetic matching is that the covariate weights in the distance metric are iteratively adjusted until good balance is found instead of just using a default metric, so the theory for the matching estimators still applies.
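For reference, the generalized distance that genetic matching searches over, as I understand it from Diamond & Sekhon (2013), can be written as follows (the notation is theirs, reproduced from memory, so check it against the paper before citing):

```latex
d(X_i, X_j) = \left\{ (X_i - X_j)^{\top} \left(S^{-1/2}\right)^{\top} W \, S^{-1/2} (X_i - X_j) \right\}^{1/2}
```

Here S is the sample covariance matrix of the covariates, S^{1/2} is its Cholesky factor, and W is a positive definite weight matrix (diagonal in practice) whose entries the genetic algorithm tunes to optimize balance. Setting W to the identity matrix gives back the Mahalanobis distance.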

I think a clear way to write your methods section might be something like:

Matching was performed using a genetic matching algorithm (Diamond & Sekhon, 2013) as implemented in the Matching package (Sekhon, 2011). Treatment effects were estimated using the Match function in Matching, which implements the matching estimators and standard error estimators described by Abadie and Imbens (2006). To improve robustness, we performed bias correction on all continuous covariates as described by Abadie and Imbens (2011) and implemented using the BiasAdjust option in the Match function.

This makes your analysis reproducible and curious readers can investigate the literature for themselves (although Matching is almost an industry standard and already well trusted).


Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267. https://doi.org/10.1111/j.1468-0262.2006.00655.x

Abadie, A., & Imbens, G. W. (2011). Bias-Corrected Matching Estimators for Average Treatment Effects. Journal of Business & Economic Statistics, 29(1), 1–11. https://doi.org/10.1198/jbes.2009.07333

Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932–945.

Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013

Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21. https://doi.org/10.1214/09-STS313