All matching estimators for the average effect of treatment on the treated (ATT) can be written in the form
$$ \frac{1}{n_T} \sum_{i \in \{d_i=1\}} \left[ y_{1i} - \sum_{j \in \{d_j = 0 \}} w_{ij} \cdot y_{0j} \right] ,$$
where $w_{ij}$ is the weight placed on the $j$th untreated observation as a counterfactual for the $i$th treated observation, and $n_T$ is the number of treated observations. The weights satisfy $\sum_j w_{ij}=1$ for all $i$.
Effectively, from each treated observation's outcome you subtract a weighted average of the control outcomes, where the weights are specific to that observation, and then you average these differences across the treated. Different matching estimators differ only in how they construct the weights.
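To make this concrete, here is a minimal R sketch of the generic estimator; the function and argument names (att_from_weights, y, d, w) are illustrative, not from any package:

```r
# Minimal sketch of the generic matching estimator above.
# y: outcome vector; d: 0/1 treatment indicator;
# w: n_T x n_C weight matrix whose rows sum to 1.
att_from_weights <- function(y, d, w) {
  y1 <- y[d == 1]                          # treated outcomes
  y0 <- y[d == 0]                          # control outcomes
  counterfactual <- as.vector(w %*% y0)    # one weighted control average per treated unit
  mean(y1 - counterfactual)                # average treated-minus-counterfactual difference
}
```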
For example, nearest neighbor matching sets the weight to 1 for the single untreated observation closest to $i$ in terms of the propensity score and to 0 for all others. $k$-nearest-neighbor matching instead uses the $k$ closest neighbors, each with weight $1/k$.
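A sketch of 1-NN weights on the propensity score, which plugs into the att_from_weights sketch above (again, all names are mine):

```r
# Sketch of 1-NN weights: weight 1 on the single closest control for
# each treated unit, 0 elsewhere. ps: propensity scores; d: treatment.
nn_weights <- function(ps, d) {
  ps1 <- ps[d == 1]
  ps0 <- ps[d == 0]
  w <- matrix(0, nrow = length(ps1), ncol = length(ps0))
  for (i in seq_along(ps1)) {
    w[i, which.min(abs(ps1[i] - ps0))] <- 1   # ties go to the first minimum
  }
  w
}
```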
Interval matching consists of dividing the range of propensity scores into a fixed number of intervals (which need not be of equal length). An interval-specific estimate is the difference between the mean outcomes of the treated and untreated units in that interval; the overall estimate averages these, weighted by the number of treated units in each interval.
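A hedged sketch of this, assuming the user supplies fixed cut points (all names illustrative; intervals containing treated units but no controls are simply dropped here):

```r
# Sketch of interval (stratification) matching on the propensity score.
# breaks: vector of cut points spanning the range of ps.
interval_att <- function(y, d, ps, breaks) {
  block <- cut(ps, breaks, include.lowest = TRUE)
  est <- tapply(y[d == 1], block[d == 1], mean) -
         tapply(y[d == 0], block[d == 0], mean)   # per-interval estimates
  n_t <- table(block[d == 1])                     # treated counts per interval
  weighted.mean(est, n_t, na.rm = TRUE)           # treated-weighted average
}
```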
Radius/caliper matching takes the mean of the outcomes for untreated units within a fixed radius of each treated unit as the estimated expected counterfactual. You pick the radius.
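A corresponding sketch of caliper weights that plugs into the generic estimator above (names illustrative):

```r
# Sketch of radius (caliper) weights: each control within radius r of a
# treated unit's propensity score gets equal weight.
caliper_weights <- function(ps, d, r) {
  ps1 <- ps[d == 1]
  ps0 <- ps[d == 0]
  w <- outer(ps1, ps0, function(a, b) as.numeric(abs(a - b) <= r))
  w / rowSums(w)   # rows with no control inside the radius become NaN;
                   # real implementations drop or flag such treated units
}
```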
Kernel matching uses weights that decline with the propensity score distance. You can think of kernel matching as running, for each treated observation, a kernel-weighted regression on the comparison-group data that includes only an intercept term; the fitted intercept is the estimated counterfactual. Here you have to pick the kernel and the bandwidth. A larger bandwidth gives relatively more weight to more distant observations, producing more smoothing.
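A sketch of such weights, assuming an Epanechnikov kernel (the kernel choice and all names are mine):

```r
# Sketch of kernel weights on the propensity score with bandwidth h.
kernel_weights <- function(ps, d, h) {
  ps1 <- ps[d == 1]
  ps0 <- ps[d == 0]
  u <- outer(ps1, ps0, "-") / h       # scaled PS distances
  k <- pmax(0, 0.75 * (1 - u^2))      # Epanechnikov kernel values
  k / rowSums(k)                      # normalize rows to sum to 1
}
```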
Local linear matching is very similar but also includes a linear term in the propensity score. Some people also include higher-order polynomial terms.
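A sketch of the local linear counterfactual for one treated unit, under the same assumed kernel (names illustrative):

```r
# Sketch of a local linear counterfactual for a single treated unit with
# propensity score p, fit on control outcomes y0 and control scores ps0.
# The intercept of the kernel-weighted fit is the estimate at p.
local_linear_cf <- function(p, y0, ps0, h) {
  k <- pmax(0, 0.75 * (1 - ((ps0 - p) / h)^2))  # Epanechnikov weights
  fit <- lm(y0 ~ I(ps0 - p), weights = k)       # local linear regression
  unname(coef(fit)[1])                          # fitted value at the treated unit's PS
}
```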
Finally, you have inverse probability weighting (IPW). The basic idea is that you can recover the expected untreated outcome (in either the treated population or the full population) by reweighting the observed control outcomes using the estimated treatment probabilities.
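For the ATT, the standard reweighting uses the propensity odds; a minimal sketch (names are mine):

```r
# Sketch of IPW for the ATT: controls are reweighted by ps / (1 - ps)
# so they mimic the treated population. ps: estimated propensity scores.
ipw_att <- function(y, d, ps) {
  w0 <- ps[d == 0] / (1 - ps[d == 0])   # odds weights for controls
  mean(y[d == 1]) - weighted.mean(y[d == 0], w0)
}
```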
There are some guidelines about how to pick a method here.
There is a list of software and packages that can do matching here. Stata also now has native propensity score matching estimators. In my experience, replicating the output by hand is often very hard once you go past the simplest estimators. However, worked examples with output for all of these are available online, and since you can usually track down the data, they give you a useful benchmark even if you don't have the software.
The documentation for Matching is, sadly, fairly incomplete, leaving what it does quite mysterious. What is clear is that it takes a different approach from Stuart (2010) (and the Ho, Imai, King, and Stuart camp) to estimating treatment effects and their standard errors. Instead, it takes heavy inspiration from Abadie & Imbens (2006, 2011), who describe variance estimators and bias correction for matching estimators. While Stuart and colleagues consider matching a nonparametric pre-processing method that doesn't change the variance of the effect estimates, Abadie, Imbens, and Sekhon are careful to account for the variability in the effect estimate induced by the matching itself. Thus, the analysis that Matching performs is not described in Stuart (2010).
The philosophy of matching described by Ho, Imai, King, & Stuart (2007) (the authors of the MatchIt package) is that the analysis you would have performed without matching is the one you should perform after matching; the benefit of matching is robustness to misspecification of the functional form of the model used. The most basic model is none at all, i.e., the difference in treatment group means, but regression models on the treatment and covariates work too. This group argues that no adjustment to the standard error is required, so the standard error you get from the standard analysis on the matched sample is sufficient. This is why you can simply export the matched sample from the output of MatchIt and run a regression on it, forgetting that the matched sample came from a matching procedure. Austin has additionally argued that standard errors should account for the paired nature of the data, though the MatchIt camp argues that matching doesn't imply pairing and an unpaired standard error is sufficient. If you do want to account for pairing, using cluster-robust standard errors with pair membership as the cluster should accomplish it; this can be done with the sandwich package after estimating the effect using glm(), or with the jtools package.
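A hedged sketch of that workflow; df, y, x1, x2, and treat are placeholder names:

```r
# Sketch of pair-clustered standard errors after MatchIt. match.data()
# returns the matched sample with a `weights` column and a pair
# identifier `subclass`.
library(MatchIt)
library(sandwich)
library(lmtest)

m.out <- matchit(treat ~ x1 + x2, data = df, method = "nearest")
md    <- match.data(m.out)
fit   <- glm(y ~ treat, data = md, weights = weights)
coeftest(fit, vcov = vcovCL, cluster = ~subclass)  # cluster on matched pair
```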
The philosophy of matching used by Matching considers the act of matching to be part of the analysis, so the variability it induces in the effect estimate must be taken into account. Much of the theory used in Matching comes from a series of papers by Abadie and Imbens, who discuss the bias and variance of matching estimators. Although the documentation for Matching is not very descriptive, the Stata command teffects nnmatch is almost identical, uses all the same theory, and has very descriptive documentation. The effect estimator is the one described by Abadie & Imbens (2006); it's not a simple difference-in-means estimator because of the possibility of ties, k:1 matching, and matching with replacement. Its standard error is described in the same paper. There is an option to perform bias correction, which uses a technique described by Abadie & Imbens (2011). This is not the same as running a regression on the matched set: rather than using matching to provide robustness to a regression estimator, the bias-corrected matching estimator provides robustness to a matching estimator through a parametric bias correction based on the covariates.
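A minimal sketch of this in R, where y, d, and X are placeholders for the outcome, treatment indicator, and covariate matrix:

```r
# Sketch of an Abadie-Imbens ATT estimate with bias correction.
library(Matching)

m.out <- Match(Y = y, Tr = d, X = X, estimand = "ATT",
               M = 1, BiasAdjust = TRUE)  # 1:1 matching with replacement
summary(m.out)  # prints the estimate and the Abadie-Imbens standard error
```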
The only difference between genetic matching and standard "nearest neighbor" matching is the distance metric used to decide whether two units are near each other. In teffects nnmatch in Stata and Match() in Matching, the default is the Mahalanobis distance. The innovation of genetic matching is that the distance matrix is iteratively reweighted until good balance is found instead of just using the default distance matrix, so the theory for the matching estimators still applies.
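A sketch of the full genetic matching workflow, with y, d, and X again as placeholders; pop.size is the genetic algorithm's population size, and larger values search more thoroughly at greater computational cost:

```r
# Sketch of genetic matching followed by estimation with Matching.
library(Matching)

gen <- GenMatch(Tr = d, X = X, estimand = "ATT", pop.size = 200)
m.out <- Match(Y = y, Tr = d, X = X, estimand = "ATT",
               Weight.matrix = gen, BiasAdjust = TRUE)
summary(m.out)
MatchBalance(d ~ X, match.out = m.out)  # check covariate balance
```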
I think a clear way to write your methods section might be something like:
Matching was performed using a genetic matching algorithm (Diamond & Sekhon, 2013) as implemented in the Matching package (Sekhon, 2011). Treatment effects were estimated using the Match function in Matching, which implements the matching estimators and standard error estimators described by Abadie and Imbens (2006). To improve robustness, we performed bias correction on all continuous covariates as described by Abadie and Imbens (2011) and implemented using the BiasAdjust option in the Match function.
This makes your analysis reproducible, and curious readers can investigate the literature for themselves (although Matching is almost an industry standard and already well trusted).
Abadie, A., & Imbens, G. W. (2006). Large Sample Properties of Matching Estimators for Average Treatment Effects. Econometrica, 74(1), 235–267. https://doi.org/10.1111/j.1468-0262.2006.00655.x
Abadie, A., & Imbens, G. W. (2011). Bias-Corrected Matching Estimators for Average Treatment Effects. Journal of Business & Economic Statistics, 29(1), 1–11. https://doi.org/10.1198/jbes.2009.07333
Diamond, A., & Sekhon, J. S. (2013). Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies. Review of Economics and Statistics, 95(3), 932–945.
Ho, D. E., Imai, K., King, G., & Stuart, E. A. (2007). Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis, 15(3), 199–236. https://doi.org/10.1093/pan/mpl013
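Sekhon, J. S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching Package for R. Journal of Statistical Software, 42(7), 1–52. https://doi.org/10.18637/jss.v042.i07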
Stuart, E. A. (2010). Matching Methods for Causal Inference: A Review and a Look Forward. Statistical Science, 25(1), 1–21. https://doi.org/10.1214/09-STS313
Best Answer
This package looks very promising:
http://cran.r-project.org/web/packages/nonrandom/vignettes/nonrandom.pdf
but I've so far mostly used the MatchIt package.