One-to-many matching of propensity scores and average treatment effect on the treated

propensity-scores, treatment-effect

I've been working on a propensity score matching project, but having no stats classes under my belt I'm struggling to calculate the so-called "average treatment effect on the treated" (ATT) so that it will be statistically valid. It seems straightforward enough when looking at the following formula:

$\text{ATT} = E[y_1 - y_0 \mid p(x)] = E[y \mid t = 1, p(x)] - E[y \mid t = 0, p(x)]$

That is, the average treatment effect on the treated is the expected value ($E$ = average, in this case) of the difference in the outcome variable (for example, $y$ = income) between treated and untreated individuals (where, say, treated individuals ($t=1$) received some sort of training and untreated individuals ($t=0$) did not), conditional on their propensity scores $p(x)$ (i.e. when their propensity scores match "well enough"). With one-to-one matching of treated and control individuals using nearest-neighbour or exact matching, calculating ATT manually in a spreadsheet is fine, but when matching more than one control to each treated individual things get tricky. So I'm hoping someone can help me figure out the generally accepted methods for calculating ATT in the one-to-many matching case.
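For instance (made-up numbers): if a treated individual with $y = 100$ is matched to two controls with outcomes $80$ and $90$, is the right move simply to average the two controls, $\tfrac{1}{2}(80 + 90) = 85$, take the difference $100 - 85 = 15$, and then average these differences across all treated individuals? Or do different methods weight the matched controls differently?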

Best Answer

All matching estimators of the average treatment effect on the treated can be written in the form

$$ \frac{1}{n_T} \sum_{i \in \{d_i=1\}} \left[ y_{1i} - \sum_{j \in \{d_j = 0 \}} w_{ij} \cdot y_{0j} \right] ,$$

where $w_{ij}$ is the weight placed on the $j$th untreated observation as a counterfactual for the $i$th treated observation, and $n_T$ is the number of treated persons. The weights satisfy $\sum_j w_{ij}=1$ for all $i$.

Effectively, from the outcome of each treated observation $i$, you subtract a weighted average of the control outcomes, and then take the average of these differences over the treated. The weights are specific to observation $i$; different matching estimators differ in how they construct them.
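As a minimal NumPy sketch of this estimator (the function name, array names, and toy numbers are illustrative, not from any particular package): store the $w_{ij}$ in a matrix `W` with one row per treated unit, each row summing to 1.

```python
import numpy as np

def att_from_weights(y_treated, y_control, W):
    """ATT = mean over treated units of (y_1i - sum_j w_ij * y_0j)."""
    counterfactuals = W @ y_control  # weighted control outcome for each treated unit
    return np.mean(y_treated - counterfactuals)

# Hypothetical toy data: 3 treated units, 4 controls.
y_treated = np.array([10.0, 12.0, 9.0])
y_control = np.array([8.0, 9.5, 7.0, 11.0])
W = np.array([[1.0, 0.0, 0.0, 0.0],   # treated 1 matched to control 1 only
              [0.0, 0.5, 0.0, 0.5],   # treated 2 split evenly between controls 2 and 4
              [0.0, 0.0, 1.0, 0.0]])  # treated 3 matched to control 3 only
print(att_from_weights(y_treated, y_control, W))  # 1.9166...
```

The second row is exactly the one-to-many case from the question: the counterfactual is a weighted average of several controls.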

For example, single nearest-neighbour matching sets the weight to 1 for the one untreated observation closest to $i$ in terms of the propensity score and to 0 for all others; $k$-nearest-neighbour matching instead puts weight $1/k$ on each of the $k$ closest untreated observations.
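A sketch of how those weights might be built, assuming NumPy arrays of propensity scores for the treated (`ps_treated`) and controls (`ps_control`); the function name and the value of `k` are illustrative:

```python
import numpy as np

def knn_weights(ps_treated, ps_control, k=1):
    """Weight 1/k on each of the k controls closest in propensity score."""
    W = np.zeros((len(ps_treated), len(ps_control)))
    for i, p in enumerate(ps_treated):
        nearest = np.argsort(np.abs(ps_control - p))[:k]  # indices of k closest controls
        W[i, nearest] = 1.0 / k
    return W

# k = 1 reproduces single nearest-neighbour matching.
print(knn_weights(np.array([0.30, 0.70]), np.array([0.25, 0.40, 0.65, 0.80]), k=2))
```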

Interval matching divides the range of propensity scores into a fixed number of intervals (which need not be of equal length). An interval-specific estimate is the difference between the mean outcomes of the treated and untreated units in that interval; the overall ATT is the average of these interval estimates, weighted by the share of treated units falling in each interval.
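A sketch of the interval estimator under some assumptions of mine (equal-width bins on the propensity score; bins missing either group are skipped; `y`, `t`, `ps` are NumPy arrays of outcomes, 0/1 treatment, and scores):

```python
import numpy as np

def interval_att(y, t, ps, n_bins=5):
    edges = np.linspace(ps.min(), ps.max(), n_bins + 1)
    bins = np.digitize(ps, edges[1:-1])        # bin index 0 .. n_bins-1 for each unit
    att, n_treated = 0.0, (t == 1).sum()
    for b in range(n_bins):
        treated = (bins == b) & (t == 1)
        control = (bins == b) & (t == 0)
        if treated.any() and control.any():    # skip bins lacking either group
            diff = y[treated].mean() - y[control].mean()
            att += diff * treated.sum() / n_treated  # weight by share of treated in bin
    return att
```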

Radius/caliper matching takes the mean outcome of the untreated units whose propensity scores fall within a fixed radius of each treated unit as the estimated counterfactual. You pick the radius; treated units with no control inside it are typically dropped.
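A sketch with the same assumed arrays; the radius value is an arbitrary placeholder, and treated units with no control inside the radius are simply dropped here:

```python
import numpy as np

def radius_att(y, t, ps, radius=0.05):
    y_c, ps_c = y[t == 0], ps[t == 0]
    diffs = []
    for yi, pi in zip(y[t == 1], ps[t == 1]):
        inside = np.abs(ps_c - pi) <= radius   # all controls within the caliper
        if inside.any():
            diffs.append(yi - y_c[inside].mean())
    return np.mean(diffs)
```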

Kernel matching uses weights that decline with the propensity score distance. You can think of it as running, for each treated observation, a weighted regression on the comparison group data that includes only an intercept term; that intercept is the estimated counterfactual. Here you have to pick the kernel and the bandwidth: the larger the bandwidth, the larger the weights that more distant observations receive.
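A sketch using a Gaussian kernel (the kernel choice and bandwidth `h` are my assumptions; an Epanechnikov kernel is another common choice). The kernel-weighted mean below is exactly the intercept of that intercept-only weighted regression:

```python
import numpy as np

def kernel_att(y, t, ps, h=0.1):
    y_c, ps_c = y[t == 0], ps[t == 0]
    diffs = []
    for yi, pi in zip(y[t == 1], ps[t == 1]):
        k = np.exp(-0.5 * ((ps_c - pi) / h) ** 2)       # weight decays with PS distance
        diffs.append(yi - np.sum(k * y_c) / np.sum(k))  # kernel-weighted control mean
    return np.mean(diffs)
```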

Local linear matching is very similar, but the regression also includes a linear term in the propensity score. Some people include higher-order polynomial terms as well.
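A sketch of the local linear version, again with an assumed Gaussian kernel: fit a weighted least-squares line of control outcomes on the propensity score, centred at each treated unit's score, and use the fitted intercept as the counterfactual.

```python
import numpy as np

def local_linear_att(y, t, ps, h=0.1):
    y_c, ps_c = y[t == 0], ps[t == 0]
    diffs = []
    for yi, pi in zip(y[t == 1], ps[t == 1]):
        w = np.sqrt(np.exp(-0.5 * ((ps_c - pi) / h) ** 2))    # sqrt of kernel weights
        X = np.column_stack([np.ones_like(ps_c), ps_c - pi])  # intercept + linear term
        beta = np.linalg.lstsq(X * w[:, None], y_c * w, rcond=None)[0]
        diffs.append(yi - beta[0])  # intercept = fitted control outcome at ps = pi
    return np.mean(diffs)
```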

Finally, you have inverse probability weighting. The basic idea is that you can recover the expected untreated outcome (in either the treated population or the full population) by reweighting the observed control outcomes using the estimated treatment probabilities.
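A sketch of IPW for the ATT specifically, assuming the same arrays: treated outcomes enter unweighted, and each control gets weight $p(x)/(1 - p(x))$, its odds of treatment, so that the reweighted controls mimic the treated population.

```python
import numpy as np

def ipw_att(y, t, ps):
    odds = ps[t == 0] / (1.0 - ps[t == 0])  # control weights p/(1-p)
    counterfactual = np.sum(odds * y[t == 0]) / np.sum(odds)
    return y[t == 1].mean() - counterfactual
```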

There are some guidelines about how to pick a method here.

There is a list of software and packages that can do matching here. Stata also now has native PSM estimators (the teffects commands). In my experience, replicating the output by hand is often very hard once you go past the simplest estimators. However, you can find worked examples with output for all of these online, so even if you don't have the software, they give you a useful benchmark, since you can usually track down the data.
