R – Matching with Multiple Treatments

causalitymatchingpropensity-scoresr

What's the best way to use matching methods with multiple treatment groups?
I'm assessing the impact of an intervention on an outcome. For my first analysis, I used the MatchIt package (see code below) to match treatment and control groups and to compare outcomes for the treated vs control. The Treat variable is 0 or 1.

nn_match <- matchit(treat ~ cov1 + cov2+ cov3 ,
                     method = "nearest", data = df)

In my second analysis, I'd like to assess the effect of different treatment levels (for example, I have Treatment A, Treatment B, and Treatment C) on the outcome, but I'm stuck on the best way to proceed. Does it make sense to run separate matching analysis for each treatment group while excluding the observations from the other treatments groups?

nn_treatA_match <- matchit(treat_A ~ cov1 + cov2+ cov3 ,
                     method = "nearest", data = df_minus_other_treatBC)
nn_treatB_match <- matchit(treat_B ~ cov1 + cov2+ cov3 ,
                     method = "nearest", data = df_minus_other_treatAC)

Or is there a way to run the analysis with the entire dataset and match controls to each treatment at the same time? Other posts have mentioned the twang package, but I'm not sure if that's what I should go with as I'd like to compare outcomes for each treated group to the control and not just across treated groups.

Best Answer

I recommend taking a look at Lopez & Gutman (2017), who clearly describe the issues at hand and the methods used to solve them.

Based on your description, it sounds like you want the average treatment effect in the control group (ATC) for several treatments. For each treatment level, this answers the question, "For those who received the control, what would their improvement have been had they received treatment A?" We can, in a straightforward way, ask this about all of our treatment groups.

Note this differs from the usual estimand in matching, which is the average treatment effect in the treated (ATT), which answers the question "For those who received treatment, what would their decline had been had they received the control?" This question establishes that for those who received treatment, treatment was effective. The question the ATC answers is about what would happen if we were to give the treatment to those who normally wouldn't take it.

A third question you could ask is "For everyone, what would be the effect of treatment A vs. control?" This as an average treatment effect in the population (ATE) question, and is usually the question we want to answer in a randomized trial. It's very important to know which question you want to answer because each requires a different method. I'll carry on assuming you want the ATC for each treatment.

To get the ATC using matching, you can just perform standard matching between the control and each treatment group. This requires that you keep the control group intact (i.e., no adjustment for common support or caliper). One treatment group at a time, you find the treated individuals that are similar to the control group. After doing this for each treatment group, you can use regression in the aggregate matched sample to estimate the effects of each treatment vs. control on the outcome. To make this straightforward, simply make the control group the reference category of the treatment factor in the regression.

Here's how you might do this in MatchIt:

library(MatchIt)
treatments <- levels(data$treat) #Levels of treatment variable
control <- "control" #Name of control level
data$match.weights <- 1 #Initialize matching weights

for (i in treatments[treatments != control]) {
  d <- data[data$treat %in% c(i, control),] #Subset just the control and 1 treatment
  d$treat_i <- as.numeric(d$treat != i) #Create new binary treatment variable
  m <- matchit(treat_i ~ cov1 + cov2 + cov3, data = d)
  data[names(m$weights), "match.weights"] <- m$weights[names(m$weights)] #Assign matching weights
}

#Check balance using cobalt
library(cobalt)
bal.tab(treat ~ cov1 + cov2 + cov3, data = data, 
        weights = "match.weights", method = "matching", 
        focal = control, which.treat = .all)

#Estimate treatment effects
summary(glm(outcome ~ relevel(treat, control), 
            data = data[data$match.weights > 0,], 
            weights = match.weights))

It's a lot easier to do this using weighting instead of matching. The same assumptions and interpretations of the estimands apply. Using WeightIt, you can simply run

library(WeightIt)
w.out <- weightit(treat ~ cov1 + cov2 + cov3, data = data, focal = "control", estimand = "ATT")

#Check balance
bal.tab(w.out, which.treat = .all)

#Estimate treatment effects (using jtools to get robust SEs)
#(Can also use survey package)
library(jtools)
summ(glm(outcome ~ relevel(treat, "control"), data = data,
         weights = w.out$weights), robust = "HC1")

To get the ATE, you need to use weighting. In the code above, simple replace estimand = "ATT" with estimand = "ATE" and remove focal = "control". Take a look at the WeightIt documentation for more options. In particular, you can set method = "gbm", which will give you the same results as using twang. Note that I'm the author of both cobalt and WeightIt.


Lopez, M. J., & Gutman, R. (2017). Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statistical Science, 32(3), 432–454. https://doi.org/10.1214/17-STS612