Clinical Trials – Comparing Two or More Treatments with Inverse Probability of Treatment Weighting

Tags: clinical-trials, propensity-scores, r, stata

I am working on a cardiovascular observational (i.e. non-randomized) study featuring three or more competing treatments.

My preference would be to conduct the analysis first using 1:1 propensity score matching, for instance with twang or MatchIt in R, or psmatch2 in Stata, and then to confirm the main results, without excluding any cases, by means of inverse probability of treatment weighting (IPTW), for instance with twang in R, or meglm in Stata.
However, I have at least three groups under comparison, so the standard approach needs rethinking.
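For concreteness, the two-group workflow I have in mind would look roughly like the sketch below (the data frame `dat`, the treatment `treat`, and the covariates `age` and `sex` are hypothetical placeholders):

```r
# Minimal sketch of the standard two-group workflow, assuming a data frame
# 'dat' with a binary treatment 'treat' and illustrative covariates.
library(MatchIt)

# Step 1: 1:1 nearest-neighbor propensity score matching
m <- matchit(treat ~ age + sex, data = dat,
             method = "nearest", ratio = 1)
matched <- match.data(m)   # matched sample for the primary analysis

# Step 2: IPTW on the full sample as a confirmatory analysis
ps <- glm(treat ~ age + sex, data = dat,
          family = binomial)$fitted.values
dat$w <- ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps))  # ATE weights
```

It is this second, binary-treatment structure that does not carry over directly once there are three or more groups.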

I have found the TriMatch package for R on the web, which appears capable of handling more than two treatment groups, but it seems to focus on propensity score matching only.

In any case, what is the most sensible approach? Should I dichotomize all comparisons (e.g., decomposing an A vs. B vs. C comparison into A vs. B, A vs. C, and B vs. C) and proceed with standard methods? Or should I pursue an alternative, and possibly more challenging, approach?

Best Answer

You can use twang with more than 2 treatment levels -- I use it all the time to obtain propensity scores for multiple (i.e. >2) treatments, and it's one of my all-time favorite R packages because there is no need to guess the functional relationships between your treatments and covariates. Since twang uses gradient boosted regression, it can fit nonlinear relationships and interactions automatically. Use the mnps function (rather than the ps function, which handles binary treatments) to obtain propensity score weights for multiple treatments; then extract the weights with the get.weights function and assess the balance statistics. The good folks at RAND have even prepared a nice tutorial for multiple-treatment propensity score weighting.
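A minimal sketch of that workflow follows; the data frame `dat`, the three-level factor `treat`, the outcome `outcome`, and the covariates are hypothetical stand-ins for your own variables:

```r
# Hedged sketch: multiple-treatment propensity score weighting with twang.
library(twang)
library(survey)

set.seed(1)
mnps_fit <- mnps(treat ~ age + sex + ef,
                 data = dat,
                 estimand = "ATE",          # compare all groups to the overall population
                 stop.method = "es.mean",
                 n.trees = 3000)

bal.table(mnps_fit)                         # covariate balance for each pairwise comparison
dat$w <- get.weights(mnps_fit, stop.method = "es.mean")

# Weighted outcome model via the survey package
design <- svydesign(ids = ~1, weights = ~w, data = dat)
fit <- svyglm(outcome ~ treat, design = design)
summary(fit)
```

The survey package is one common way to fit the weighted outcome model; any routine that accepts observation weights and produces robust standard errors would serve the same purpose.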

One word of caution is in order, however, with twang. It isn't the most efficiently written program and it cannot be easily parallelized, so if you have a lot of data (e.g. > 100,000 observations), it can be quite slow. Also, sometimes the weights assigned can be too large and therefore influence your results too much. I perform a simple check on the weights to make sure that no weight is overly large (e.g., no single weight accounts for more than, say, 5% of the total of all weights). Note too that the twang propensity score weights are not standardized, so they do not add to one. They can be normalized with some simple manipulation, but I rarely find that I need to do this.

Lastly, work by Elizabeth Stuart has shown that propensity scores built with gradient boosting methods outperform those built with other methods. Therefore, I'd strongly advocate the use of twang.