Clinical Trials – Comparing Two or More Treatments with Inverse Probability of Treatment Weighting

Tags: clinical-trials, propensity-scores, r, stata

I am working on a cardiovascular observational (i.e. non-randomized) study featuring three or more competing treatments.

My preference would be to conduct the analysis first using 1:1 propensity score matching, for instance with twang or MatchIt in R, or psmatch2 in Stata, and then to confirm the main results, without excluding any cases, by means of inverse probability of treatment weighting (IPTW), for instance with twang in R, or meglm in Stata.
However, I have at least three groups under comparison, so the standard approach needs rethinking.
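For concreteness, the two-group workflow I have in mind would look roughly like the sketch below (the data frame `dat`, the treatment `treat`, and the covariates `age` and `sex` are hypothetical placeholders):

```r
# Minimal sketch of the standard two-group workflow, assuming a data frame
# 'dat' with a binary treatment 'treat' and illustrative covariates.
library(MatchIt)

# Step 1: 1:1 nearest-neighbor propensity score matching
m <- matchit(treat ~ age + sex, data = dat,
             method = "nearest", ratio = 1)
matched <- match.data(m)   # matched sample for the primary analysis

# Step 2: IPTW on the full sample as a confirmatory analysis
ps <- glm(treat ~ age + sex, data = dat,
          family = binomial)$fitted.values
dat$w <- ifelse(dat$treat == 1, 1 / ps, 1 / (1 - ps))  # ATE weights
```

It is this second, binary-treatment structure that does not carry over directly once there are three or more groups.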

I have found the TriMatch package for R on the web, which appears capable of handling more than two treatment groups, but it seems to focus on propensity score matching only.

In any case, what is the most sensible approach? Should I dichotomize all comparisons (e.g., decomposing an A vs. B vs. C comparison into A vs. B, A vs. C, and B vs. C) and proceed with standard methods? Or should I pursue an alternative, and possibly more challenging, approach?

Best Answer

You can use twang with more than 2 treatment levels -- I use it all the time to obtain propensity scores for multiple (i.e. >2) treatments, and it's one of my all-time favorite R packages because there is no need to guess the functional relationships between your treatments and covariates. Since twang uses gradient boosted regression, it can fit nonlinear relationships and interactions automatically. Use the mnps function (rather than the ps function, which handles binary treatments) to obtain propensity score weights for multiple treatments; then extract the weights with the get.weights function and assess the balance statistics. The good folks at RAND have even prepared a nice tutorial for multiple-treatment propensity score weighting.
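A minimal sketch of that workflow follows; the data frame `dat`, the three-level factor `treat`, the outcome `outcome`, and the covariates are hypothetical stand-ins for your own variables:

```r
# Hedged sketch: multiple-treatment propensity score weighting with twang.
library(twang)
library(survey)

set.seed(1)
mnps_fit <- mnps(treat ~ age + sex + ef,
                 data = dat,
                 estimand = "ATE",          # compare all groups to the overall population
                 stop.method = "es.mean",
                 n.trees = 3000)

bal.table(mnps_fit)                         # covariate balance for each pairwise comparison
dat$w <- get.weights(mnps_fit, stop.method = "es.mean")

# Weighted outcome model via the survey package
design <- svydesign(ids = ~1, weights = ~w, data = dat)
fit <- svyglm(outcome ~ treat, design = design)
summary(fit)
```

The survey package is one common way to fit the weighted outcome model; any routine that accepts observation weights and produces robust standard errors would serve the same purpose.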

One word of caution is in order, however, with twang. It isn't the most efficiently written program and it cannot be easily parallelized, so if you have a lot of data (e.g. > 100,000 observations), it can be quite slow. Also, sometimes the weights assigned can be too large and therefore influence your results too much. I perform a simple check on the weights to make sure that no weight is overly large (e.g., no single weight accounts for more than, say, 5% of the total of all weights). Note too that the twang propensity score weights are not standardized, so they do not add to one. They can be normalized with some simple manipulation, but I rarely find that I need to do this.

Lastly, work by Elizabeth Stuart has shown that propensity scores built with gradient boosting methods outperform those built with other methods. Therefore, I'd strongly advocate the use of twang.