Propensity Scores – Handling Unbalanced Covariates in IPTW with Propensity Score Matching

propensity-scores treatment-effect

When using propensity scores (PS) to compute inverse probability weights (IPW) for estimating the average treatment effect (ATE), is it valid to remove from the PS model those covariates that remain unbalanced, and then include them as covariates, together with the treatment variable, in the final analysis?

EDIT: Since the original text may be confusing, I will try to clarify it.

I used the ps() function from the twang R package, which fits GBM models. After running it, some covariates have a larger absolute standardized effect size in the weighted data than in the original unweighted data.

My question was: does it make sense to remove the covariates that are still unbalanced after running ps(), and then add them as covariates when fitting a logistic regression on the weighted data?

Following @Noah's answer: is it statistically valid (and does it make sense) to use all covariates to compute the weights, and then include the unbalanced covariates in the final logistic regression?

Best Answer

The goal of IPTW is to achieve balance. If your IPTW specification does not achieve balance, you can try to respecify the model, or you can use regression in the weighted sample with the imbalanced covariates included to adjust for confounding by those covariates. The latter is not necessarily the best way to proceed, though. Failing to balance a covariate with the weights means that you are placing the entire burden of adjusting for that covariate onto the outcome regression model. If that model is wrong (and it almost certainly is), confounding will remain. The point of balancing is to make the confounding that remains after covariate adjustment by an incorrect model as small as possible. This is the thesis of Ho, Imai, King, and Stuart (2007).
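To make "balance" concrete, here is a minimal Python sketch of one common way to compute a standardized mean difference (SMD) before and after applying ATE weights. Everything in it is an assumption for illustration: the data are simulated, the names (`smd`, `iptw`, and so on) are hypothetical, and the propensity score is the true one, which is available only because the data are simulated. Balance software may use a different denominator convention.

```python
import math
import random


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def sample_variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def smd(x, treat, weights=None):
    """Standardized mean difference of x (treated minus control).

    The denominator is the unweighted pooled SD, so weighted and
    unweighted values are on the same scale.
    """
    if weights is None:
        weights = [1.0] * len(x)
    x1 = [v for v, t in zip(x, treat) if t == 1]
    x0 = [v for v, t in zip(x, treat) if t == 0]
    w1 = [w for w, t in zip(weights, treat) if t == 1]
    w0 = [w for w, t in zip(weights, treat) if t == 0]
    pooled_sd = math.sqrt((sample_variance(x1) + sample_variance(x0)) / 2)
    return (weighted_mean(x1, w1) - weighted_mean(x0, w0)) / pooled_sd


# Simulated data (hypothetical): a single confounder x drives treatment.
rng = random.Random(0)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
# True propensity score, known here only because the data are simulated.
ps = [1 / (1 + math.exp(-0.8 * v)) for v in x]
treat = [1 if rng.random() < p else 0 for p in ps]

# ATE weights: 1/ps for treated units, 1/(1 - ps) for controls.
iptw = [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps, treat)]

smd_before = smd(x, treat)
smd_after = smd(x, treat, iptw)
print(f"SMD unweighted: {smd_before:.3f}")
print(f"SMD weighted:   {smd_after:.3f}")
```

A weighted SMD that is larger in absolute value than the unweighted one, as described in the question, is a sign that the weights are not doing their job for that covariate.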

It doesn't make much sense to remove a covariate from a propensity score model. If that model fails to balance a covariate, you should want to add that covariate into the model in multiple different ways (e.g., squared terms, log terms, interactions, subclasses) to achieve balance, not drop it because the model that includes it is doing poorly. Surely a model without the covariate will balance the covariate even worse.
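The idea of adding the covariate in more flexible forms can be sketched as follows. This is an illustration on simulated data, not a recipe: the data-generating model, the small Newton-Raphson logistic fitter, and all names are assumptions made for the example. Treatment depends on both x and x squared, and the sketch compares the balance on x squared achieved by a propensity model that is linear in x with one that also includes the squared term.

```python
import math
import random


def sigmoid(v):
    v = max(-30.0, min(30.0, v))  # clamp to avoid overflow in exp
    return 1 / (1 + math.exp(-v))


def solve(A, b):
    """Solve the small linear system A z = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton-Raphson; rows of X include the intercept."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += mu * (1 - mu) * row[j] * row[k]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta


def predict(X, beta):
    return [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in X]


def weighted_smd(v, treat, ps):
    """SMD of v under ATE weights built from the propensity scores ps."""
    def wmean(g):
        num = den = 0.0
        for vi, t, p in zip(v, treat, ps):
            if t == g:
                w = 1 / p if g == 1 else 1 / (1 - p)
                num += w * vi
                den += w
        return num / den

    def var(g):
        vals = [vi for vi, t in zip(v, treat) if t == g]
        m = sum(vals) / len(vals)
        return sum((a - m) ** 2 for a in vals) / (len(vals) - 1)

    return (wmean(1) - wmean(0)) / math.sqrt((var(1) + var(0)) / 2)


# Simulated data (hypothetical): treatment depends on x and on x squared.
rng = random.Random(1)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
treat = [1 if rng.random() < sigmoid(0.4 * v + 0.3 * (v * v - 1)) else 0 for v in x]
x_sq = [v * v for v in x]

# Propensity model 1: x enters linearly only.
X_lin = [[1.0, v] for v in x]
ps_lin = predict(X_lin, fit_logistic(X_lin, treat))

# Propensity model 2: x and its square.
X_quad = [[1.0, v, v * v] for v in x]
ps_quad = predict(X_quad, fit_logistic(X_quad, treat))

smd_lin = weighted_smd(x_sq, treat, ps_lin)
smd_quad = weighted_smd(x_sq, treat, ps_quad)
print(f"SMD of x^2, linear PS model:    {smd_lin:.3f}")
print(f"SMD of x^2, quadratic PS model: {smd_quad:.3f}")
```

In this setup the richer model balances the squared term much better than the linear one. With real data the right functional form is unknown, so the same comparison would be repeated over several candidate specifications.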

Ideally, you should combine IPTW with an outcome regression model so that the remaining imbalance is accounted for by the outcome regression model and the misspecification of the outcome regression model is mitigated by the balance. There are several estimators that combine a propensity score and an outcome model; these are called "doubly robust" estimators, and outcome regression in an IPTW-weighted sample is one of them, but there are others.
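As a rough sketch of outcome regression in an IPTW-weighted sample, the Python example below simulates data with a known treatment effect, builds ATE weights from a deliberately misspecified propensity score so that some imbalance in x remains, and then compares the weighted difference in means with a weighted regression that also includes x. The data, the misspecification, and the helper names (`solve`, `wls`) are all assumptions for illustration. Here the outcome model is correct by construction, which is not something one can count on in practice.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A z = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def wls(X, y, w):
    """Weighted least squares: solve (X'WX) beta = X'Wy."""
    p = len(X[0])
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for row, yi, wi in zip(X, y, w):
        for j in range(p):
            xtwy[j] += wi * row[j] * yi
            for k in range(p):
                xtwx[j][k] += wi * row[j] * row[k]
    return solve(xtwx, xtwy)


# Simulated data (hypothetical): x confounds treatment and outcome.
rng = random.Random(2)
n = 20000
true_effect = 2.0
x = [rng.gauss(0, 1) for _ in range(n)]
treat = [1 if rng.random() < 1 / (1 + math.exp(-0.8 * v)) else 0 for v in x]
y = [true_effect * t + 1.5 * v + rng.gauss(0, 1) for t, v in zip(treat, x)]

# Deliberately misspecified propensity score (coefficient 0.4 instead of 0.8),
# so the resulting ATE weights leave residual imbalance in x.
ps_bad = [1 / (1 + math.exp(-0.4 * v)) for v in x]
w = [1 / p if t == 1 else 1 / (1 - p) for p, t in zip(ps_bad, treat)]

# Weights only: weighted regression of y on treatment alone, which is the
# weighted difference in means.
est_weights_only = wls([[1.0, t] for t in treat], y, w)[1]

# Weights plus outcome regression that includes the imbalanced covariate.
est_with_x = wls([[1.0, t, v] for t, v in zip(treat, x)], y, w)[1]

print(f"true effect:               {true_effect:.2f}")
print(f"weights only:              {est_weights_only:.2f}")
print(f"weights + regression on x: {est_with_x:.2f}")
```

The point estimate is only half the story: standard errors from a weighted regression need to account for the weights (for example with a robust or bootstrap variance), which this sketch does not attempt.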

You should also consider using either optimization-based approaches like entropy balancing, which guarantee balance on the covariate means and have good efficiency properties, or machine learning methods like generalized boosted modeling (GBM) or Bayesian additive regression trees (BART), which attempt to flexibly model the propensity score. These are available in the R package WeightIt (which I developed). There has been so much work done on new, robust methods with excellent statistical properties that one should not be using the simple methods developed 20 years ago.
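To give a feel for why entropy balancing guarantees balance on the means, here is a toy Python sketch for a single covariate. With uniform base weights and one mean constraint, the solution has an exponential-tilting form: weights proportional to exp(lam * x), with lam chosen so that the weighted mean hits the target. The sketch assumes the ATE case, where each group is reweighted to the full-sample mean, and finds lam by bisection. The function name `tilt_weights`, the simulated data, and the bisection bounds are assumptions for illustration; this is not how any particular package implements the method.

```python
import math
import random


def tilt_weights(xs, target, lo=-20.0, hi=20.0, iters=100):
    """Weights proportional to exp(lam * (x - target)) whose weighted mean
    of x equals target; lam is found by bisection on [lo, hi]."""
    z = [v - target for v in xs]

    def moment(lam):
        # Weighted sum of deviations from the target; it is increasing
        # in lam and equals zero at the solution.
        return sum(math.exp(lam * zi) * zi for zi in z)

    for _ in range(iters):
        mid = (lo + hi) / 2
        if moment(mid) > 0:
            hi = mid
        else:
            lo = mid
    lam = (lo + hi) / 2
    raw = [math.exp(lam * zi) for zi in z]
    total = sum(raw)
    return [len(xs) * r / total for r in raw]  # weights sum to the group size


def weighted_mean(values, weights):
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


# Simulated data (hypothetical): a single confounder x drives treatment.
rng = random.Random(3)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
treat = [1 if rng.random() < 1 / (1 + math.exp(-0.8 * v)) else 0 for v in x]

overall_mean = sum(x) / n
x1 = [v for v, t in zip(x, treat) if t == 1]
x0 = [v for v, t in zip(x, treat) if t == 0]

# ATE-style target: both groups are reweighted to the full-sample mean.
w1 = tilt_weights(x1, overall_mean)
w0 = tilt_weights(x0, overall_mean)

mean1 = weighted_mean(x1, w1)
mean0 = weighted_mean(x0, w0)
gap = mean1 - mean0
print(f"unweighted gap: {sum(x1) / len(x1) - sum(x0) / len(x0):.3f}")
print(f"weighted gap:   {gap:.2e}")
```

Unlike a propensity score model, where balance is a by-product that has to be checked afterwards, here the mean constraint is solved for directly, so the weighted gap is zero up to numerical precision.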