Handling Unbalanced Variables After IPTW with Entropy Balancing

Tags: propensity-scores, r, treatment-effect, weighted-data

After applying inverse probability of treatment weighting (IPTW) to my dataset, one covariate is still imbalanced between the two treatment groups. My outcome is binary (yes/no) and the study is not longitudinal.

One example is:

library(WeightIt)
library(cobalt)  # provides the lalonde data and bal.tab() for checking balance

W.out <- weightit(treat ~ age + married + race,
                  data = lalonde, estimand = "ATE", method = "ps")
bal.tab(W.out, threshold = .1)

Age is not balanced.

  • How can I make all the variables balanced? Is it possible to "re-weight"? How?
  • Is it possible to apply "entropy balancing" directly instead of IPTW
    in this case? Can somebody explain entropy balancing to me? I tried
    reading the original paper (here) but I didn't understand much of it.
    How is entropy balancing computed? Can it always be used under the same conditions as IPTW, or are there particular conditions?
  • If entropy balancing is able to adjust with standardized differences of almost 0, then why is it so little used in the medical field?
  • I noticed that in some papers there is the cohort after the 1st weighting, then the 2nd weighting, etc. Can someone explain how this is obtained? How many weightings do you have to do?

For instance, if I want to use this code:

W.out <- weightit(treat ~ age + married + race,
                  data = lalonde, estimand = "ATE", method = "ebal")

What parameters do I have to set, and what do I have to pay attention to, in order to know that I applied the method correctly? Is there a way to inspect the scores from which the weights were obtained, as in the case of IPTW (W.out$ps)?

Best Answer

These are some good questions. I'll do my best to give simple answers to them.

Entropy balancing (EB) for the ATT (which is not your query) is IPTW. It implicitly estimates a propensity score (PS) using logistic regression, but instead of doing so with maximum likelihood, it does so using a different algorithm that yields exact mean balance on the included covariates. This is described in Zhao & Percival (2017) and Zhou (2019), among others.

However, it was not known that this was what EB was when it was first described in Hainmueller (2012). Hainmueller considered EB an optimization problem: estimate weights for each individual such that the following characteristics hold: the covariate means are exactly balanced after weighting, the weights are positive, and the "negative entropy" of the weights is minimized. The negative entropy is a measure of variability, so EB weights are meant to be less extreme than standard IPTW weights. Instead of having to do the optimization problem and estimate $n$ parameters (i.e., a weight for each individual in the sample), Hainmueller discovered a trick where you can just estimate one parameter for each variable to be balanced. The reason this trick is possible is because of the later-discovered fact that EB is a special kind of logistic regression, and in logistic regression you just estimate one parameter for each variable (i.e., the regression coefficient).
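To make the optimization concrete, here is a sketch of Hainmueller's formulation for ATT weights on the control units (notation mine, following the 2012 paper): minimize the entropy distance of the weights $w_i$ from base weights $q_i$ (typically uniform),

$$\min_{w} \sum_{i:\,T_i = 0} w_i \log\frac{w_i}{q_i} \quad \text{subject to} \quad \sum_{i:\,T_i = 0} w_i X_i = \bar{X}_{T=1}, \qquad \sum_{i:\,T_i = 0} w_i = 1, \qquad w_i > 0,$$

where $\bar{X}_{T=1}$ is the vector of covariate means (or other requested moments) in the treated group. The Lagrangian of this problem has one multiplier per balance constraint, which is exactly the "one parameter per variable" trick mentioned above.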

For the ATE, unfortunately, it's a different story. The nice equivalence between logistic regression and EB doesn't hold, but WeightIt still relies on the trick of estimating one parameter per variable (actually two, one for each treatment group) instead of estimating a weight for each unit. The details of how WeightIt does this are beyond the scope of this answer, but to summarize, it performs EB twice, once for each treatment group, and estimates weights for each treatment group that yield exact mean balance on the covariates between that treatment group and the overall sample.
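As a quick check of that property, here is a minimal sketch (the object name W.ebal is mine; it uses the method = "ebal" call from your question): after ATE entropy balancing, the weighted mean of each covariate in each treatment group should equal the unweighted overall mean.

W.ebal <- weightit(treat ~ age + married + race,
                   data = lalonde, estimand = "ATE", method = "ebal")
w <- W.ebal$weights
c(overall = mean(lalonde$age),
  treated = weighted.mean(lalonde$age[lalonde$treat == 1], w[lalonde$treat == 1]),
  control = weighted.mean(lalonde$age[lalonde$treat == 0], w[lalonde$treat == 0]))

All three numbers should be (essentially) identical, and the same holds for married and the race categories.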

Since the goal of IPTW is to achieve balance, EB skips the step of estimating a PS and goes straight to balance, while ensuring the weights have minimal variability. For this reason, it performs excellently in simulations and in real data. It is in line with the philosophy of matching as nonparametric preprocessing described by Ho et al. (2007), who identify the PS tautology: a good PS achieves balance, but the only way to evaluate a PS is to assess whether it has achieved balance. EB cuts out the middleman, skipping the cycle of estimating a PS, checking balance, and respecifying the PS model when balance isn't good; it guarantees exact mean balance on the covariates right away.

There are two philosophies of estimating PSs, which I described in detail in this post, which also mentions EB and its alternatives. First, there is the philosophy of trying to estimate the PS as accurately as possible, because then the "magical" properties of the PS that guarantee unbiasedness in large samples come into play. Second, there is the philosophy of estimating PSs that yield balance, with no attempt to estimate the true PS or even an accurate one. EB falls squarely in the second camp, omitting a PS entirely. One weakness of this is that the magical properties of the PS cannot come into play: you can only balance the terms you request to be balanced, and there is no guarantee that the rest of the covariate distribution (i.e., moments beyond the means, or features of the joint distribution like covariances) will be balanced unless those are specifically requested, too. An analyst at SAS put it wisely: "When a measure becomes a target, it ceases to be a good measure." That is, measured covariate balance is a metric of the PS's ability to balance unmeasured features of the covariate distribution (by unmeasured I mean unseen features of the distribution of the observed covariates, not unmeasured covariates), and achieving measured balance automatically with EB tells you nothing about those unmeasured features. You can no longer rely on the theoretical properties of the PS to balance the full distributions.

Okay, I know I've been a little theoretical and technical here. I'll bring it back to answering your questions directly.

How can I make all the variables balanced? Is it possible to "re-weight"? How?

You can use EB directly on the covariates; you don't need to re-weight (i.e., apply entropy balancing to the propensity score-weighted sample). That is, if your IPTWs didn't yield balance, toss them out and use a different method of estimating weights. EB is one, but there are others. My favorite is energy balancing, which is also implemented in WeightIt. (It actually is possible to combine IPTW and EB, which was one of the winning methods in the 2016 ACIC data competition. It has not been studied beyond that, though.)
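If you want to try the energy balancing option I mentioned, here is a minimal sketch (assuming a recent version of WeightIt, where it is available as method = "energy"). Like EB, it is applied directly to the covariates rather than by re-weighting the IPTW-weighted sample.

W.energy <- weightit(treat ~ age + married + race,
                     data = lalonde, estimand = "ATE", method = "energy")
bal.tab(W.energy, threshold = .1)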

Is it possible to apply "entropy balancing" directly instead of IPTW in this case? Can somebody explain entropy balancing to me? I tried reading the original paper (here) but I didn't understand much of it. How is entropy balancing computed? Can it always be used under the same conditions as IPTW, or are there particular conditions?

I attempted to answer this above, but I'll summarize. EB for the ATE skips the PS and estimates weights that exactly balance the covariate means and ensure the weights have minimal variability. The specific method of estimation is a very simple optimization that runs extremely fast. For the ATT, the story is slightly different, and more connections to standard IPTW exist. For a treatment at a single time point, EB can be used in the exact same situations IPTW can, including for binary, multi-category, and continuous treatments, for the ATT or ATE, for subgroup analysis, etc. The estimates from EB have the exact same interpretations as those from IPTW. There are many extensions to entropy balancing, including for longitudinal treatments and when you have a single treated unit and multiple controls (this is called the synthetic control method). For the ATT, it performs almost uniformly better than logistic regression-based PS weighting except in pathological circumstances.
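In practical terms, a minimal sketch for your example: there is little to set beyond the defaults (the weightit() arguments moments and int can optionally be used to request balance on higher-order and interaction terms). After fitting, the main things to inspect are balance and the distribution of the weights. There is no estimated propensity score to extract as with IPTW, but the weights themselves are stored in the fitted object.

W.out <- weightit(treat ~ age + married + race,
                  data = lalonde, estimand = "ATE", method = "ebal")
summary(W.out)                  # ranges of the weights, effective sample sizes
bal.tab(W.out, threshold = .1)  # standardized mean differences should be ~0
head(W.out$weights)             # the estimated weights (no propensity score is estimated by "ebal")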

If entropy balancing is able to adjust with standardized differences of almost 0, then why is it so little used in the medical field?

Mostly because medical researchers have not heard of it, and even if they have, they may be scared to use it because it sounds complicated, even though it isn't. It is very popular in labor economics and is slowly becoming more popular in medicine and other fields as well. It deserves far more attention and, in my opinion, should be the first method a researcher tries, not a backup for when IPTW fails. It must be accompanied by a robust assessment of balance because the theoretical properties of the propensity score do not apply (for the ATE; they actually do for the ATT). This includes assessing balance beyond the means using, e.g., KS statistics and balance statistics for interactions and polynomial terms, all of which are available in cobalt.
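A sketch of what that fuller assessment might look like with cobalt (argument names as in current versions of the package):

bal.tab(W.out, stats = c("mean.diffs", "ks.statistics"),
        int = TRUE, poly = 2, thresholds = c(m = .1))
bal.plot(W.out, var.name = "age", which = "both")  # distributional balance for age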

I noticed that in some papers there is the cohort after the 1st weighting, then the 2nd weighting, etc. Can someone explain how this is obtained? How many weightings do you have to do?

I'm not exactly sure what you're referring to, but this is probably multiple attempts to estimate a single set of weights that balance the covariates. E.g., you try a logistic regression, then a logistic regression with squared terms added, then with some interactions added, etc. Only the properties of the final set of weights (i.e., those that yield the best balance without sacrificing precision) should be reported and used in effect estimation, but it is important to describe your process of estimating weights in your manuscript to ensure your procedure is replicable. (There are some contexts where multiple sets of weights are combined together, but that is an advanced matter that is beyond the scope of your question.)
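A hypothetical sketch of that kind of specification search (the formulas here are purely illustrative, not recommendations):

W1 <- weightit(treat ~ age + married + race,
               data = lalonde, estimand = "ATE", method = "ps")
W2 <- weightit(treat ~ poly(age, 2) + married * race,
               data = lalonde, estimand = "ATE", method = "ps")
bal.tab(W1, threshold = .1)
bal.tab(W2, threshold = .1)  # report and use only the final, best-balanced set of weights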

Go forth, and use entropy balancing!