R – Recognizing Extreme Weights in Inverse Probability Treatment Weighting (IPTW)

Tags: propensity-scores, r, weighted-mean, weighted-data, weights

I have some doubts about how to recognize if there are extreme weights after balancing my population with inverse probability treatment weighting.

For instance, let's look at these results [code at the end of the post] – I know that age is not perfectly balanced but it doesn't matter as it is just an example:

[Image: bal.tab() balance table with the following columns]

M.0.Adj = weighted mean (rate) in the non-treated group
SD.0.Adj = standard deviation in the non-treated group
M.1.Adj = weighted mean (rate) in the treated group
SD.1.Adj = standard deviation in the treated group
Diff.Adj = standardized mean difference
V.Ratio.Adj = ratio of the variances of the two groups after adjusting

Moreover, here are a density plot of the propensity scores and a histogram of the weights that I made:

[Image: density plot of the propensity scores by treatment group]

[Image: histogram of the weights]

This is an example of the balance achieved (I don't know if it is useful in this context):
[Image: covariate balance plot]

What do I have to look at in order to know whether there are extreme weights? Do I have to look at the plots? How can I know that I balanced correctly and that there are no problems caused by extreme weights, so that I don't have to take further action to correct them (e.g. trimming …)? I don't know how to "recognize" the extreme weights.

For those who prefer to have the code:

library(cobalt)
library(WeightIt)
library(dplyr)
data("lalonde", package = "cobalt")

# Estimate ATT weights using a logistic regression propensity score
W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, estimand = "ATT", method = "ps")

# Attach the estimated weights and propensity scores to the data
lalonde <- lalonde %>% mutate(weights = W.out$weights)
lalonde <- lalonde %>% mutate(ps = W.out$ps)

summary(W.out)
bal.tab(W.out, stats = c("m", "v"), thresholds = c(m = .10), disp = c("means", "sds"))

library(ggplot2)
# Density plot of the propensity scores by treatment group
ggplot(lalonde, aes(x = ps, fill = as.factor(treat))) +
  geom_density(alpha = 0.5, colour = "grey50") +
  geom_rug() +
  scale_x_log10(breaks = c(1, 5, 10, 20, 40)) +
  ggtitle("Distribution of propensity scores")

library(weights)
# Histogram of the weights
wtd.hist(W.out$weights)

Best Answer

The problem with extreme weights is that they inflate the variability of the weights, which decreases the effective sample size (ESS). You don't have to check for extreme weights per se; you just need to check for an unacceptably low effective sample size. In this case, the ESS for the control group decreased by quite a lot. You might wonder why that is. One answer could be a few extreme weights that dramatically increase the variance of the weights. Looking at the summary of the weights and their histogram, it seems this could be the case.
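As a point of reference, the ESS reported by summary() is Kish's approximation, (sum of the weights)^2 divided by the sum of the squared weights. A quick sketch of how to compute it by hand, using the weights column you attached to lalonde (the ess() helper below is just something I'm defining for illustration):

# Kish's approximation to the effective sample size: (sum of weights)^2 / (sum of squared weights)
ess <- function(w) sum(w)^2 / sum(w^2)

# ESS by treatment group; this should match (or come very close to) what summary(W.out) reports
with(lalonde, tapply(weights, treat, ess))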

The output of summary(W.out) displays the ESS and information on the largest weights. You can see that the largest weights are between 3 and 4, but their values are quite similar. These values do not seem too extreme, though they are clearly quite a bit larger than the average control group weight of ~.44.
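If you want to see those largest weights yourself rather than reading them off the summary, one quick way (using the W.out and lalonde objects from your code) is:

# Ten largest control-group weights; under ATT weighting the treated weights are all 1
head(sort(W.out$weights[lalonde$treat == 0], decreasing = TRUE), 10)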

You can use plot(summary(W.out)) to directly plot a histogram of the weights. The output looks like the following:

[Image: histogram of the weights produced by plot(summary(W.out))]

It's pretty clear that most control weights are quite small and there are a few weights that are relatively large, which is likely causing the decrease in ESS. There is no individual unit with an extreme weight, but rather a cluster of units with unusually high weights. You can see if trimming (i.e., winsorizing) the weights makes a difference using trim(); I find that trimming the weights to anywhere between the 85th and 95th percentile improves the ESS without dramatically worsening balance.
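For example, something like the following (the .90 cutoff is just one illustrative value in the 85th-95th percentile range mentioned above):

# Trim (winsorize) the weights at the 90th percentile, then re-check ESS and balance
W.trim <- trim(W.out, at = .90)
summary(W.trim)
bal.tab(W.trim, stats = c("m", "v"), thresholds = c(m = .10))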


I appreciate you wanting to practice your coding skills to generate the plots, but cobalt and WeightIt provide utilities for making those plots. Instead of using weights::wtd.hist(), you can just use plot(summary(W.out)) as I mentioned above. Also, hist() would have sufficed; a histogram of weights is not the same thing as a weighted histogram, which is what weights::wtd.hist() displays. You didn't even use the weights argument, which is the only way that differs from hist(). I'm not sure why your histogram has values greater than 20; are you sure you are using the right code to generate that plot? To plot the distribution of propensity scores, just use cobalt::bal.plot(), e.g., bal.plot(W.out, "prop.score", which = "both").
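To be concrete, the two calls together would look something like this (assuming WeightIt and cobalt are loaded, as in your code):

# Histogram of the estimated weights (plot method for summary.weightit objects)
plot(summary(W.out))

# Distribution of the propensity score in the treated and control groups,
# before ("unadjusted") and after ("adjusted") weighting
bal.plot(W.out, var.name = "prop.score", which = "both")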