I have some doubts about how to recognize if there are extreme weights after balancing my population with inverse probability treatment weighting.
For instance, let's look at these results [code at the end of the post]. I know that age is not perfectly balanced, but that doesn't matter, as this is just an example:
M.0.Adj = weighted mean/rate for the non-treated population
SD.0.Adj = standard deviation in the non-treated population
M.1.Adj = weighted mean/rate for the treated population
SD.1.Adj = standard deviation in the treated population
Diff.Adj = standardized mean difference after adjustment
V.Ratio.Adj = ratio of the variances of the two groups after adjustment
Moreover, here are a density plot of the propensity scores and a histogram of the weights that I made:
This is an example of the balance achieved (I don't know if it is useful in this context):
What do I have to look at in order to know whether there are extreme weights? Do I have to look at the plots? How can I tell whether I balanced correctly and there are no problems caused by extreme weights, so that I don't have to take further action to correct them (e.g., trimming)? I don't know how to "recognize" extreme weights.
For those who prefer to have the code:
library(cobalt)
library(WeightIt)
library(dplyr)
data("lalonde", package = "cobalt")
W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
data = lalonde, estimand = "ATT", method = "ps")
lalonde <- lalonde %>% mutate(weights = W.out$weights)
lalonde <- lalonde %>% mutate(ps = W.out$ps)
summary(W.out)
bal.tab(W.out, stats = c("m", "v"), thresholds = c(m = .10), disp = c("means", "sds"))
library(ggplot2)
ggplot(lalonde, aes(x = ps, fill = as.factor(treat))) +
geom_density(alpha = 0.5, colour = "grey50") +
geom_rug() +
scale_x_log10(breaks = c(1, 5, 10, 20, 40)) +
ggtitle("Distribution of propensity scores")
library(weights)
wtd.hist(W.out$weights)
Best Answer
The problem with extreme weights is that they produce high variability in the weights, which decreases the effective sample size (ESS). You don't have to check for extreme weights per se; you just need to check for an unacceptably low effective sample size. In this case, the ESS for the control group decreased by quite a lot. You might wonder why that is. One answer could be that a few extreme weights dramatically increase the variance of the weights. Looking at the summary of the weights and their histogram, it seems this could be the case.
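To see why variable weights shrink the ESS, it can help to compute it directly. A minimal sketch in base R, using the common Kish formula (sum of weights squared over sum of squared weights), which is the definition behind the ESS that weighting packages typically report:

```r
# Kish effective sample size: (sum of weights)^2 / sum of squared weights.
# With uniform weights it equals n; highly variable weights shrink it.
ess <- function(w) sum(w)^2 / sum(w^2)

w_even    <- rep(1, 100)        # uniform weights
w_extreme <- c(rep(1, 99), 50)  # one extreme weight

ess(w_even)     # 100
ess(w_extreme)  # 149^2 / 2599, roughly 8.5
```

A single extreme weight can collapse the ESS of a 100-unit sample to under 10, which is why a low ESS is a useful red flag for extreme weights.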
The output of

summary(W.out)

displays the ESS and information on the largest weights. You can see that the largest weights are between 3 and 4, and their values are quite similar to each other. These values do not seem too extreme, though they are clearly quite a bit larger than the average control group weight of ~.44. You can use

plot(summary(W.out))

to directly plot a histogram of the weights. The output looks like the following:

It's pretty clear that most control weights are quite small and there are a few weights that are relatively large, which is likely causing the decrease in ESS. There is no individual unit with an extreme weight, but rather a cluster of units with unusually high weights. You can see whether trimming (i.e., winsorizing) the weights makes a difference using trim(); I find that trimming the weights at anywhere between the 85th and 95th percentile improves the ESS without dramatically worsening balance.

I appreciate you wanting to practice your coding skills by generating the plots yourself, but cobalt and WeightIt provide utilities for making those plots. Instead of using weights::wtd.hist(), you can just use plot(summary(W.out)) as I mentioned above. Also, hist() would have sufficed; a histogram of weights is not the same thing as a weighted histogram, which is what weights::wtd.hist() displays. You didn't even use the weights argument, which is the only way it differs from hist(). I'm not sure why your histogram has values greater than 20; are you sure you used the right code to generate that plot? To plot the distribution of propensity scores, just use cobalt::bal.plot(), e.g.,

bal.plot(W.out, "prop.score", which = "both")
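As a sketch of the trimming workflow described above, assuming your W.out from weightit() (the 90th percentile is just one choice within the suggested 85th–95th range, and is worth varying):

```r
library(cobalt)
library(WeightIt)

data("lalonde", package = "cobalt")

W.out <- weightit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                  data = lalonde, estimand = "ATT", method = "ps")

# Winsorize the weights at the 90th percentile; weights above that
# quantile are set equal to it rather than dropped
W.trim <- trim(W.out, at = .9)

# Compare ESS and covariate balance before and after trimming
summary(W.out)
summary(W.trim)
bal.tab(W.trim, stats = c("m", "v"), thresholds = c(m = .10))
```

The comparison of the two summary() outputs shows the ESS gain, and bal.tab() confirms whether balance remains acceptable after trimming.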