Why is the IPW (Inverse Probability Weighting) estimator unbiased when you know the propensity scores?

treatment-effect

The IPW estimator, for outcomes $Y_i$, treatment $T_i$, and covariates $Z_i$, is:

$$
\widehat{\text{ATE}}_{\text{IPW}} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{T_iY_i}{\widehat{\pi}(Z_i)} - \frac{(1-T_i)Y_i}{1-\widehat{\pi}(Z_i)}\right]
$$

where $\widehat{\pi}(Z_i)$ is the estimate of the propensity score.
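For reference, the estimator maps directly onto a few lines of NumPy. This is a minimal sketch: the function name `ipw_ate` and its argument names are hypothetical, and `pi` may hold either the true scores $\pi(Z_i)$ or estimates $\widehat{\pi}(Z_i)$, which is exactly the distinction the question is about.

```python
import numpy as np


def ipw_ate(y, t, pi):
    """IPW estimate of the ATE (hypothetical helper).

    y  : outcomes Y_i
    t  : binary treatment indicators T_i (0 or 1)
    pi : propensity scores for each unit, true or estimated
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    pi = np.asarray(pi, dtype=float)
    # Treated units are weighted by 1/pi, untreated units by 1/(1 - pi).
    terms = t * y / pi - (1.0 - t) * y / (1.0 - pi)
    return terms.mean()
```

For example, `ipw_ate([3, 1], [1, 0], [0.5, 0.5])` returns `2.0`.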

Now, the literature states that if the propensity scores were known, the estimator above would be unbiased.

My question is: if we know the propensity scores, do we replace $\widehat{\pi}(Z_i)$ above with $\pi(Z_i)$ (the true propensity score) to obtain unbiasedness? In other words, what does it mean for the propensity scores to be known, and how does that lead to unbiasedness? Thanks.

Best Answer

There are two common situations where PS weights are known:

  1. An experiment, in which case usually $\pi(Z_i)=\pi=1-\pi=\frac{1}{2}$; if exactly half the sample is treated, your formula simplifies to a difference in means between treatment and control.
  2. A computer simulation, where you know the rule by which treatment is assigned, because you coded it up yourself.
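As a check on case 1, here is a small NumPy sketch of a completely randomized experiment in which exactly half of the $n$ units are treated. That balance is an assumption of the sketch: with independent coin flips the arm sizes vary and the two numbers differ slightly. With $\pi = \tfrac12$ the IPW formula becomes $\frac{2}{n}\sum_{T_i=1} Y_i - \frac{2}{n}\sum_{T_i=0} Y_i$, which is the difference in means. The outcomes are arbitrary simulated values.

```python
import numpy as np

rng = np.random.default_rng(1)   # arbitrary seed
n = 100
y = rng.normal(size=n)           # arbitrary outcomes

# Completely randomized design: exactly n/2 units treated, pi = 1/2 for all.
t = np.zeros(n)
t[rng.permutation(n)[: n // 2]] = 1.0
pi = np.full(n, 0.5)

ipw = np.mean(t * y / pi - (1 - t) * y / (1 - pi))
diff_in_means = y[t == 1].mean() - y[t == 0].mean()
print(ipw, diff_in_means)        # equal up to floating-point rounding
```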

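Known propensity scores let you reweight the observations so that the treated and the control groups each look like the full sample. Averaging over the heterogeneity this way gives the correct average treatment effect.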
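To see why knowing $\pi$ gives unbiasedness, here is a short derivation for a single summand. It assumes, beyond what is stated above, potential outcomes $Y_i(1), Y_i(0)$ with $Y_i = T_iY_i(1) + (1-T_i)Y_i(0)$, unconfoundedness ($Y_i(1), Y_i(0)$ independent of $T_i$ given $Z_i$), and overlap ($0 < \pi(Z_i) < 1$), where $\pi(Z_i) = P(T_i = 1 \mid Z_i)$:

$$
\begin{aligned}
E\!\left[\frac{T_i Y_i}{\pi(Z_i)}\right]
&= E\!\left[\frac{E[T_i Y_i(1)\mid Z_i]}{\pi(Z_i)}\right]
 = E\!\left[\frac{E[T_i\mid Z_i]\,E[Y_i(1)\mid Z_i]}{\pi(Z_i)}\right]
 = E\big[E[Y_i(1)\mid Z_i]\big] = E[Y_i(1)],\\
E\!\left[\frac{(1-T_i) Y_i}{1-\pi(Z_i)}\right] &= E[Y_i(0)] \quad \text{(same steps with } 1-T_i \text{ in place of } T_i\text{)}.
\end{aligned}
$$

So each summand has expectation $E[Y_i(1) - Y_i(0)] = \text{ATE}$, and by linearity so does their average. The step $E[T_i \mid Z_i] / \pi(Z_i) = 1$ is where the known score is used: with $\widehat{\pi}(Z_i)$, which is itself estimated from the $T_i$'s, that cancellation does not hold exactly in general.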

Here's a simple example. Suppose men are more likely to be treated, with $\pi(M) = 0.6$, while women receive treatment with probability $\pi(F)=0.4$. The untreated outcome is 20 for men and 10 for women. The treatment effect is the same, 5, for both genders. Suppose you sample 20 people, 10 of each gender. On average, the treated group will consist of 6 men and 4 women, with these demographics reversed in the control group.

In expectation, $$\bar Y_T= \frac{6 \cdot(20+5)+ 4 \cdot (10+5)}{10} = 21$$ and $$\bar Y_C =\frac{4\cdot 20+6 \cdot 10}{10} = 14.$$

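Then the naive difference in means gives an upward-biased effect of $21 - 14 = 7 \ne 5$, because the treated group over-represents men, whose outcomes are higher regardless of treatment.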
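The arithmetic above can be reproduced exactly with Python's `fractions` module. A minimal sketch; the dictionaries are just one convenient encoding of the example's expected group composition.

```python
from fractions import Fraction

y0 = {"M": 20, "F": 10}        # untreated outcome by gender
effect = 5                     # same treatment effect for both genders
treated = {"M": 6, "F": 4}     # expected composition of the treated group
control = {"M": 4, "F": 6}     # expected composition of the control group

ybar_t = Fraction(sum(k * (y0[g] + effect) for g, k in treated.items()),
                  sum(treated.values()))
ybar_c = Fraction(sum(k * y0[g] for g, k in control.items()),
                  sum(control.values()))
naive = ybar_t - ybar_c
print(ybar_t, ybar_c, naive)   # 21 14 7
```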

If you knew the probabilities of treatment for each gender, you could weight each treated man's outcome by $\frac{1}{0.6}=1.\bar 6$ and each treated woman's outcome by $\frac{1}{0.4}=2.5$, so that the over-represented men count for relatively less and the under-represented women for relatively more. For untreated observations, the weights are $\frac{1}{1-0.6}=2.5$ for each man and $\frac{1}{1-0.4}=1.\bar 6$ for each woman.

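Plugging these weights into your formula (that is, using the true $\pi(Z_i)$ in place of $\widehat{\pi}(Z_i)$) with the expected counts gives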

$$ \frac{6 \cdot(20+5) \cdot \frac{1}{0.6} + 4 \cdot (10+5) \cdot \frac{1}{0.4} - 4\cdot 20 \cdot \frac{1}{0.4}-6 \cdot 10 \cdot \frac{1}{0.6}}{20} = \frac{250 + 150 - 200 - 100}{20} = 5,$$

which is the right answer.
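A quick check of the example, in the spirit of case 2: because we code the assignment rule ourselves, the propensity scores are known. This NumPy sketch first redoes the hand calculation in exact arithmetic, then simulates many random treatment assignments for the fixed sample of 10 men and 10 women. The independent Bernoulli draws, seed, and replication count are assumptions of the sketch.

```python
from fractions import Fraction

import numpy as np

# 1) The hand calculation above, redone in exact arithmetic.
p_m, p_f = Fraction(6, 10), Fraction(4, 10)
exact = (6 * 25 / p_m + 4 * 15 / p_f - 4 * 20 / (1 - p_m) - 6 * 10 / (1 - p_f)) / 20

# 2) Monte Carlo: fixed sample of 10 men and 10 women, treatment drawn
#    independently with the known probabilities pi(M) = 0.6, pi(F) = 0.4.
rng = np.random.default_rng(42)       # arbitrary seed
R, n = 100_000, 20                    # replications, sample size
male = np.arange(n) < 10
pi = np.where(male, 0.6, 0.4)         # known propensity scores
y0 = np.where(male, 20.0, 10.0)       # untreated outcomes
y1 = y0 + 5.0                         # treatment effect of 5 for everyone

t = rng.random((R, n)) < pi           # (R, n) boolean treatment draws
y = np.where(t, y1, y0)               # observed outcomes

# IPW with the true propensity scores, one estimate per replication.
ipw = np.mean(t * y / pi - (~t) * y / (1 - pi), axis=1)

# Naive difference in means; drop the (very rare) draws with an empty arm.
n_t = t.sum(axis=1)
ok = (n_t > 0) & (n_t < n)
mean_t = (t * y).sum(axis=1)[ok] / n_t[ok]
mean_c = ((~t) * y).sum(axis=1)[ok] / (n - n_t[ok])
naive = mean_t - mean_c

print(f"exact: {exact}, IPW average: {ipw.mean():.2f}, naive average: {naive.mean():.2f}")
```

The IPW average should land close to 5 and the naive average near 7. Individual IPW estimates are very noisy here (the weights are large relative to $n = 20$); unbiasedness is a statement about the average over repeated treatment assignments, not about any single sample.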
