Solved – Confidence interval for average treatment effect from propensity score weighting

causalityobservational-studypropensity-scoresrsurvey

I am trying to estimate the average treatment effect from observational data using propensity score weighting (specifically IPTW). I think I am calculating the ATE correctly, but I don't know how to calculate the confidence interval of the ATE while taking into account the inverse propensity score weights.

Here is the equation I'm using to calculate the average treatment effect (reference Stat Med. Sep 10, 2010; 29(20): 2137–2148.):
$$ATE=\frac1N\sum_1^N\frac{Z_iY_i}{p_i}-\frac1N\sum_1^N\frac{(1-Z_i)Y_i}{1-p_i}$$
Where $N=$total number of subjects, $Z_i=$treatment status, $Y_i=$outcome status, and $p_i=$ propensity score.

Does anyone know of an R package that would calculate the confidence interval of the average treatment effect, taking into account the weights? Could the survey package help here? I was wondering if this would work:

library(survey)
sampsvy=svydesign(id=~1,weights=~iptw,data=df)
svyby(~surgery=='lump',~treatment,design=sampsvy,svyciprop,vartype='ci',method='beta')

#which produces this result:
  treatment surgery == "lump"      ci_l      ci_u
   No         0.1644043 0.1480568 0.1817876
   Yes         0.2433215 0.2262039 0.2610724

I don't know where to go from here to find the confidence interval of the difference between the proportions (i.e. the average treatment effect).

Best Answer

You don't need the survey package or anything complicated. Wooldridge (2010, p. 920 onwards) "Econometric Analysis of Cross Section and Panel Data" has a simple procedure from which you can obtain the standard errors in order to construct the confidence intervals.

Under the assumption that you have correctly specified the propensity score which we denote as $p(\textbf{x}_i,\textbf{$\gamma$})$, define the score from the propensity score estimation (i.e. your first logit or probit regression) as $$\textbf{d}_i = \frac{\nabla_\gamma p(\textbf{x}_i,\textbf{$\gamma$})'[Z_i-p(\textbf{x}_i,\textbf{$\gamma$})]}{p(\textbf{x}_i,\textbf{$\gamma$}){[1-p(\textbf{x}_i,\textbf{$\gamma$})]}} $$ and let $$\text{ATE}_i = \frac{[Z_i-p(\textbf{x}_i,\textbf{$\gamma$})]Y_i}{p(\textbf{x}_i,\textbf{$\gamma$}){[1-p(\textbf{x}_i,\textbf{$\gamma$})]}}$$ as you have it in your expression above. Then take the sample analogues of these two expressions and regress $\widehat{\text{ATE}}_i$ on $\widehat{\textbf{d}}_i$. Make sure you include an intercept in this regression. Let $e_i$ be the residual from that regression, then the asymptotic variance of $\sqrt{N}(\widehat{\text{ATE}} - \text{ATE})$ is simply $\text{Var}(e_i)$. So the asymptotic standard error of your ATE is $$\frac{\left[ \frac{1}{N}\sum^N_{i=1}e_i^2 \right]^{\frac{1}{2}}}{\sqrt{N}}$$

You can then calculate the confidence interval in the usual way (see for example the comments to the answer here for a code example). You don't need to adjust the confidence interval again for the inverse propensity score weights because this step was already included in the calculation of the standard errors.

Unfortunately I am not an R guy so I can't provide you with the specific code but the outlined procedure above should be straight forward to follow. As a side note, this is also the way in which the treatrew command in Stata works. This command was written and introduced in the Stata Journal by Cerulli (2014). If you don't have access to the article you can check his slides which also outline the procedure of calculating the standard errors from inverse propensity score weighting. There he also discusses some slight conceptual differences between estimating the propensity score via logit or probit but for the sake of this answer it was not overly important so I omitted this part.

Related Question