Linear Regression Model – Estimating ATT

linear model, regression, treatment-effect

Suppose there is a binary treatment such as taking a drug or not.

Let $D_i=1$ if an individual $i$ took the drug and $D_i=0$ if she did not.

Let $Y^1_i$ be the potential outcome (e.g. blood pressure) when she takes the drug.

Let $Y^0_i$ be the potential untreated outcome.

In this setup, we can estimate the average treatment effect (ATE) using a linear regression, starting from:

$$\begin{aligned}
Y_i &= D_i Y_i^1+(1-D_i)Y^0_i \\
&= Y_i^0+(Y_i^1-Y_i^0)D_i
\end{aligned}$$

where $Y_i$ is the observed outcome.

Then, rewriting $Y_i^0$ and $Y_i^1-Y_i^0$ as
$$\begin{aligned}
Y_i^0 &= \mathbb{E}[Y_i^0]+\{Y_i^0-\mathbb{E}[Y_i^0]\} \\
Y_i^1-Y_i^0 &= \mathbb{E}[Y_i^1-Y_i^0]+\{(Y_i^1-Y_i^0)-\mathbb{E}[Y_i^1-Y_i^0]\},
\end{aligned}$$

the observed outcome is
$$Y_i=\mathbb{E}[Y_i^0]+\mathbb{E}[Y_i^1-Y_i^0]D_i+U_i$$
where $U_i=\{Y_i^0-\mathbb{E}[Y_i^0]\}+\{(Y_i^1-Y_i^0)-\mathbb{E}[Y_i^1-Y_i^0]\}D_i$.

Therefore, we can estimate the ATE, $\mathbb{E}[Y_i^1-Y_i^0]$, by regressing $Y_i$ on $[1,\;D_i]$, but only under pretty strong assumptions, such as $U_i$ being uncorrelated with $D_i$.
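As a quick sanity check, here is a minimal simulation sketch in Python (NumPy), assuming a randomized treatment so that $D_i$ is independent of the potential outcomes. The data-generating process (a baseline of 120 and a true ATE of -5) is made up purely for illustration. It only shows that the OLS slope on $D_i$ equals the difference in group means and lands near the true ATE in that setting.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes with a heterogeneous effect (true ATE = -5).
y0 = 120 + rng.normal(0, 10, n)
y1 = y0 - 5 + rng.normal(0, 2, n)

# Randomized treatment: D is independent of (Y^0, Y^1) by construction.
d = rng.binomial(1, 0.5, n)
y = d * y1 + (1 - d) * y0  # observed outcome

# OLS of Y on [1, D]; the slope is the estimated treatment effect.
design = np.column_stack([np.ones(n), d])
intercept, slope = np.linalg.lstsq(design, y, rcond=None)[0]

# With a single binary regressor the slope is the difference in group means.
diff_in_means = y[d == 1].mean() - y[d == 0].mean()
print(slope, diff_in_means)
```

If the treatment were instead assigned in a way that depends on $Y^0_i$ or on the individual effect, the same regression would generally not recover the ATE.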

I am wondering whether we can also estimate the ATT, $\mathbb{E}[Y_i^1-Y_i^0|D_i=1]$, using a linear regression as in the case above.

Best Answer

If you have a randomized trial, then ATE = ATT because $E[Y^d|D=1, X] = E[Y^d|D=0, X]$ for all $d \in \{0,1\}$ and pretreatment covariates $X$, so $E[Y^1 - Y^0|D=1] = E[Y^1 - Y^0]$. Thus, a simple difference in means is sufficient to estimate either.

In an observational study with no unmeasured confounding, you can estimate the ATE and ATT using g-computation. This involves first fitting a model for $Y$ given $X$ and the observed $D$; this should ideally include interactions between the treatment and covariates or can be fit separately for each treatment group. To estimate the ATE, generate predicted values $\hat{Y}_{1i}$ and $\hat{Y}_{0i}$ for each individual $i$ as the predicted values from this model after setting all units' treatment values to 1 and then to 0, respectively. The estimate of the ATE is $n^{-1}\sum_{i}\hat{Y}_{1i} - n^{-1}\sum_{i}\hat{Y}_{0i}$. To estimate the ATT, generate predicted values $\hat{Y}_{0i}$ for each treated individual after setting their treatment values to 0. The estimate of the ATT is $n_1^{-1}\sum_{i:D_i=1}Y_i - n_1^{-1}\sum_{i:D_i=1}\hat{Y}_{0i}$, where $n_1$ is the number of treated units.
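The g-computation steps above can be sketched in Python (NumPy) roughly as follows. This is only an illustration under stated assumptions: the helper `g_computation` is a hypothetical name, the outcome model is a separate linear regression in each treatment group, and the simulated data (one confounder `x`, an effect of `1 + x`) are invented for the example.

```python
import numpy as np

def g_computation(y, d, X):
    """Return (ATE, ATT) estimates using a separate linear outcome model per arm."""
    Z = np.column_stack([np.ones(len(y)), X])
    b1 = np.linalg.lstsq(Z[d == 1], y[d == 1], rcond=None)[0]  # fit on treated
    b0 = np.linalg.lstsq(Z[d == 0], y[d == 0], rcond=None)[0]  # fit on untreated
    y1_hat, y0_hat = Z @ b1, Z @ b0  # predictions for everyone under D=1 and D=0
    ate = y1_hat.mean() - y0_hat.mean()
    att = y[d == 1].mean() - y0_hat[d == 1].mean()  # observed Y for the treated
    return ate, att

# Simulated observational data: x confounds treatment and modifies the effect.
rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)
d = rng.binomial(1, 1 / (1 + np.exp(-x)))  # treatment more likely for large x
tau = 1 + x                                # individual treatment effect
y = 1 + 2 * x + tau * d + rng.normal(size=n)

ate_hat, att_hat = g_computation(y, d, x)
true_ate, true_att = tau.mean(), tau[d == 1].mean()
print(ate_hat, true_ate, att_hat, true_att)
```

Because treated units tend to have larger `x` here, and the effect grows with `x`, the ATT is larger than the ATE in this simulation.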

There is an equivalent way to get these estimates as coefficients in a regression model, which will only be unbiased when the potential outcomes are linear in the covariates. For both estimands, you will fit a regression of $Y$ on $X$, $D$, and the interaction between $D$ and each $X$ and use the coefficient on $D$ as the treatment effect estimate. For the ATE, you must first center each covariate in $X$ at its mean in the full sample. For the ATT, you must first center each covariate in $X$ at its mean in the treated group.
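To make the centering trick concrete, here is a small Python (NumPy) sketch on simulated data. The helper `coef_on_d` and the data-generating process are assumptions made for illustration only. It fits the fully interacted regression twice, once with $X$ centered at the full-sample mean and once at the treated-group mean, and cross-checks the coefficient on $D$ against g-computation with a separate linear model per arm.

```python
import numpy as np

def coef_on_d(y, d, X, center):
    """Coefficient on D from OLS of Y on [1, D, Xc, D*Xc], where Xc = X - center."""
    Xc = X - center
    design = np.column_stack([np.ones(len(y)), d, Xc, d[:, None] * Xc])
    return np.linalg.lstsq(design, y, rcond=None)[0][1]

# Simulated data: two covariates; the first drives treatment and the effect size.
rng = np.random.default_rng(2)
n = 20_000
X = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = 1 + X @ np.array([2.0, -1.0]) + d * (1 + X[:, 0]) + rng.normal(size=n)

ate_hat = coef_on_d(y, d, X, X.mean(axis=0))          # center at full-sample mean
att_hat = coef_on_d(y, d, X, X[d == 1].mean(axis=0))  # center at treated mean

# Cross-check: g-computation with a separate linear model in each arm.
Z = np.column_stack([np.ones(n), X])
b1 = np.linalg.lstsq(Z[d == 1], y[d == 1], rcond=None)[0]
b0 = np.linalg.lstsq(Z[d == 0], y[d == 0], rcond=None)[0]
gcomp_ate = (Z @ (b1 - b0)).mean()
# For treated units the mean fitted Y equals the mean observed Y (OLS with an
# intercept), so averaging predicted differences gives the same ATT estimate.
gcomp_att = (Z[d == 1] @ (b1 - b0)).mean()
print(ate_hat, gcomp_ate, att_hat, gcomp_att)
```

The two approaches agree up to floating-point error because the fully interacted regression is just a reparameterization of the two per-arm regressions; centering only changes the point at which the treated-minus-untreated difference is evaluated.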

Note that the coefficient on $D$ in a regression of $Y$ on $X$ and $D$ but without their interaction corresponds to the ATE only when there is no effect modification by the covariates or the distributions of covariates are the same in both groups (e.g., as the result of a randomized trial), in which cases the ATE and ATT are equal.
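A small simulation can illustrate this caveat. The sketch below (Python with NumPy, with a made-up data-generating process) uses a binary covariate that both modifies the effect and changes the probability of treatment, then fits the regression without the interaction. Under these assumptions the coefficient on $D$ matches neither the ATE nor the ATT.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

x = rng.binomial(1, 0.5, n)                      # binary covariate
d = rng.binomial(1, np.where(x == 1, 0.5, 0.1))  # treatment likelier when x = 1
tau = 2.0 * x                                    # effect is 0 if x = 0, 2 if x = 1
y = 1.0 + x + tau * d + rng.normal(size=n)

true_ate = tau.mean()          # about 1.0
true_att = tau[d == 1].mean()  # about 1.67

# Coefficient on D from regressing Y on [1, D, X] with no D*X interaction.
design = np.column_stack([np.ones(n), d, x])
coef_no_interaction = np.linalg.lstsq(design, y, rcond=None)[0][1]
print(true_ate, true_att, coef_no_interaction)
```

In this particular setup the no-interaction coefficient falls between the ATE and the ATT, at roughly 1.47, so it should not be read as an estimate of either.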
