Regression – Variance of OLS Estimator with Binary Treatment

Tags: binary data, least squares, regression, variance

I know that, in general, for a (stacked) regression of the form
$y = X \beta + \epsilon$, where $\mathbb{V}(\epsilon_i) = \sigma^2 \;\forall i$, the OLS estimate $\hat{\beta}$ of $\beta$ satisfies
\begin{equation}
\mathbb{V}(\hat{\beta}) = \sigma^2 (X'X)^{-1}
\end{equation}

This I understand. In my econometrics class, we are studying randomized trials with binary treatment, so $X \in \{0,1\}^n$, where $n$ is the number of observations. I am told that in this simple setup,

\begin{equation}
\mathbb{V}(\hat{\beta}) = \frac{\sigma^2}{np(1-p)}
\end{equation}

where $p$ is the proportion of observations $i$ such that $X_i = 1$.

I have no idea how to derive the second equation from the first. Any hints?

Best Answer

In the case of simple linear regression, the stacked model is

$$\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots\\ y_n \end{bmatrix}}_{Y} = \underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots\\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}}_{X\beta} + \underbrace{\begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots\\ \epsilon_n \end{bmatrix}}_{\epsilon}.$$

Taking $X'X$ gives $$ \begin{bmatrix} n & \sum_{i=1}^{n}x_i \\ \sum_{i=1}^{n}x_i & \sum_{i=1}^{n}x_i^2 \end{bmatrix}. $$

After taking the inverse, $(X'X)^{-1}$, we find that the element at position $(2,2)$ is

$$\frac{1}{ \sum_{i=1}^{n}(x_i-\bar{x})^2 }.$$

So $\mathbb{V}(\hat{\beta}_1) = \frac{\sigma^2}{ \sum_{i=1}^{n}(x_i-\bar{x})^2 }$.
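As a quick numerical sanity check (not part of the derivation), we can verify with NumPy on made-up data that the $(2,2)$ element of $(X'X)^{-1}$ really equals $1/\sum_i (x_i - \bar{x})^2$:

```python
import numpy as np

# Hypothetical data: any numeric regressor works for this identity.
rng = np.random.default_rng(0)
x = rng.normal(size=50)
X = np.column_stack([np.ones(50), x])  # design matrix: intercept + regressor

inv = np.linalg.inv(X.T @ X)
# The (2,2) element (zero-based index [1, 1]) vs. 1 / sum((x_i - xbar)^2).
lhs = inv[1, 1]
rhs = 1.0 / np.sum((x - x.mean()) ** 2)
print(np.isclose(lhs, rhs))  # → True
```

The identity follows because $\det(X'X) = n\sum_i x_i^2 - (\sum_i x_i)^2 = n\sum_i(x_i-\bar{x})^2$, and the $(2,2)$ cofactor is $n$.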

Next, we need to make $\sum_{i=1}^{n}(x_i-\bar{x})^2$ look like $np(1-p)$.

Since each $x_i$ is binary, we know $\bar{x} = \frac{\sum_{i=1}^{n}x_i}{n} = p$. Therefore, we can rewrite the denominator of $\mathbb{V}(\hat{\beta}_1)$ as $\sum_{i=1}^{n}(x_i-p)^2$.

Expanding, and using the fact that $x_i \in \{0,1\}$ implies $x_i^2 = x_i$ (so $\sum_{i=1}^{n}x_i^2 = \sum_{i=1}^{n}x_i = np$), gives

$$ \begin{align} \sum_{i=1}^{n}(x_i-p)^2 &= \sum_{i=1}^{n}(x_i^2 - 2px_i + p^2)\\ &= np - 2np^2 + np^2\\ &= np - np^2\\ &= np(1-p). \end{align} $$

Therefore, $\mathbb{V}(\hat{\beta}_1) = \frac{\sigma^2}{np(1-p)}$, which is the $\mathbb{V}(\hat{\beta})$ in your question.
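To convince yourself the whole chain holds, here is a small check (with hypothetical parameters, taking $\sigma^2 = 1$ for simplicity) that the matrix formula and the closed form agree for a binary regressor:

```python
import numpy as np

# Hypothetical setup: n observations, roughly 30% treated.
rng = np.random.default_rng(1)
n = 200
x = (rng.random(n) < 0.3).astype(float)  # binary treatment indicator
p = x.mean()                             # realized treatment share

X = np.column_stack([np.ones(n), x])
# With sigma^2 = 1, V(beta_1_hat) is the (2,2) element of (X'X)^{-1}.
var_matrix = np.linalg.inv(X.T @ X)[1, 1]
var_closed = 1.0 / (n * p * (1 - p))
print(np.isclose(var_matrix, var_closed))  # → True
```

Note that $p$ here is the realized sample proportion, not the design probability, matching the question's definition of $p$.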
