Matrix regression proof that $\hat \beta = (X' X)^{-1} X' Y = {\hat \beta_0 \choose \hat \beta_1}$

regression

Matrix regression proof that $\hat \beta = (X' X)^{-1} X' Y = {\hat \beta_0 \choose \hat \beta_1} $

where $\hat\beta$ is the least squares estimator of $\beta$.

attempt

So I know ${\hat \beta_0 \choose \hat \beta_1} = {\overline{Y} - \hat \beta_1 \overline{X} \choose \frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^{n}(X_i - \overline{X})^2}}$

Not really sure how to start, as I don't know what formulas there are to reduce any of this. If this has been answered elsewhere, please mark it as a duplicate; I tried to search but couldn't find it.
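As a quick numerical sanity check (not a proof), here is a minimal NumPy sketch on made-up data: it computes the componentwise formulas above and compares them with the matrix expression $(X'X)^{-1}X'Y$, obtained here by solving the normal equations.

```python
import numpy as np

# Made-up toy data, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Componentwise formulas for simple linear regression
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

# Matrix form: design matrix with an intercept column, then solve X'X b = X'Y
X = np.column_stack([np.ones_like(x), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

print(beta0_hat, beta1_hat)  # scalar formulas
print(beta_hat)              # matrix solution; the two should agree
```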

Best Answer

Our goal is to minimize $$ f(\beta) = \frac12 \| X \beta - Y \|^2. $$ Notice that $f = g \circ h$, where $h(\beta) = X \beta - Y$ and $g(u) = \frac12 \| u \|^2$. The derivatives of $g$ and $h$ are given by $$ g'(u) = u^T, \quad h'(\beta) = X. $$ By the chain rule, we have \begin{align} f'(\beta) &= g'(h(\beta)) h'(\beta) \\ &= (X \beta - Y)^T X. \end{align} The gradient of $f$ is $$ \nabla f(\beta) = f'(\beta)^T = X^T( X \beta - Y). $$ Setting the gradient of $f$ equal to $0$, we discover that $$ X^T X \beta = X^T Y. $$ Since $f$ is convex, any critical point is a global minimizer, so assuming $X^T X$ is invertible we obtain $$ \hat \beta = (X^T X)^{-1} X^T Y. $$
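To connect this back to the componentwise estimates in the question, here is a sketch of the standard expansion, assuming the first column of $X$ is all ones and the second column holds the $X_i$: $$ X^T X = \begin{pmatrix} n & \sum_i X_i \\ \sum_i X_i & \sum_i X_i^2 \end{pmatrix}, \qquad X^T Y = \begin{pmatrix} \sum_i Y_i \\ \sum_i X_i Y_i \end{pmatrix}, $$ so the normal equations $X^T X \hat\beta = X^T Y$ read \begin{align} n \hat\beta_0 + \hat\beta_1 \sum_i X_i &= \sum_i Y_i, \\ \hat\beta_0 \sum_i X_i + \hat\beta_1 \sum_i X_i^2 &= \sum_i X_i Y_i. \end{align} Dividing the first equation by $n$ gives $\hat\beta_0 = \overline{Y} - \hat\beta_1 \overline{X}$; substituting this into the second equation and simplifying gives $$ \hat\beta_1 = \frac{\sum_i X_i Y_i - n \overline{X}\,\overline{Y}}{\sum_i X_i^2 - n \overline{X}^2} = \frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^{n} (X_i - \overline{X})^2}, $$ which matches the formulas in the question.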