[Math] Closed form for coefficients in Multiple Regression model

regression, statistics

I want to find $\hat{\beta}$ in ordinary least squares such that $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X_1 + \cdots + \hat{\beta}_n X_n$. I know the way to do this is through the normal equations using matrix algebra, but I have never seen a nice closed-form solution for each individual $\hat{\beta}_i$. As a generalization of the simple linear regression case, I'm guessing that

$$ \hat{\beta}_i = \frac{\operatorname{Cov}(X_i, Y)}{\operatorname{Var}(X_i)}, $$

where $ Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n + \epsilon $.

Is my conjecture for the form of the regression coefficients true? And what would $\hat{\beta}_0$ be?

Best Answer

Your use of the letter $n$ for the number of predictors, rather than for the sample size, is a bit irritating, but I'll write consistently with it and use $m$ for the sample size.

You have a design matrix $$ X=\begin{bmatrix} 1 & x_{11} & \cdots & x_{1n} \\ 1 & x_{21} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ 1 & x_{m1} & \cdots & x_{mn} \end{bmatrix} $$ Then $$ \begin{bmatrix} Y_1 \\ \vdots \\ Y_m \end{bmatrix} = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1n} \\ 1 & x_{21} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ 1 & x_{m1} & \cdots & x_{mn} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_n \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_m \end{bmatrix}. $$ Write this as $$ Y=X\beta+\varepsilon. $$ Then the least-squares estimate is $$ \hat\beta = (X^T X)^{-1} X^T Y, $$ assuming $X^T X$ is invertible. The potentially messy part --- the only nonlinear part --- is the matrix inversion.
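As a quick numerical sketch (using NumPy and made-up data), you can solve the normal equations and compare the result against the per-coefficient conjecture $\operatorname{Cov}(X_i, Y)/\operatorname{Var}(X_i)$. The two agree only when the predictors are uncorrelated; with correlated predictors the conjecture gives the *marginal* (simple-regression) slope instead:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 1000  # sample size

# Two correlated predictors, so the conjecture Cov(X_i, Y)/Var(X_i) fails.
x1 = rng.normal(size=m)
x2 = 0.8 * x1 + rng.normal(size=m)  # correlated with x1
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=m)

# Design matrix with an intercept column, then the normal equations.
X = np.column_stack([np.ones(m), x1, x2])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Per-coefficient conjecture: marginal slope of each predictor alone
# (ddof=1 in np.var to match np.cov's default normalization).
conj = [np.cov(x, y)[0, 1] / np.var(x, ddof=1) for x in (x1, x2)]

print(beta_hat)  # close to the true coefficients [2.0, 1.5, -0.5]
print(conj)      # differs from beta_hat[1:] because x1 and x2 are correlated
```

If you rerun this with `x2` generated independently of `x1`, the conjectured slopes do line up with the normal-equation solution, which is exactly the orthogonal-predictors special case.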

If you want just $\hat\beta_k$, multiply the expression above on the left by the row vector whose entry in the position corresponding to $\hat\beta_k$ is $1$ and whose other entries are $0$ (keeping in mind that the entries are indexed starting from $\hat\beta_0$).
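In code, that selector row vector looks like this (a NumPy sketch with invented data; the design matrix is indexed from 0, so $\hat\beta_k$ sits at position `k` because of the intercept at position 0):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 200, 3
# Design matrix: intercept column plus n random predictors.
X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])
true_beta = np.array([1.0, 2.0, 3.0, 4.0])
y = X @ true_beta + rng.normal(scale=0.1, size=m)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Selector row e_k^T picks out a single coefficient; here k = 2,
# i.e. beta_2 (beta_0, the intercept, is entry 0).
k = 2
e_k = np.zeros(n + 1)
e_k[k] = 1.0
beta_k = e_k @ np.linalg.solve(X.T @ X, X.T @ y)

print(beta_k)  # equals beta_hat[k], close to the true value 3.0
```

Of course in practice you would just index `beta_hat[k]`; the point of the row vector is that $\hat\beta_k = e_k^T (X^T X)^{-1} X^T Y$ is itself a closed form, just not one that simplifies to $\operatorname{Cov}(X_k, Y)/\operatorname{Var}(X_k)$ in general.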