Expand the coefficient formula of multiple linear regression with intercept


If we consider the multiple linear regression with intercept:

$$y = \alpha + \beta_1x_1 + \cdots + \beta_nx_n,$$

then the well-known formula for the solution is

$$\theta = (M^TM)^{-1}M^Ty.$$

Here

  1. $\theta = (\alpha, \beta^T)^T,$ where $\alpha$ is the intercept and $\beta = (\beta_1,\cdots,\beta_n)^T.$

  2. $$M=\begin{pmatrix}
    1 & x_{11} & \cdots & x_{1n}\\
    1 & x_{21} & \cdots & x_{2n}\\
    \vdots & \vdots & \ddots & \vdots\\
    1 & x_{N1} & \cdots & x_{Nn}
    \end{pmatrix}=(e,X),$$

    is the extended sample matrix.

Now I want to expand $\hat{\theta}$ to get a closed form for $\beta.$ By comparison with simple linear regression, I guess that $\beta$ is just the solution of the linear regression without intercept on the centered samples $y_c, X_c:$

$$\beta = (X_c^TX_c)^{-1}X_c^Ty_c.$$

Here $X_c, y_c$ are the centered (mean-subtracted) versions of $X, y.$

Is there an easy way to prove this? It seems to require the inverse of a block matrix, which looks a little complicated.
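Before proving it, the conjecture (with the inverse included, i.e. $\beta = (X_c^TX_c)^{-1}X_c^Ty_c$) can be sanity-checked numerically. A minimal sketch in NumPy; the data, dimensions, and variable names here are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 50, 3                      # sample size and number of predictors (arbitrary)
X = rng.normal(size=(N, n))
y = rng.normal(size=N)

# Full OLS with intercept: theta = (M^T M)^{-1} M^T y, with M = (e, X)
M = np.column_stack([np.ones(N), X])
theta = np.linalg.solve(M.T @ M, M.T @ y)

# Conjectured centered formula: beta = (X_c^T X_c)^{-1} X_c^T y_c
Xc = X - X.mean(axis=0)
yc = y - y.mean()
beta = np.linalg.solve(Xc.T @ Xc, Xc.T @ yc)

print(np.allclose(theta[1:], beta))                            # → True (slopes agree)
print(np.allclose(theta[0], y.mean() - X.mean(axis=0) @ beta)) # → True (alpha = ybar - beta^T xbar)
```

The second check also illustrates how the intercept is recovered once the slopes are known.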

Best Answer

Think of your observations as a bunch of points in $(n+1)$-dimensional space. You are trying to find a hyperplane that minimizes the sum of squared vertical distances (to this plane).

For any candidate slopes $\beta_1, \ldots, \beta_n$, the first-order condition on $\alpha$ shows that the optimal $\alpha$, for those slopes, must make the hyperplane pass through the "center of gravity" of all these points, defined as $(\bar{x}_1, \ldots, \bar{x}_n, \bar{y})$.
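Spelled out, the first-order condition on $\alpha$ is

$$\frac{\partial}{\partial\alpha}\sum_{i=1}^{N}\Big(y_i - \alpha - \sum_{j=1}^{n}\beta_j x_{ij}\Big)^2 = -2\sum_{i=1}^{N}\Big(y_i - \alpha - \sum_{j=1}^{n}\beta_j x_{ij}\Big) = 0,$$

which gives $\alpha = \bar{y} - \sum_{j=1}^{n}\beta_j\bar{x}_j,$ i.e. the fitted hyperplane passes through $(\bar{x}_1,\ldots,\bar{x}_n,\bar{y}).$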

Therefore, the OLS hyperplane must go through the "center of gravity".

Among all hyperplanes going through the "center of gravity", the one that minimizes the sum of squared vertical distances is exactly the one you wrote down.

Edit: Direct matrix algebra proof:

Let $\vec{1}$ be the $T \times 1$ vector of ones. The original OLS regression is $Y = (\vec{1}, X)\,(\alpha, \beta^T)^T$.

Notice that $Y_c = (I - \frac{1}{T}\vec{1}{\vec{1}}^T)Y$ and $X_c = (I - \frac{1}{T}\vec{1}{\vec{1}}^T)X$, and that the centering matrix $I - \frac{1}{T}\vec{1}{\vec{1}}^T$ is idempotent.

Carry out the block inversion (https://en.wikipedia.org/wiki/Block_matrix; use the last expression there, with the inverse written as a product, right above the reference to the Weinstein–Aronszajn identity). The rest follows easily.
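Sketched out (a worked version of the block-algebra step above, using the notation already introduced), the normal equations partition as

$$M^TM = \begin{pmatrix} T & \vec{1}^TX \\ X^T\vec{1} & X^TX \end{pmatrix}, \qquad M^TY = \begin{pmatrix} \vec{1}^TY \\ X^TY \end{pmatrix}.$$

The Schur complement of the $(1,1)$ entry $T$ is

$$X^TX - \frac{1}{T}X^T\vec{1}\vec{1}^TX = X^T\Big(I - \frac{1}{T}\vec{1}\vec{1}^T\Big)X = X_c^TX_c$$

(using idempotence of the centering matrix), and eliminating $\alpha$ from the first block row leaves

$$\beta = (X_c^TX_c)^{-1}\Big(X^TY - \frac{1}{T}X^T\vec{1}\vec{1}^TY\Big) = (X_c^TX_c)^{-1}X_c^TY_c.$$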