Matrix derivative in multiple linear regression model

linear regression, matrices, statistics

The basic setup in the multiple linear regression model is

\begin{align}
Y &= \begin{bmatrix}
y_{1} \\
y_{2} \\
\vdots \\
y_{n}
\end{bmatrix}
\end{align}

\begin{align}
X &= \begin{bmatrix}
1 & x_{11} & \dots & x_{1k}\\
1 &x_{21} & \dots & x_{2k}\\
\vdots & \vdots & \ddots & \vdots\\
1 & x_{n1} & \dots & x_{nk}
\end{bmatrix}
\end{align}

\begin{align}
\beta &= \begin{bmatrix}
\beta_{0} \\
\beta_{1} \\
\vdots \\
\beta_{k}
\end{bmatrix}
\end{align}

\begin{align}
\epsilon &= \begin{bmatrix}
\epsilon_{1} \\
\epsilon_{2} \\
\vdots \\
\epsilon_{n}
\end{bmatrix}
\end{align}

The regression model is $Y=X \beta + \epsilon$.

To find the least squares estimator of $\beta$, we need to minimize $S(\beta)=\sum_{i=1}^n \epsilon_i^2 = \epsilon' \epsilon = (Y-X\beta)'(Y-X\beta)=Y'Y-2\beta'X'Y + \beta'X'X\beta$ and then set

$$\frac{\partial S(\beta)}{\partial \beta}=0$$

My question: how do we get $-2X'Y+2X'X\beta$?

Best Answer

The sum of the squared errors can be written as

$$ \left\lVert \epsilon \right\rVert^2 = \left\lVert Y - X\beta \right \rVert^2 $$ $$ = (Y - X\beta)^T(Y - X\beta) = Y^TY - \beta^TX^TY - Y^TX\beta + \beta^TX^TX\beta $$ $$ = \left\lVert Y \right\rVert^2 - 2Y^TX\beta + \left\lVert X\beta \right\rVert^2 $$
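The two cross terms combine in the last step because each is a scalar, and a scalar equals its own transpose:

$$ \beta^TX^TY = \left(\beta^TX^TY\right)^T = Y^TX\beta, \qquad \text{so} \qquad -\beta^TX^TY - Y^TX\beta = -2Y^TX\beta $$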

Then, taking the gradient $\frac{d\left\lVert \epsilon \right\rVert^2}{d\beta}$ gives

$$ \frac{d\left\lVert \epsilon \right\rVert^2}{d\beta} = -2X^TY + 2X^TX\beta $$

Reviewing that differentiation term by term (since differentiation is a linear operator!):

$\left\lVert Y \right\rVert^2$ does not depend on $\beta$, so its gradient is $0$.

$-2Y^TX\beta$ is a sum whose $i^{th}$ term is $-2\beta_iY^Tx_i$, where $x_i$ is the $i^{th}$ column of $X$. Therefore, the gradient of this term evaluates to $-2X^TY$.
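Written out for the $j^{th}$ coordinate of the gradient,

$$ \frac{\partial}{\partial \beta_j}\left(-2Y^TX\beta\right) = \frac{\partial}{\partial \beta_j}\left(-2\sum_{i}\beta_i\,Y^Tx_i\right) = -2\,Y^Tx_j = -2\,x_j^TY, $$

and stacking these components over $j = 0, \dots, k$ gives the vector $-2X^TY$.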

$\left\lVert X\beta \right\rVert^2 = \beta^TX^TX\beta$ can be differentiated using the product rule, and its gradient is $2X^TX\beta$ because $X^TX$ is symmetric.
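Explicitly, with the symmetric matrix $A = X^TX$,

$$ \frac{\partial}{\partial \beta}\left(\beta^TA\beta\right) = \left(A + A^T\right)\beta = 2A\beta = 2X^TX\beta $$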

In general, the Jacobian of $f(x) = Ax$ is $J_f = A$; writing gradients of scalars as column vectors, this gives $\frac{\partial}{\partial x}\left(a^Tx\right) = a$ and $\frac{\partial}{\partial x}\left(x^TAx\right) = \left(A + A^T\right)x$, which are exactly the identities used above.
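Setting the gradient to zero then gives the normal equations, and hence the least squares estimator (provided $X^TX$ is invertible):

$$ X^TX\hat\beta = X^TY \quad\Longrightarrow\quad \hat\beta = \left(X^TX\right)^{-1}X^TY $$

As a quick sanity check of the gradient formula, here is a minimal numerical sketch (using hypothetical simulated data and NumPy) that compares the analytic gradient $-2X^TY + 2X^TX\beta$ against a central finite-difference approximation of $S(\beta)$:

    import numpy as np

    # Hypothetical small example: simulated design matrix and response
    rng = np.random.default_rng(0)
    n, k = 50, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # n x (k+1); first column is the intercept
    Y = rng.normal(size=n)
    beta = rng.normal(size=k + 1)  # arbitrary point at which to check the gradient

    def S(b):
        """Sum of squared errors S(beta) = (Y - X b)'(Y - X b)."""
        r = Y - X @ b
        return r @ r

    # Analytic gradient: -2 X'Y + 2 X'X beta
    grad_analytic = -2 * X.T @ Y + 2 * X.T @ X @ beta

    # Central finite-difference approximation, one coordinate at a time
    eps = 1e-6
    grad_numeric = np.array([
        (S(beta + eps * e) - S(beta - eps * e)) / (2 * eps)
        for e in np.eye(k + 1)
    ])

    print(np.allclose(grad_analytic, grad_numeric, atol=1e-4))  # expected: True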
