[Math] Solving least squares problem using partial derivatives

Tags: least squares, linear algebra, multivariable calculus, partial derivative, projection

Let's say we want to solve a linear regression problem by choosing the slope and bias that minimize the squared errors. As an example, let the points be $x=[1, 2, 3]$ and $y=[1,2,2]$.

This quadratic minimization problem can also be written as (here $x$ denotes the vector of unknown coefficients, not the data points):

$||Ax-b||^2=||e||^2=e^2_1+e^2_2+e^2_3$
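For concreteness, here is a minimal Python sketch (the function name `squared_error` and the test values are my own) of the objective being minimized for these three points:

```python
def squared_error(c, m):
    """Sum of squared errors e_1^2 + e_2^2 + e_3^2 for the line y = c + m*x,
    evaluated on the example data x = [1, 2, 3], y = [1, 2, 2]."""
    xs = [1, 2, 3]
    ys = [1, 2, 2]
    return sum((c + m * xi - yi) ** 2 for xi, yi in zip(xs, ys))

# The least-squares solution found below (c = 2/3, m = 1/2) gives the smallest value:
print(squared_error(2/3, 1/2))  # 0.1666... = 1/6
print(squared_error(0.0, 1.0))  # 1.0, a worse fit
```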

Linear Algebra

We could solve this problem using linear algebra, in particular projections. For projecting onto higher-dimensional subspaces (here, the column space of $A$), the projection equation $p = Ax = A(A^TA)^{-1}A^Tb$ can be used; it comes from:

$A^T(b-Ax)=0$

$A^TAx = A^Tb$

$x = (A^TA)^{-1} A^Tb$

We know that $A^Te = 0$, where $e=b-p=b-Ax$, since $e$ is orthogonal to the columns of $A$ (equivalently, $e$ is in the null space of $A^T$). This is why we get the first equation above.

Now we need to express the quadratic minimization problem in the linear-algebra form $Ax=b$:

$\begin{bmatrix}1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\begin{bmatrix}c \\m\end{bmatrix} = \begin{bmatrix}1 \\ 2 \\ 2 \end{bmatrix}$

where $c$ is the bias and $m$ is the slope. The columns of $A$ form a basis for its column space, $c$ and $m$ are the coefficients of the linear combination, and $b$ is the vector of observed values. Since this equation has no exact solution, we instead look for the projection of the vector $b$ onto the column space of $A$.
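As a quick check (a NumPy sketch with my own variable names, not part of the lecture), $b$ is not in the column space of $A$, so $Ax=b$ indeed has no exact solution:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# A has rank 2, but appending b as a column raises the rank to 3,
# so b lies outside the column space of A and Ax = b is unsolvable.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3
```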

Solution:

Let $\hat{x} = [c, m]^T$ be the vector of unknown coefficients we are trying to find. Its least-squares value is

$\hat{x} = \begin{bmatrix}c \\ m \end{bmatrix} = (A^TA)^{-1}A^Tb = \left(\begin{bmatrix}1 & 1 & 1 \\ 1 & 2 & 3\end{bmatrix}\begin{bmatrix}1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}\right)^{-1} \begin{bmatrix}1 & 1 & 1 \\ 1 & 2 & 3\end{bmatrix}\begin{bmatrix}1 \\ 2 \\ 2 \end{bmatrix} = \begin{bmatrix}3 & 6 \\ 6 & 14 \end{bmatrix}^{-1}\begin{bmatrix}5 \\ 11 \end{bmatrix}=\left(\frac{1}{3(14)-6(6)}\begin{bmatrix}14 & -6 \\ -6 & 3 \end{bmatrix}\right)\begin{bmatrix}5 \\ 11 \end{bmatrix}=\frac{1}{6}\begin{bmatrix}14 & -6 \\ -6 & 3 \end{bmatrix}\begin{bmatrix}5 \\ 11 \end{bmatrix} = \begin{bmatrix}2/3 \\ 1/2 \end{bmatrix} \approx \begin{bmatrix}0.6667 \\ 0.5 \end{bmatrix}$

so the best-fit line is $y = \frac{2}{3} + \frac{1}{2}x$, and the projection itself is $p = A\hat{x}$.
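As a numerical check (again a sketch with my own names), the normal equations, NumPy's built-in least-squares solver, and the orthogonality condition all agree:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Solve the normal equations A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                                 # [0.66666667 0.5], i.e. c = 2/3, m = 1/2

# NumPy's least-squares routine returns the same coefficients.
print(np.linalg.lstsq(A, b, rcond=None)[0])

# The error e = b - A x_hat is orthogonal to the columns of A
# (equivalently, e is in the null space of A^T).
e = b - A @ x_hat
print(A.T @ e)                               # approximately [0, 0]
```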

Multivariable Calculus

At this point in the lecture, Professor Strang presents the minimization problem as $A^TAx=A^Tb$ and shows the normal equations. He then proceeds to solve the minimization problem using partial derivatives, but I couldn't quite understand how partial differentiation can be used to solve this problem.

From what I know, partial derivatives are used to differentiate functions of several variables. For example, a surface in three dimensions can have infinitely many tangent lines at a point, so a single "full" derivative is not defined there. Instead, we treat all but one variable as constants and differentiate with respect to the remaining one; this process is called partial differentiation.
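For instance (my own toy illustration, not from the lecture), for $f(u,v) = u^2v + v^3$, treating $v$ as a constant and differentiating with respect to $u$, and vice versa, gives

$$\frac{\partial f}{\partial u} = 2uv, \qquad \frac{\partial f}{\partial v} = u^2 + 3v^2.$$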

Question

How exactly does the partial differentiation solution work? How does it compare to the linear-algebraic orthogonal projection solution?

Thank you!

Best Answer

To try to answer your question about the connection between the partial derivatives method and the linear algebra method: for the linear algebra solution, we want
$$(Ax-b)\cdot Ax = 0.$$
For the partial derivatives, we want $\frac{\partial}{\partial x_1}||Ax-b||^2 = 0$ and $\frac{\partial}{\partial x_2}||Ax-b||^2 = 0$.

Suppose we have $n$ data points with inputs $a_1,a_2,\ldots,a_n$. Then, with $x_1$ representing the slope of the least-squares line and $x_2$ representing the intercept, we have
$$\frac{\partial}{\partial x_1}||Ax-b||^2 = 2\sum_{i=1}^{n}a_i(x_1a_i+x_2-b_i) = 0$$
and
$$\frac{\partial}{\partial x_2}||Ax-b||^2 = 2\sum_{i=1}^{n}(x_1a_i+x_2-b_i) = 0.$$

Multiplying the first equation by $x_1$, the second by $x_2$, and adding gives
$$x_1\sum_{i=1}^{n}a_i(x_1a_i+x_2-b_i)+x_2\sum_{i=1}^{n}(x_1a_i+x_2-b_i) = 0$$
$$\implies \sum_{i=1}^{n} (x_1a_i+x_2)(x_1a_i+x_2-b_i)=0=Ax\cdot (Ax-b),$$
which is exactly the orthogonality condition from the projection argument.
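To see the two approaches agree numerically on the original example, here is a short SymPy sketch (my own variable names, assuming the data $x=[1,2,3]$, $y=[1,2,2]$ from the question) that sets both partial derivatives of the sum of squared errors to zero and solves:

```python
import sympy as sp

c, m = sp.symbols('c m')   # intercept and slope
xs = [1, 2, 3]
ys = [1, 2, 2]

# Sum of squared errors as a function of the two unknowns.
E = sum((c + m * xi - yi) ** 2 for xi, yi in zip(xs, ys))

# Setting both partial derivatives to zero reproduces the normal equations.
solution = sp.solve([sp.diff(E, c), sp.diff(E, m)], [c, m])
print(solution)            # {c: 2/3, m: 1/2}, matching (A^T A)^{-1} A^T b
```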
