Need help understanding least squares solution to overdetermined system

least squares

(Sorry I had to post the images as links. I don't have enough cred to post pictures directly yet)

I'm trying to understand what the least squares solution to an overdetermined system means geometrically in the case of the following system:

$$
y = -2x-1\\
y = 3x -2\\
y = x+1\\
$$

rewritten in matrix form:

$$
\overbrace{\begin{bmatrix}
2 & 1\\
-3 & 1\\
-1 & 1
\end{bmatrix}}^A
\underbrace{\begin{bmatrix}
x\\
y
\end{bmatrix}}_x =
\overbrace{\begin{bmatrix}
-1\\
-2\\
1
\end{bmatrix}}^b
$$

Using A\b in MATLAB, you get the solution $\begin{bmatrix}0.1316 & -0.5789\end{bmatrix}^T$. I know that MATLAB returns the lowest norm solution of a least squares problem. I have plotted the system here and the green dot in the middle is this least squares solution.

Now, correct me if I'm wrong, but (in this 2D case) the least squares solution minimizes the "distance" from the solution to each line. I can geometrically calculate the distance of a point $(x_0,y_0)$ from a line $ax + by + c = 0$ as follows:

$$\mathrm{distance} = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}}$$

and doing that for each line produces the following sum of squared distances function

dfun = @(x,y) ((y+2*x+1).^2)/(1^2 + 2^2) + ((y+3*x+2).^2)/(1^2 + 3^2) + ((y+x-1).^2)/(1^2 + 1^2);

If I generate a surface using this function over a range of $x$ and $y$ values, I get this surface with this top-down view (looking down the z-axis on the xy plane). You can download the MATLAB .fig file here if you want to zoom and pan (requires MATLAB, link expires in 30 days).

Here is an image showing the least squares solution with the sum of squares of distances of the solution and its norm. As can be seen, the norm is $0.5937$ and the distance is $1.4704$. But clearly, there is a contour that has a lower sun of squared distance in the image, as shown here for $(x_0, y_0) = (-0.3,0)$, where the norm and the sum of squared distances are both smaller. Shouldn't this (or another point) be a better least squares solution? Do I have the wrong intuition about what least squares is doing here?

Best Answer

First of all : Welcome to the site !

When you face an overdetermined system of $m$ linear or nonlinear (even implicit) equations $$f_i(x_1,x_2,\cdots,x_n)=0 \quad\text{for}\quad i=1,2,\cdots,m\qquad \quad\text{and}\quad m >n$$ it reduces to the minimization of a norm.

The simplest is $$\Phi(x_1,x_2,\cdots,x_n)=\sum_{i=1}^n w_i \Big[f_i(x_1,x_2,\cdots,x_n)\Big]^2$$ which shows the analogy with weighted least-square method.

For the problem you gave, using equal weights, $$\Phi(x,y)=(y+2x+1)^2+(y-3x+2)^2+(y-x-1)^2$$ Computing the partial derivetives $$\frac{\partial \Phi(x,y)}{\partial x}=28 x-4 y-6 \qquad\text{and}\qquad \frac{\partial \Phi(x,y)}{\partial y}=-4 x+6 y+4$$ which gives $x=\frac{5}{38}$ and $y=-\frac{11}{19}$. At this point, $\Phi=\frac{169}{38}$ is the absolute minimum for this specific norm.

If you change the definition of the norm or even the weights, different results.

I think that this approach would be simpler than the one based on distance (but, for sure, I may be wrong) since it is totally general and not limited to linear equations.