Why can’t I minimize the squared distance

geometry, optimization

My question is "Why can't I minimize the squared distance?" It would be a lot easier, but it yields the wrong answer.

I set out to write an example of using Newton's method for systems of equations and decided to illustrate it with the following problem. Given a line, such as $4x+3y-7=0$, and three points, $A=(1,4)$, $B=(3,4)$, $C=(-1,-1)$, find the point on the line that minimizes the summed distance to all three points. A nearly identical problem has been discussed in this forum: How to find the point on a line which has the minimum length to three points?
I used Lagrange multipliers to set up my problem and solved it with GeoGebra's CAS before writing code in SageMath. The Lagrangian is as follows.
$$\mathscr{L}(x,y,\lambda)=\Vert{A-(x,y)}\Vert+\Vert B-(x,y)\Vert+\Vert C-(x,y)\Vert -\lambda \left(4x+3y-7\right) \tag{EQ 1} \label{EQ_1}$$
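
For concreteness, here is a minimal numerical sketch of the problem that EQ 1 describes: minimize the summed Euclidean distance subject to the line constraint. This is my own illustration (assuming NumPy and SciPy are available; the starting point is arbitrary), not the SageMath code mentioned above.

```python
import numpy as np
from scipy.optimize import minimize

# The three given points and the line 4x + 3y - 7 = 0.
A = np.array([1.0, 4.0])
B = np.array([3.0, 4.0])
C = np.array([-1.0, -1.0])

def summed_distance(p):
    # Objective of EQ 1: sum of Euclidean distances from p = (x, y) to A, B, C.
    return sum(np.linalg.norm(q - p) for q in (A, B, C))

# Equality constraint: the point must lie on the line 4x + 3y - 7 = 0.
on_line = {"type": "eq", "fun": lambda p: 4 * p[0] + 3 * p[1] - 7}

result = minimize(summed_distance, x0=np.array([1.0, 1.0]), constraints=[on_line])
print(result.x, summed_distance(result.x))
```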

I took the partial derivatives, set them to zero, and solved the system. However, the derivatives are much easier to manipulate if the distances are squared first. That is, I wanted my Lagrangian to be
$$\mathscr{L}(x,y,\lambda)=\left[(1-x)^2+(4-y)^2\right]+\left[(3-x)^2+(4-y)^2\right]+\left[(-1-x)^2+(-1-y)^2\right] -\lambda \left(4x+3y-7\right) \tag{EQ 2} \label{EQ_2}$$

When I use only one point, this works perfectly, and I rationalize it by saying, ". . . this has to be a non-negative function since it is difficult to imagine a negative distance. Any non-negative function attains its minimum at the same point as its square, since at every point it is just multiplied by itself." If I use $\eqref{EQ_2}$, it indeed minimizes the sum of squared distances to the three points, but the minimizing point is not the same as the point that minimizes the summed distance. I have a working example at https://www.geogebra.org/m/hptbypvy . It takes 20 or 30 seconds to load, and I have to tell my browser to wait a couple of times.
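
To make the discrepancy concrete, the following sketch (my own, assuming SymPy, NumPy, and SciPy; it is not the original SageMath code) solves the stationarity conditions of $\eqref{EQ_2}$ exactly and compares the result with a direct minimization of the summed distance along the line:

```python
import numpy as np
import sympy as sp
from scipy.optimize import minimize_scalar

x, y, lam = sp.symbols("x y lam", real=True)

# EQ 2: sum of *squared* distances minus lambda times the line constraint.
L = ((1 - x)**2 + (4 - y)**2) + ((3 - x)**2 + (4 - y)**2) \
    + ((-1 - x)**2 + (-1 - y)**2) - lam * (4*x + 3*y - 7)

sq_solution = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)[0]
print("EQ 2 stationary point:", sq_solution[x], sq_solution[y])

# True objective of EQ 1: summed (unsquared) distances, with y eliminated
# via the line constraint y = (7 - 4x) / 3.
pts = [(1, 4), (3, 4), (-1, -1)]

def summed_distance(t):
    yt = (7 - 4 * t) / 3
    return sum(np.hypot(px - t, py - yt) for px, py in pts)

best = minimize_scalar(summed_distance)
print("summed-distance minimizer:", best.x, (7 - 4 * best.x) / 3)

# The two printed points differ, which is exactly the issue in the question.
```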

Best Answer

You are probably thinking of the following fact: minimizing $\sqrt{d^2}$ is equivalent to minimizing $d^2$, because $\sqrt{\cdot}$ is a monotonically increasing function.

But once you introduce sums, this breaks down: minimizing $\sum\sqrt{d_i^2}=\sum|d_i|$ is not equivalent to minimizing $\sum d_i^2$.
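
Stated symbolically (a restatement of the point above, not a quote from the original answer): for any strictly increasing $\varphi$,
$$\operatorname*{arg\,min}_x \varphi\bigl(g(x)\bigr)=\operatorname*{arg\,min}_x g(x),$$
which covers the single-term case with $\varphi=\sqrt{\cdot}$ and $g=d^2$. But $\sum_i \sqrt{d_i^2}$ is not a monotone transformation of $\sum_i d_i^2$, so the two sums need not share a minimizer.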

Look at a simple example. Take $d_i=(i-x)$, for $i=1, 2, 6$.

The minimum of $\sum d_i^2=(1-x)^2+(2-x)^2+(6-x)^2=3x^2-18x+41$ occurs at $x=3$.

But the minimum of $\sum\sqrt{d_i^2}=|1-x|+|2-x|+|6-x|$ occurs at $x=2$.

This is related to the fact that the mean minimizes the sum of squared deviations, while the median minimizes the sum of absolute deviations.
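
A quick numerical check of the example above and of the mean/median connection (my own sketch, assuming NumPy and SciPy are available):

```python
import numpy as np
from scipy.optimize import minimize_scalar

data = np.array([1.0, 2.0, 6.0])

# The sum of squared deviations is minimized at the mean...
sq = minimize_scalar(lambda x: np.sum((data - x) ** 2))
print(sq.x, np.mean(data))      # both ~3.0

# ...while the sum of absolute deviations is minimized at the median.
ab = minimize_scalar(lambda x: np.sum(np.abs(data - x)))
print(ab.x, np.median(data))    # both ~2.0
```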
