Optimization – Best Fit Line Based on Sum of Squared Perpendicular Distances

analytic-geometry, optimization, regression

Given a set of points $P_i = (x_i, y_i)$, $i = 1, 2, \dots, N$, I want to find the best-fitting line to these points. The equation of the line is

$$ n^T ( r - r_0 ) = 0 $$

where $n$ is a unit vector. I want to find $r_0$ and $n$ such that the sum of squared perpendicular distances from $(x_i, y_i)$ to the line is minimized. That is, I want to minimize

$ f = \displaystyle \sum_{i=1}^N d_i^2 $

where $d_i = | n^T (P_i - r_0) |$.
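
For concreteness, here is a minimal NumPy sketch of this objective; the function name and the sample points are only illustrative, not part of the question:

```python
import numpy as np

def sum_squared_perp_distances(points, r0, n):
    """Sum of squared perpendicular distances from the rows of `points`
    to the line {r : n^T (r - r0) = 0}, with n a unit normal vector."""
    n = n / np.linalg.norm(n)       # enforce the unit-vector constraint
    d = (points - r0) @ n           # signed distances n^T (P_i - r0)
    return np.sum(d**2)

# Points scattered around the line y = x
P = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.2], [3.0, 2.8]])
print(sum_squared_perp_distances(P, r0=np.array([0.0, 0.0]),
                                 n=np.array([1.0, -1.0])))
```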

My attempt is detailed in my answer below.

Your comments, hints, and solutions are highly appreciated.

Best Answer

I hope you don't mind if I do not use vectors but instead apply the least-squares (LS) method to $y=mx+b$ and minimize the sum of the squared perpendicular distances

$\displaystyle\sum_k \frac{(y_k - (mx_k +b))^2}{1+m^2}\Rightarrow \min$

That is (here $n$ denotes the number of points):
$\displaystyle\frac{\displaystyle m^2\left(\sum x_{k}^2\right)+2m b \left(\sum x_{k}\right)-2m\left(\sum x_{k}y_{k}\right)+b^2 n-2b\left(\sum y_{k}\right)+\sum y_{k}^2}{m^2+1}$
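
As a side note, this slope–intercept objective agrees with the vector form in the question when $n \propto (m, -1)$ and $r_0 = (0, b)$. A minimal sketch of it (function name and sample data are illustrative):

```python
import numpy as np

def perp_ls_objective(x, y, m, b):
    """Sum of squared perpendicular distances from (x_k, y_k) to y = m*x + b."""
    r = y - (m * x + b)                 # ordinary vertical residuals
    return np.sum(r**2) / (1.0 + m**2)  # rescale into perpendicular distances

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])
print(perp_ls_objective(x, y, m=1.0, b=0.0))
```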

Setting $\displaystyle\frac{\partial}{\partial b}=0$ results in $\displaystyle b=\frac{\displaystyle-\left(\sum x_k\right) m+\sum y_k}{n}=\bar{y}-m\bar{x}$

Substituting this $b$ back into the ansatz results in (equation $IV$)

$\displaystyle\frac{\displaystyle m^2 n\left(\sum x_k^2\right)-m^2\left(\sum x_k\right)^2-2mn\left(\sum x_ky_k\right)+2m\left(\sum x_k\right)\left(\sum y_k\right)+ n\left(\sum y_k^2\right)-\left(\sum y_k\right)^2}{n\left(m^2+1\right)}$

The fun starts here: setting $\displaystyle\frac{\partial}{\partial m}=0$ results in

$$m_{1,2}= \frac{-n\sum x_k^2+n\sum y_k^2+\left(\sum x_k\right)^2-\left(\sum y_k\right)^2\pm\sqrt{D}}{2\left(n\sum x_k y_k-\sum x_k\sum y_k\right)}\,\mathrm{,}$$

with the $+$ sign for $m_1$ and the $-$ sign for $m_2$, where

$$\begin{aligned}D={}&\left(n\sum x_k^2\right)^2-2n^2\sum x_k^2\sum y_k^2-2n\sum x_k^2\left(\sum x_k\right)^2+2n\sum x_k^2\left(\sum y_k\right)^2\\&+\left(n\sum y_k^2\right)^2+2n\sum y_k^2\left(\sum x_k\right)^2-2n\sum y_k^2\left(\sum y_k\right)^2+4n^2\left(\sum x_k y_k\right)^2\\&-8n\sum x_k y_k\sum x_k\sum y_k+\left(\sum x_k\right)^4+2\left(\sum x_k\right)^2\left(\sum y_k\right)^2+\left(\sum y_k\right)^4\,\mathrm{.}\end{aligned}$$

A little simplification with
$\displaystyle n\sum x_k^2-\left (\sum x_k\right )^2=u$
$\displaystyle n\sum y_k^2-\left (\sum y_k\right )^2=v$
$\displaystyle n\sum x_k y_k-\sum x_k\cdot\sum y_k=w$
$v-u=p$,   $2w=q$   results in

$m_1=\displaystyle \frac{\sqrt{p^2+q^2}+p}{q}\,\mathrm{,}\quad m_2=\frac{-\sqrt{p^2+q^2}+p}{q}\,\mathrm{,}$   which looks much less daunting than before.
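
Translated into code, here is a minimal sketch of this closed form (the function name and the sample data are mine):

```python
import numpy as np

def perpendicular_fit(x, y):
    """Perpendicular (orthogonal) least-squares line y = m*x + b,
    using the closed-form slope m_1 derived above."""
    n = len(x)
    u = n * np.sum(x**2) - np.sum(x)**2
    v = n * np.sum(y**2) - np.sum(y)**2
    w = n * np.sum(x * y) - np.sum(x) * np.sum(y)
    p, q = v - u, 2.0 * w
    m = (np.sqrt(p**2 + q**2) + p) / q   # m_1; assumes q != 0 (i.e. w != 0)
    b = np.mean(y) - m * np.mean(x)      # b = ybar - m*xbar from above
    return m, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])
print(perpendicular_fit(x, y))
```

When $q=0$ the formula degenerates (the best-fitting line is then horizontal or vertical), a case the sketch does not handle.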
To test whether this is a minimum, I substitute $m_1$ for $m$ in the second derivative of eq. $IV$:

$\displaystyle \frac{4\left(\sqrt{p^2+q^2}+p\right)\left(p^2+q^2\right)q^4}{n\left(\left(\sqrt{p^2+q^2}+p\right)^2+q^2\right)^3}$

Since $n$ is a positive integer and the remaining factors are non-negative, only the sign of $\displaystyle\sqrt{p^2+q^2}+p$ is relevant; this expression is never negative, so $m_1$ is the sought-for result.
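
As a quick numeric sanity check (my own addition; the data are illustrative), one can compare the reduced objective of eq. $IV$ at $m_1$ and $m_2$:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 0.9, 2.2, 2.8])
n = len(x)

u = n * np.sum(x**2) - np.sum(x)**2
v = n * np.sum(y**2) - np.sum(y)**2
w = n * np.sum(x * y) - np.sum(x) * np.sum(y)
p, q = v - u, 2.0 * w
m1 = ( np.sqrt(p**2 + q**2) + p) / q
m2 = (-np.sqrt(p**2 + q**2) + p) / q

def f(m):
    # reduced objective IV, with b already eliminated via b = ybar - m*xbar
    return (u * m**2 - q * m + v) / (n * (m**2 + 1))

print(f(m1) < f(m2))   # True: m_1 gives the minimum, m_2 the maximum
```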

In case you are interested in a few more details and maybe some routines for pocket calculators (and "emulated" remakes thereof), you may find here a paper I did some time ago.