This seems like it should be a simple problem, so maybe I am just being silly.
Let's say that we have $n$ points, $P_1$ to $P_n$, with coordinates $(x_1,y_1, z_1)$ to $(x_n,y_n, z_n)$, floating above the $xy$ plane, and we would like to find the point $Q$ on the $xy$ plane such that the sum of the Euclidean distances between $Q$ and the points $P_1$ to $P_n$ is minimized.
What I have done so far:
Working with the sum of the squared distances instead, we could define $f$ as
$$f(x, y, z)=\sum_{k=1}^n d_k^2=\sum_{k=1}^n\left(x-x_k\right)^2+\sum_{k=1}^n\left(y-y_k\right)^2+\sum_{k=1}^n\left(z-z_k\right)^2$$
where $x$, $y$ and $z$ are potential coordinates for $Q$.
The partial derivative with respect to $x$:
$$\begin{align}
\frac{\partial f}{\partial x}&=2\sum_{k=1}^n\left(x-x_k\right)\\
&=2\left(nx-\sum_{k=1}^n x_k\right)
\end{align}$$
Setting this to zero and solving for $x$ gets us the result
$$x=\frac1n\sum_{k=1}^n x_k=\bar x$$
Similarly the other coordinates are $y=\bar y$ and $z=\bar z$.
This gives the result, ignoring the constraint that $Q$ should lie on the $xy$ plane, that the sum of the square distances will be minimized by $Q(\bar x,\bar y, \bar z)$.
With the constraint that $Q$ lies on the $xy$ plane, however, I thought I could still work with $f$ but just set variable $z$ to zero and replace the last term with the constant $\sum_{k=1}^nz_k^2$. In this case, optimizing as above would yield $(\bar x, \bar y, 0)$.
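Written out, the constrained function described here would be the following, where the name $g$ is introduced only for this illustration:

$$g(x, y)=f(x, y, 0)=\sum_{k=1}^n\left(x-x_k\right)^2+\sum_{k=1}^n\left(y-y_k\right)^2+\sum_{k=1}^nz_k^2$$

$$\frac{\partial g}{\partial x}=2\left(nx-\sum_{k=1}^n x_k\right),\qquad\frac{\partial g}{\partial y}=2\left(ny-\sum_{k=1}^n y_k\right)$$

The $z_k$ appear only in the constant term, so they drop out of both derivatives, which is why this approach returns $(\bar x, \bar y, 0)$ regardless of the heights.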
It seems counterintuitive to me that the heights of the points are irrelevant, and when I played around with toy data sets, I could see that the result was incorrect.
(From the toy data sets, I suspect that the correct $x$ and $y$ coordinates of $Q$ are the means of the coordinates of $P_1$ to $P_n$, but weighted by the inverse of the respective $z$ coordinate…)
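A toy check of the kind described above could be sketched as follows. The three points, the grid bounds, and all function names are made up for illustration; the idea is simply to compare a brute-force minimizer of the squared-distance sum with a brute-force minimizer of the plain-distance sum.

```python
import math

# Hypothetical toy data: two low points and one high point, as (x, y, z).
points = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.0), (0.0, 10.0, 20.0)]

def sum_of_squares(x, y, pts):
    """Sum of squared distances from (x, y, 0) to each point."""
    return sum((x - px) ** 2 + (y - py) ** 2 + pz ** 2 for px, py, pz in pts)

def sum_of_distances(x, y, pts):
    """Sum of plain Euclidean distances from (x, y, 0) to each point."""
    return sum(math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
               for px, py, pz in pts)

def grid_argmin(objective, pts, lo=-5.0, hi=15.0, steps=200):
    """Brute-force search over a square grid; returns the best (x, y)."""
    candidates = ((lo + (hi - lo) * i / steps, lo + (hi - lo) * j / steps)
                  for i in range(steps + 1) for j in range(steps + 1))
    return min(candidates, key=lambda q: objective(q[0], q[1], pts))

n = len(points)
centroid = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)
best_sq = grid_argmin(sum_of_squares, points)
best_dist = grid_argmin(sum_of_distances, points)

print("centroid (x, y):        ", centroid)
print("grid min, squared dist: ", best_sq)
print("grid min, plain dist:   ", best_dist)
```

On this data the grid minimizer of the squared objective sits next to the centroid, while the grid minimizer of the plain-distance objective gives a strictly smaller distance sum than the centroid does, so the two objectives do not share a minimizer here.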
What is wrong with the reasoning above? I'm sure that this problem is a well known problem, so any references would also be helpful!
Best Answer
Thanks to David in the comments for pointing out the brain-fart.
Let me try again:
Let's define $f$ as the sum of the distances:
$$f(x, y)=\sum_{k=1}^nd_k=\sum_{k=1}^n\sqrt{\left(x-x_k\right)^2+\left(y-y_k\right)^2+z_k^2}$$
Then:
$$\begin{align} \frac{\partial f}{\partial x}&=\sum\frac{x-x_k}{\sqrt{\left(x-x_k\right)^2+\left(y-y_k\right)^2+z_k^2}}\\ &=x\sum\frac{1}{d_k}-\sum\frac{x_k}{d_k} \end{align}$$
Setting this to zero yields the result:
$$x=\frac{\sum\frac{1}{d_k}x_k}{\sum\frac{1}{d_k}}$$
Similarly:
$$y=\frac{\sum\frac{1}{d_k}y_k}{\sum\frac{1}{d_k}}$$
(It looks like I was wrong before when I thought that the point had the mean coordinates weighted by the inverse of the respective $z$ coordinates - it rather looks like the mean weighted by the inverse distances.)
Unfortunately, this is still in terms of $d_k$ which means we have to know the optimum coordinates to calculate the optimum coordinates...
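We could use numerical methods to iterate improvements on an initial set of estimates, for example by recomputing each $d_k$ from the current estimate of $(x, y)$, substituting those values into the two formulas above to get a new estimate, and repeating until the estimate stops changing.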
It appears to work on the toy data sets I've used, but it's not really the solution I was hoping for...
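A minimal sketch of such an iteration, assuming every $z_k$ is nonzero so that no distance can vanish. The function name `plane_point`, the centroid starting guess, the iteration cap, and the stopping tolerance are all choices made here for illustration, not part of the derivation above.

```python
import math

def plane_point(points, max_iterations=1000, tol=1e-12):
    """Repeatedly apply the inverse-distance weighted mean for Q = (x, y, 0).

    points: list of (x_k, y_k, z_k) tuples, assumed to have nonzero z_k.
    Returns the final (x, y) estimate.
    """
    n = len(points)
    # Start from the plain mean of the x and y coordinates.
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    for _ in range(max_iterations):
        # Weights 1/d_k, with d_k measured from the current estimate.
        weights = [1.0 / math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
                   for px, py, pz in points]
        total = sum(weights)
        new_x = sum(w * p[0] for w, p in zip(weights, points)) / total
        new_y = sum(w * p[1] for w, p in zip(weights, points)) / total
        moved = math.hypot(new_x - x, new_y - y)
        x, y = new_x, new_y
        if moved < tol:  # estimate has effectively stopped changing
            break
    return x, y

# Hypothetical toy data, as (x, y, z).
toy = [(0.0, 0.0, 1.0), (10.0, 0.0, 1.0), (0.0, 10.0, 20.0)]
print(plane_point(toy))
```

Each pass is one application of the formulas for $x$ and $y$ with the $d_k$ frozen at their current values, so a fixed point of the loop satisfies both equations simultaneously.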
Update
Just to bring a little closure to this question:
It turns out that this problem is closely related to the geometric median of a set of points, which is the point that minimizes the sum of Euclidean distances to the points. Specifically, the point in this problem is what the geometric median becomes when the candidate point is restricted to the $xy$ plane; if every $z_k$ is zero, it is exactly the geometric median of the points.
It looks like the formula I was after doesn't exist. From the Wikipedia article linked to above:
Further, the algorithm that I outlined above turns out to have a name: Weiszfeld's algorithm, after Endre Weiszfeld.
(Of course, this breaks down if an estimate ever coincides with one of the points, since then that point would have an undefined weight. Practically we could, I think, add a negligible value to zero distances encountered along the way to get over this.)
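The safeguard of adding a negligible value could be sketched like this. The helper name `safe_weight` and the default `eps` are assumptions for illustration, and this variant adds `eps` to every distance rather than only to zero ones, which perturbs the other weights by a negligible amount.

```python
import math

def safe_weight(x, y, point, eps=1e-12):
    """Inverse-distance weight from (x, y, 0) to point, guarded against d = 0."""
    px, py, pz = point
    d = math.sqrt((x - px) ** 2 + (y - py) ** 2 + pz ** 2)
    # eps keeps the division defined when the estimate sits on the point.
    return 1.0 / (d + eps)

# The estimate coincides with a point lying in the plane: weight stays finite.
print(safe_weight(1.0, 2.0, (1.0, 2.0, 0.0)))
```

A point that triggers the guard receives a huge but finite weight, so the next estimate stays essentially on that point instead of producing a division error.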
I wasn't originally intending to answer my own question but I think that this is pretty much the answer I was looking for.