Solved it myself, as below. I don't feel that this solution is "right" somehow, because it treats the interior points $\mathbf B$ and $\mathbf C$ very asymmetrically, so improvements are invited. But, anyway ...
Let's use the notation from the last few paragraphs of the question.
It's clear that $\mathbf P(0) = \mathbf A$ and $\mathbf P(1) = \mathbf D$, so these two points are already interpolated, and we only have to worry about the other two points, $\mathbf B$ and $\mathbf C$.
First, we find numbers $h$ and $k$ such that
$$\mathbf C = \mathbf A + h(\mathbf B - \mathbf A) + k(\mathbf D - \mathbf A)
= (1-h-k)\mathbf A + h\mathbf B + k\mathbf D$$
This is possible provided that $\mathbf A$, $\mathbf B$, $\mathbf D$ are not collinear. Then, since $\mathbf P(v) = \mathbf C$, we have
$$(1-h-k)\mathbf A + h\mathbf B + k\mathbf D =
(1-v)^2 \mathbf A + 2v(1-v) \mathbf P + v^2 \mathbf D $$
But, since $\mathbf P(u) = \mathbf B$, we know that
$$ \mathbf B = (1-u)^2 \mathbf A + 2u(1-u) \mathbf P + u^2 \mathbf D$$
Substituting for $\mathbf B$ on the left-hand side, and equating coefficients of $\mathbf A$, $\mathbf P$, $\mathbf D$ gives
$$(1-v)^2 = 1 - h - k +h(1-u)^2 $$
$$2v(1-v) = 2hu(1-u)$$
$$v^2 = hu^2 + k $$
We can easily eliminate $v$ from these last three equations using the fact that $[2v(1-v)]^2 = 4[v^2][(1-v)^2]$. We get:
$$[2hu(1-u)]^2 = 4[hu^2 + k][1 - h - k +h(1-u)^2]$$
After a little algebra, this reduces to:
$$h(1-h)u^2 - 2hku + k(1-k) = 0$$
So, we solve this quadratic for $u$, and then get the unknown interior control point $\mathbf P$ from
$$ \mathbf P = \frac{\mathbf B - (1-u)^2 \mathbf A -u^2 \mathbf D}{2u(1-u)}$$
The number of solutions depends on the number of real solutions of the quadratic. Quite often, you can draw two different quadratic Bezier curves through the four given points, as explained in the paper cited in the question.
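Here is a minimal Python sketch of the recipe above, in case it helps to see it end to end. It assumes the points are 2D tuples, and the function name `quadratic_bezier_through` is just something I made up. Two things in it go beyond what is written above, so treat them as my assumptions: it uses $v = hu + k$ (add half of the second coefficient equation to the third), and it keeps only solutions with $0 < u < 1$ and $0 < v < 1$, so that $\mathbf B$ and $\mathbf C$ really lie in the interior of the curve.

```python
import math

def quadratic_bezier_through(A, B, C, D, eps=1e-12):
    """Return a list of (P, u, v) with P(u) = B and P(v) = C (2D points only)."""
    # Solve C - A = h (B - A) + k (D - A) for h, k by Cramer's rule.
    bx, by = B[0] - A[0], B[1] - A[1]
    dx, dy = D[0] - A[0], D[1] - A[1]
    cx, cy = C[0] - A[0], C[1] - A[1]
    det = bx * dy - by * dx
    if abs(det) < eps:
        raise ValueError("A, B, D are collinear, so h and k are not determined")
    h = (cx * dy - cy * dx) / det
    k = (bx * cy - by * cx) / det

    # Quadratic in u: h(1-h) u^2 - 2hk u + k(1-k) = 0
    a2, a1, a0 = h * (1 - h), -2 * h * k, k * (1 - k)
    if abs(a2) < eps:
        # Degenerate case: the equation is linear (or vacuous) in u.
        roots = [-a0 / a1] if abs(a1) >= eps else []
    else:
        disc = a1 * a1 - 4 * a2 * a0
        if disc < 0:
            roots = []
        else:
            s = math.sqrt(disc)
            roots = [(-a1 - s) / (2 * a2), (-a1 + s) / (2 * a2)]

    solutions = []
    for u in roots:
        v = h * u + k  # from v(1-v) + v^2 = hu(1-u) + hu^2 + k
        # Assumption: only accept parameters strictly inside (0, 1).
        if not (0 < u < 1 and 0 < v < 1):
            continue
        w = 2 * u * (1 - u)
        P = tuple((B[i] - (1 - u) ** 2 * A[i] - u ** 2 * D[i]) / w for i in range(2))
        solutions.append((P, u, v))
    return solutions

# Points sampled from the curve with A=(0,0), P=(1,2), D=(2,0) at u=0.25, v=0.5.
print(quadratic_bezier_through((0, 0), (0.5, 0.75), (1, 1), (2, 0)))
```

For this particular input the quadratic has roots $u = 0.25$ and $u = -1.25$, so only one curve survives the filter, and it recovers $\mathbf P = (1, 2)$ up to rounding.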
Yes, this has plenty to do with the derivative. In particular, what you describe is the backwards difference operator, which is just defined as
$$\nabla f(n)=f(n)-f(n-1).$$
This is an operator of interest on its own, but the connection to calculus is that we can consider this as telling us the "average" slope between $n-1$ and $n$.
What you are doing is iterating the operator. In particular, one often writes
$$\nabla^{k+1} f(n)=\nabla^k f(n)-\nabla^k f(n-1)$$
to mean that $\nabla^k f(n)$ is the result of applying this operator $k$ times. For instance, one has that $\nabla^3 n^3 = 6$, as you note. More generally $\nabla^k n^k = k!$, and this lets us recover a polynomial function from its table, which is what you were up to in sixth grade.
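If you want to check the iterated operator numerically, here is a small Python sketch that follows the recursive definition directly. The name `backward_difference` is mine, not standard notation, and the recursion is exponential in $k$, so it is only meant for small experiments.

```python
import math

def backward_difference(f, n, k=1):
    """Apply the backward difference operator k times to f, evaluated at n."""
    if k == 0:
        return f(n)
    # nabla^k f(n) = nabla^(k-1) f(n) - nabla^(k-1) f(n-1)
    return backward_difference(f, n, k - 1) - backward_difference(f, n - 1, k - 1)

# The third difference of n^3 is the same at every n.
print([backward_difference(lambda n: n ** 3, n, 3) for n in range(3, 8)])

# Compare the k-th difference of n^k with k! for a few small k.
for k in range(1, 6):
    print(k, backward_difference(lambda n: n ** k, 10, k), math.factorial(k))
```

Because everything here is integer arithmetic, the comparison with $k!$ is exact rather than approximate.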
However, we can take things further by trying to interpret these numbers - and there is a natural interpretation. For instance, $\nabla^2 f(n)$ represents how quickly $f$ is "accelerating" over the interval $[n-2,n]$, since it tells us about how the average slope changes between the interval $[n-2,n-1]$ and the interval $[n-1,n]$. If we keep going, we get that $\nabla^3 f(n)$ tells us how the acceleration changes between the interval $[n-3,n-1]$ and the interval $[n-2,n]$. We can keep going like this for physical interpretations.
However, this operator has a problem: We'd like to interpret the values as accelerations or as slopes, but $\nabla^k f(n)$ depends on the values of $f$ across the interval $[n-k,n]$. That is, it keeps taking up information from further and further away from the point of interest. The way one fixes this is to try to measure the slope over a smaller distance $h$ rather than measure it over a length of $1$:
$$\nabla_h f(n)=\frac{f(n)-f(n-h)}h$$
which is now the average slope of $f$ between $n-h$ and $n$. So, if we make $h$ smaller, we start to need to know $f$ across a smaller range. This gives better meanings to higher order differences like $\nabla_h^k f(n)$, since now they only depend on a small portion of $f$.
The derivative is just what happens to $\nabla_h$ when you send $h$ to $0$. It captures only local information about the function - so, it captures instantaneous slope or instantaneous acceleration and so on. In particular, one can work out that $\nabla f(n)$ is just the average of the derivative over the interval $[n-1,n]$. One can also work out that $\nabla^2 f(n)$ is a weighted average* of the second derivative over the interval $[n-2,n]$ and $\nabla^3 f(n)$ is another weighted average of the third derivative over $[n-3,n]$.
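To see the limit happening, here is a small Python sketch of $\nabla_h$ applied to $f(x) = x^3$ at $x = 2$, where the derivative is $3 \cdot 2^2 = 12$. The function names are mine, and floating point rounding means the printed values are only approximate.

```python
def scaled_backward_difference(f, x, h):
    """Average slope of f between x - h and x."""
    return (f(x) - f(x - h)) / h

def cube(x):
    return x ** 3

# Shrinking h moves the average slope toward the derivative at x = 2.
for h in (1.0, 0.1, 0.001):
    print(h, scaled_backward_difference(cube, 2.0, h))
```

For this particular $f$ one can expand by hand: $\nabla_h f(2) = 12 - 6h + h^2$, so the error shrinks roughly in proportion to $h$.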
In particular, if the $k^{th}$ derivative is constant, then it coincides with $\nabla^k f(n)$. One can also find results that if the $k^{th}$ derivative is linear, then $\nabla^k f(n)$ differs from it by at worst a constant. In particular, $\nabla$ is good at capturing "global" effects (like the highest order term in a polynomial and its coefficient) but bad at capturing "local" effects (like instantaneous changes in the slope). So, in some sense, $\nabla$ is just a rough approximation of the derivative: it has similar interpretations, but it doesn't work nearly as cleanly.
(*Unfortunately, "weighted average" here is hard to explain rigorously without calculus. For the benefit of readers with more background, I really mean "convolution" assuming that $f$ is actually differentiable enough times for any of this to make sense)
The other answers are good. I thought I would include these pictures from Wikipedia because, while you can visualize a function as a graph on $x,y$ axes, I also like visualizing functions as objects connected by arrows.
Here we have an example of a function that turns $x$'s into $y$'s just by traveling along the arrows.
This is a perfectly good function because there is no ambiguity. We can see that $f(1)$ gives us $D$, $f(2)$ gives us $C$, and $f(3)$ gives us $C$ as well.
This next example is not a function, because one of the inputs ($2$) has more than one output.
We can see that $f(1)$ is $D$, but what about $f(2)$? We can't decide what it is because there is more than one output. So this is not a function. Rather, it is a relation.
(In case you're disturbed that the outputs are letters and not numbers... functions can connect any two collections of "things". These things don't have to be numbers, but they often are.)
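If you like thinking in code, the arrow picture can be written as a list of (input, output) pairs, one pair per arrow. The sketch below checks the "no ambiguity" condition. The name `is_function` is made up, and the two outputs I attach to $2$ in the second example are only for illustration, since all that matters is that there are two of them.

```python
def is_function(arrows):
    """Return True if no input is paired with two different outputs."""
    seen = {}
    for x, y in arrows:
        if x in seen and seen[x] != y:
            return False  # x already points somewhere else
        seen[x] = y
    return True

# Every input has exactly one arrow leaving it.
print(is_function([(1, "D"), (2, "C"), (3, "C")]))

# The input 2 has two arrows leaving it (outputs chosen for illustration).
print(is_function([(1, "D"), (2, "B"), (2, "C")]))
```

Notice that two inputs sharing an output is fine. Only one input with two outputs breaks the rule. This check does not test whether every element of the starting collection has an arrow at all, which is a separate requirement.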