I think the simplest thing that would work in your application is to show the user 4 special points on the parametric cubic curve, and allow the user to manipulate those 4 special points.
(Allowing the user to pick any point on the curve, and move it, makes things more complicated).
I think this is the same as what Stephen H. Noskowicz calls "Cubic Four Point" representation, aka the quadratic Lagrange with t1 = 1/3 and t2 = 2/3.
While your user is moving those 4 special points U0, U1, U2, U3 around,
periodically you find a cubic Bezier curve that goes through those 4 points using John Burkardt's approach:
P0 = U0
P1 = (1/6)*( -5*U0 + 18*U1 - 9*U2 + 2*U3 )
P2 = (1/6)*( 2*U0 - 9*U1 +18*U2 - 5*U3 )
P3 = U3.
That gives you the Bezier curve representation of the same cubic curve -- a series of 4 control points.
You then feed those 4 control points (the endpoints P0 and P3, and the intemediate control points P1 and P2) into any Bezier curve plotter.
The resulting curve (usually) doesn't touch P1 or P2, but it will start at X0, go exactly through X1 and X2, and end at X3.
(This uses the special points at t=0, 1/3, 2/3, and 1. It's possible to, instead, use the special points at t=1, 1/4, 3/4, and 1, as shown at How do I find a Bezier curve that goes through a series of points? . Or, I suppose, any 4 distinct t values. But I suspect the 0, 1/3, 2/3, 1 values are used most often, and I don't see any advantage to using any other fixed values).
There's a good explanation starting on this web page.
I recommend that you read the three sections on Bezier surfaces in "Unit 8".
The process you described in your question is roughly correct.
Let's call the 16 points:
$$P_{11}, P_{12}, P_{13}, P_{14}$$
$$P_{21}, P_{22}, P_{23}, P_{24}$$
$$P_{31}, P_{32}, P_{33}, P_{34}$$
$$P_{41}, P_{42}, P_{43}, P_{44}$$
And let $\phi_1$, $\phi_2$, $\phi_3$, $\phi_4$ be the Bernstein polynomials of degree 3.
Suppose we want to evaluate a point on the surface with parameter values $(s,t) = (0.5, 0.2)$.
The first thing we do is use the 16 points four at-a-time to construct four curves. We start with the points of the first row: $P_{11}$, $P_{12}$, $P_{13}$, $P_{14}$. We can use them to construct a Bezier curve
$$
C_1(s) = \phi_1(s)P_{11} + \phi_2(s)P_{12} + \phi_3(s)P_{13} + \phi_4(s)P_{14} =
\sum_{j=1}^4 \phi_j(s)P_{1,j}
$$
Then, on this curve, we can calculate the point $Q_1 = C_1(0.5)$ at value $s=0.5$.
$$
Q_1 = C_1(0.5) = \sum_{j=1}^4 \phi_j(0.5)P_{1,j}
$$
You can calculate $Q_1$ by the normal de Casteljau algrorithm, or any other method.
We do the same things with the other three rows of points. For each row, we construct a Bezier curve, and then we calculate the point at parameter value $s=0.5$. We get points
$$\text{second row:}\quad Q_2 = C_2(0.5) = \sum_{j=1}^4 \phi_j(0.5)P_{2,j}$$
$$\text{third row:}\quad Q_3 = C_3(0.5) = \sum_{j=1}^4 \phi_j(0.5)P_{3,j}$$
$$\text{fourth row:}\quad Q_4 = C_4(0.5) = \sum_{j=1}^4 \phi_j(0.5)P_{4,j}$$
We can write all this more tidily as:
$$
Q_i = C_i(0.5) = \sum_{j=1}^4 \phi_j(0.5)P_{ij} \quad (i=1,2,3,4)
$$
Now we just do the Bezier thing in the other direction -- we use the points $Q_1$, $Q_2$, $Q_3$, $Q_4$ to construct a Bezier curve Q(t) given by
$$
Q(t) = \phi_1(t)Q_1 + \phi_2(t)Q_2 + \phi_3(t)Q_3 + \phi_4(t)Q_4 =
\sum_{i=1}^4 \phi_i(t)Q_i
$$
Then our final surface point is the point at parameter value $t=0.2$ on the curve $Q(t)$:
$$
X(0.5, 0.2) = Q(0.2) = \sum_{i=1}^4 \phi_i(0.2)Q_i
$$
Again, you can calculate this point using the de Casteljau algorithm or any other method.
To understand how all this relates to the surface equation, let's substitute the expressions for the $Q_i$ into our last equation. We get
$$
X(0.5, 0.2) = \sum_{i=1}^4 \phi_i(0.2)Q_i =
\sum_{i=1}^4 \phi_i(0.2)\left(\sum_{j=1}^4 \phi_j(0.5)P_{ij}\right)
$$
Rearranging this gives:
$$
X(0.5, 0.2) = \sum_{i=1}^4 \sum_{j=1}^4 \phi_j(0.5) \phi_i(0.2) P_{ij}
$$
You should recognize the right-hand side as the formula for a Bezier surface.
Here's a picture:
The blue curves are $C_1$, $C_2$, $C_3$, $C_4$, the red dots are $Q_1$, $Q_2$, $Q_3$, $Q_4$, and the red curve is $Q$.
Best Answer
The value 3 is a consequence of a different method to draw the curve, namely not by converting it to parameter form but rather by recursively refining it. That is, staring from four points $A,B,C,D$ you take successive mid points $E=\frac{A+B}2$, $F=\frac{B+C}2$, $G=\frac{C+D}2$, then midpoints of the midpoints $H=\frac{E+F}2$, $I=\frac{F+G}2$, finally $J=\frac{H+I}2$. Note that taking mid pints is a very simple operation with respect to the coordinates and may require much less precision than drawing the parametric curve.
Now the crucial point is the following: The curve determined by the four points $A,B,C,D$ is replaced by two smaller pieces of curve, the first determined by $A,E,H,J$, the other by $J,I,G,D$. Several nice properties are obeyed by this replacement so that the rough shape of the curve can be "predicted": The curve passes through end points ($A$ and $D$ of the original curve, additionally $H$ for the refined curves), the tangents there point towards $B$ resp. $C$.
Also, if the quadrangle $ABCD$ is convex, then the curve is contained inside it. Note that the refinement steps will sooner or later produce convex quadrangles and each refinement step from then on will give better and better containment estimates for the curve.
Ultimately, it is possible to calculate what the curve described by the above procedure sould look like in parametric form and it turns out that the factor 3 you observed comes into play. In effect, this is the same $3$ as in the binomial formula $(a+b)^3=a^3+3a^2b+3ab^2+b^3$. (Which means that for quadratic Bezier curves you will find a factor of $2$ and for Bezier curves of degree $4$, the numbers $4$ and $6$ will occur etc.