I think the simplest thing that would work in your application is to show the user 4 special points on the parametric cubic curve, and allow the user to manipulate those 4 special points.
(Allowing the user to pick any point on the curve, and move it, makes things more complicated).
I think this is the same as what Stephen H. Noskowicz calls "Cubic Four Point" representation, aka the cubic Lagrange with t1 = 1/3 and t2 = 2/3.
While your user is moving those 4 special points U0, U1, U2, U3 around,
periodically you find a cubic Bezier curve that goes through those 4 points using John Burkardt's approach:
P0 = U0
P1 = (1/6)*( -5*U0 + 18*U1 - 9*U2 + 2*U3 )
P2 = (1/6)*( 2*U0 - 9*U1 +18*U2 - 5*U3 )
P3 = U3.
That gives you the Bezier curve representation of the same cubic curve -- a series of 4 control points.
You then feed those 4 control points (the endpoints P0 and P3, and the intermediate control points P1 and P2) into any Bezier curve plotter.
The resulting curve (usually) doesn't touch P1 or P2, but it will start at U0, go exactly through U1 and U2, and end at U3.
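As a quick sanity check of the formulas above, here is a minimal Python sketch. The function names `bezier_through_four_points` and `bezier_point` are my own, not from any library, and points are assumed to be plain tuples of floats.

```python
def bezier_through_four_points(u0, u1, u2, u3):
    """Return Bezier control points (p0, p1, p2, p3) of the cubic that
    passes through u0, u1, u2, u3 at t = 0, 1/3, 2/3, 1."""
    p1 = tuple((-5 * a + 18 * b - 9 * c + 2 * d) / 6
               for a, b, c, d in zip(u0, u1, u2, u3))
    p2 = tuple((2 * a - 9 * b + 18 * c - 5 * d) / 6
               for a, b, c, d in zip(u0, u1, u2, u3))
    return tuple(u0), p1, p2, tuple(u3)


def bezier_point(ctrl, t):
    """Evaluate a cubic Bezier curve (4 control points) at parameter t."""
    s = 1.0 - t
    weights = (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)
    dim = len(ctrl[0])
    return tuple(sum(w * p[k] for w, p in zip(weights, ctrl))
                 for k in range(dim))


if __name__ == "__main__":
    # Hypothetical user-dragged points U0..U3.
    user_points = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
    ctrl = bezier_through_four_points(*user_points)
    for t in (0.0, 1 / 3, 2 / 3, 1.0):
        print(t, bezier_point(ctrl, t))
```

Evaluating the curve at t = 0, 1/3, 2/3, 1 should reproduce the four user points up to rounding error.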
(This uses the special points at t=0, 1/3, 2/3, and 1. It's possible to, instead, use the special points at t=0, 1/4, 3/4, and 1, as shown at How do I find a Bezier curve that goes through a series of points?. Or, I suppose, any 4 distinct t values. But I suspect the 0, 1/3, 2/3, 1 values are used most often, and I don't see any advantage to using any other fixed values).
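For other interior parameter values, the two unknown control points come out of a 2x2 linear system in the Bernstein weights. The following is only a sketch under my own assumptions: the endpoints stay at t=0 and t=1, the interior values satisfy 0 < t1 < t2 < 1, and the names `bernstein3` and `bezier_through_points_at` are made up for illustration.

```python
def bernstein3(t):
    """The four cubic Bernstein weights at parameter t."""
    s = 1.0 - t
    return (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)


def bezier_through_points_at(u0, u1, u2, u3, t1=1 / 3, t2=2 / 3):
    """Control points of the cubic Bezier curve with B(0)=u0, B(t1)=u1,
    B(t2)=u2, B(1)=u3, found by solving a 2x2 system with Cramer's rule."""
    if not 0.0 < t1 < t2 < 1.0:
        raise ValueError("need 0 < t1 < t2 < 1")
    a = bernstein3(t1)
    b = bernstein3(t2)
    det = a[1] * b[2] - a[2] * b[1]  # nonzero whenever 0 < t1 < t2 < 1
    p1, p2 = [], []
    for k in range(len(u0)):
        # Move the known endpoint contributions to the right-hand side.
        r1 = u1[k] - a[0] * u0[k] - a[3] * u3[k]
        r2 = u2[k] - b[0] * u0[k] - b[3] * u3[k]
        p1.append((r1 * b[2] - a[2] * r2) / det)
        p2.append((a[1] * r2 - b[1] * r1) / det)
    return tuple(u0), tuple(p1), tuple(p2), tuple(u3)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 0.0)]
    print(bezier_through_points_at(*pts))              # t = 1/3, 2/3
    print(bezier_through_points_at(*pts, 0.25, 0.75))  # t = 1/4, 3/4
```

With the default t1 = 1/3 and t2 = 2/3 this should agree with the fixed (1/6)*(...) formulas up to rounding.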
If you would like to minimize the "sharpness" of the cubic Bezier curve honoring two end points and two end tangent directions, there is something called the "optimized geometric Hermite curve" (OGH curve) that might be of interest to you. The OGH curve does not minimize the maximum curvature of the curve. Instead, it minimizes the overall "strain energy" of the curve, which is
$$\int_0^1 [f''(t)]^2 \;\mathrm{d}t$$
See the linked paper for details. For a cubic OGH curve, you can find the "optimal" magnitudes of the end derivatives analytically. The formula is listed in that paper as equation (4).
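To make this concrete, here is a sketch that I derived myself by writing the cubic in Hermite form on [0, 1] with end derivatives a0*V0 and a1*V1, and setting the partial derivatives of the strain energy with respect to a0 and a1 to zero. I believe it corresponds to the paper's closed form, but please check it against the paper before relying on it. The function name `ogh_cubic_bezier` is hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ogh_cubic_bezier(p0, p1, v0, v1):
    """Given end points p0, p1 and end tangent directions v0, v1, return
    (a0, a1, control_points), where a0, a1 are the end-derivative
    magnitudes minimizing the integral of |f''(t)|^2 over [0, 1]."""
    d = tuple(b - a for a, b in zip(p0, p1))
    v00, v11, v01 = dot(v0, v0), dot(v1, v1), dot(v0, v1)
    det = 4 * v00 * v11 - v01 * v01
    if det == 0:
        raise ValueError("tangent directions must be nonzero")
    # Stationarity conditions of the energy (my own derivation):
    #   2*a0*(v0.v0) +   a1*(v0.v1) = 3*(d.v0)
    #     a0*(v0.v1) + 2*a1*(v1.v1) = 3*(d.v1)
    a0 = (6 * dot(d, v0) * v11 - 3 * dot(d, v1) * v01) / det
    a1 = (6 * dot(d, v1) * v00 - 3 * dot(d, v0) * v01) / det
    # Hermite -> Bezier: inner control points sit 1/3 of the end derivative
    # away from the end points.
    b1 = tuple(p + a0 * v / 3 for p, v in zip(p0, v0))
    b2 = tuple(p - a1 * v / 3 for p, v in zip(p1, v1))
    return a0, a1, (tuple(p0), b1, b2, tuple(p1))


if __name__ == "__main__":
    print(ogh_cubic_bezier((0.0, 0.0), (2.0, 0.0), (1.0, 1.0), (1.0, -1.0)))
```

If a0 or a1 comes out zero or negative, the resulting curve no longer leaves or arrives in the requested direction, so that case needs separate handling.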
Best Answer
The article you cited is wrong (or, at best, misleading). In general, the offset of a Bezier curve cannot be represented exactly as another Bezier curve (of any degree). On the other hand, there are many situations where you don't need an exact offset, only a decent approximation. In my view, the definitive works in this area are the following two papers:
Farouki and Neff: Analytic properties of plane offset curves, CAGD 7 (1990), 83-99
Farouki and Neff: Algebraic properties of plane offset curves, CAGD 7 (1990), 101-127
For a good comparison of available approximation techniques, look at this paper: http://www.cs.technion.ac.il/~gershon/papers/offset-compare.pdf
Regarding special cases: Bezier curves that happen to be straight lines can obviously be offset exactly, as you observed. Also, the so-called Pythagorean hodograph curves have offsets that are rational Bezier curves, though not polynomial ones. Ask again if you're interested in these.
The 90 degree idea is not very useful, even as an approximation guideline. As an example, consider the curve that has control points (0,0), (2,1), (0,1), (2,0). It satisfies the given conditions, but it's very difficult to offset accurately.
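One way to see the trouble with that curve, as a quick sketch: its derivative vanishes at t = 0.5, so the unit normal used to build the offset is undefined there. The helper names `bezier_derivative` and `offset_point` are my own, and the offset is taken along the left-hand normal (an arbitrary choice).

```python
import math


def bezier_derivative(ctrl, t):
    """First derivative of a cubic Bezier curve at parameter t."""
    p0, p1, p2, p3 = ctrl
    s = 1.0 - t
    return tuple(3 * (s * s * (b - a) + 2 * s * t * (c - b) + t * t * (d - c))
                 for a, b, c, d in zip(p0, p1, p2, p3))


def offset_point(ctrl, t, dist):
    """Point at signed distance dist from the planar curve at parameter t,
    along the left-hand unit normal. Returns None where the derivative
    vanishes and the normal is undefined."""
    dx, dy = bezier_derivative(ctrl, t)
    speed = math.hypot(dx, dy)
    if speed == 0.0:
        return None
    s = 1.0 - t
    w = (s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3)
    x = sum(wi * p[0] for wi, p in zip(w, ctrl))
    y = sum(wi * p[1] for wi, p in zip(w, ctrl))
    return (x - dist * dy / speed, y + dist * dx / speed)


if __name__ == "__main__":
    ctrl = [(0.0, 0.0), (2.0, 1.0), (0.0, 1.0), (2.0, 0.0)]
    print(bezier_derivative(ctrl, 0.5))  # zero vector: the curve has a cusp
    for t in (0.49, 0.5, 0.51):
        print(t, offset_point(ctrl, t, 0.1))
```

On either side of t = 0.5 the tangent direction reverses, so the sampled offset points jump from one side of the cusp to the other instead of forming a smooth curve.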