Your computation is correct:
$\Delta_2 = (4X_1 X_0 + 2 X_1 + 1) \delta + (2X_1 + (2X_0 + 1)^2)\delta^2 + (4X_0+2)\delta^3+\delta^4$
The recurrences for the induction are also correct:
$$\begin{aligned} A_{n+1} &= 2 X_n A_n + 1 \\ B_{n+1} &= 2 X_n B_n + A_n^2 \\ C_{n+1} &= 2 X_n C_n + 2 A_n B_n \end{aligned}$$
Initialise $A_0 = 1, B_0 = 0, C_0 = 0$ and use these recurrences for the induction.
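As a quick sanity check, here is a sketch in Python (with an arbitrary sample seed $X_0$, chosen only for illustration) showing that two steps of the recurrence reproduce the hand-expanded $\Delta_2$ coefficients above:

```python
# Sketch: check that the A, B, C recurrence reproduces the
# hand-expanded Delta_2 coefficients. X0 is an arbitrary sample
# seed chosen for illustration.
X0 = complex(-0.75, 0.1)
X = X0                     # X_n, starting from X_0
A, B, C = 1.0, 0.0, 0.0    # A_0 = 1, B_0 = 0, C_0 = 0
for _ in range(2):         # advance the coefficients to n = 2
    A, B, C = 2*X*A + 1, 2*X*B + A*A, 2*X*C + 2*A*B
    X = X*X + X0           # X_{n+1} = X_n^2 + X_0
X1 = X0*X0 + X0            # closed form for X_1
```

After the loop, `A`, `B`, `C` match the $\delta$, $\delta^2$, $\delta^3$ coefficients $4X_1X_0+2X_1+1$, $2X_1+(2X_0+1)^2$ and $4X_0+2$ up to rounding.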
First answer: "the $\delta^3$ term remains significantly smaller than the $\delta^2$ term" means that $\left|C_n \delta^3\right| \ll \left|B_n \delta^2\right|$; usually a few orders of magnitude is a good margin (a factor of $10^3$ or so). What the final $n$ will be for this stage depends on both $X_0$ and the largest $\delta$ among the pixels in the image.
Often it will be obvious how many per-pixel iterations it is safe to "skip" with this series approximation technique: $\left|C_n\right|$ will suddenly increase. Sometimes, however, you get more subtle image distortion first, so a common technique is to run regular perturbed iterations for some "probe points" in the image and stop the series approximation iterations once they deviate too much from the series prediction.
The final $n$ in the series approximation step is usually significantly smaller than the minimum iteration count for the first escaping pixel in the view. So you combine it with perturbed iterations for the remaining count.
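A minimal sketch of this stopping rule (the function name `series_skip_count`, the sample seed, and the $10^{-3}$ tolerance are all illustrative assumptions, not a canonical choice):

```python
import math

def series_skip_count(X0, delta_max, max_iters=10000, tol=1e-3):
    """Iterate the series coefficients while |C_n delta^3| stays a
    few orders of magnitude below |B_n delta^2| for the largest
    pixel offset delta_max. Returns the last safe n and the last
    good coefficients."""
    X = X0
    A, B, C = 1.0, 0.0, 0.0
    for n in range(max_iters):
        A1, B1, C1 = 2*X*A + 1, 2*X*B + A*A, 2*X*C + 2*A*B
        # stop on overflow of the coefficients...
        if not all(math.isfinite(abs(v)) for v in (A1, B1, C1)):
            return n, (A, B, C)
        # ...or once the cubic term is no longer negligible
        # (|C| * delta^3 > tol * |B| * delta^2, i.e. divide by delta^2)
        if abs(C1) * delta_max > tol * abs(B1):
            return n, (A, B, C)
        A, B, C = A1, B1, C1
        X = X*X + X0
    return max_iters, (A, B, C)
```

In a real renderer the probe-point comparison from the next answer would be layered on top of this crude coefficient test.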
Second answer: I'm not sure what you mean by seed. Usually it works like this:
- pick a reference point $X_0$ and some probe points $\Delta_0$ in the image
- while the series approximation is accurate, as determined by the worst relative error among all the probe points satisfying $|e| \ll 1$:
- step the high precision reference one iteration
using $$X_{n+1} = X_n^2 + X_0$$
- step the series approximation coefficients one iteration
using $$\begin{aligned}A_{n+1} &= 2 X_n A_n + 1 \\ B_{n+1} &= 2 X_n B_n + A_n^2 \\ C_{n+1} &= 2 X_n C_n + 2 A_n B_n \end{aligned}$$
- step the probe points one iteration
using $$\Delta_{n+1} = 2 X_n \Delta_n + \Delta_n^2 + \Delta_0$$
- initialize all the image points from their $\Delta_0$ with the last good series approximation coefficients using $$\Delta_n = A_n \Delta_0 + B_n \Delta_0^2 + C_n \Delta_0^3$$
- step the reference $X_n$ until it escapes or maximum iteration count is reached using $$X_{n+1} = X_n^2 + X_0$$
- step all image points $\Delta_n$ using the stored reference iterations $X_n$ using $$\Delta_{n+1} = 2 X_n \Delta_n + \Delta_n^2 + \Delta_0$$ If you detect "glitches" in the perturbed iterations by $|X_n + \Delta_n| \ll |X_n|$, set those pixels aside
- if there are glitched pixels remaining (or the reference escaped too early), repeat this whole process for the glitched pixels (pick a different $X_0$)
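The per-pixel phase (the last three steps, setting aside the series approximation initialization) can be sketched as follows. Ordinary double precision stands in for the arbitrary precision the reference orbit would really need; the function names and the $10^{-4}$ glitch threshold are illustrative assumptions:

```python
def reference_orbit(X0, max_iters, escape_radius=2.0):
    """Iterate X_{n+1} = X_n^2 + X_0, storing the orbit.
    In a real renderer this would use arbitrary precision."""
    orbit, X = [], X0
    for _ in range(max_iters):
        orbit.append(X)
        if abs(X) > escape_radius:
            break
        X = X*X + X0
    return orbit

def perturbed_pixel(orbit, delta0, max_iters, glitch_tol=1e-4):
    """Iterate Delta_{n+1} = 2 X_n Delta_n + Delta_n^2 + delta0
    in low precision against the stored reference orbit.
    Returns (escape iteration or None, glitched flag)."""
    delta = delta0                       # Delta_0 = delta for z_0 = c
    for n in range(min(max_iters, len(orbit))):
        X = orbit[n]
        z = X + delta                    # full orbit of this pixel
        if abs(z) > 2.0:
            return n, False              # escaped at iteration n
        if abs(z) < glitch_tol * abs(X): # Pauldelbrot's criterion
            return n, True               # glitch: needs a new reference
        delta = 2*X*delta + delta*delta + delta0
    return None, False                   # did not escape
```

Glitched pixels would then be re-rendered against a different reference, as described above.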
"Glitches" occur when the dynamics of a pixel are too different from the dynamics of the reference. They can (usually) be detected by Pauldelbrot's criterion1 $\left|z_n+\delta_n\right| << \left|z_n\right|$ and corrected by picking a better reference.
This is all a bit ad hoc and seems to work well, but rigorous numerical analysis proofs are still lacking as far as I know. There are two arbitrary thresholds (one for the series vs probe point relative error tolerance, and one for glitch detection), which is very unsatisfactory.
Asymptotics
Suppose $M$ iterations are skipped by series approximation, and $N$ iterations are needed in total. The image size is $W \times H$. Suppose the cost of high precision operations is $K$ times the cost of low precision operations. Then the cost of the traditional method using high precision for all pixels is $O(K \times N \times W \times H)$. With perturbation, the cost is $O(K \times N + N \times W \times H)$. With perturbation and series approximation the cost is $O(K \times N + (N - M) \times W \times H)$.
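Plugging in some made-up but plausible numbers shows the scale of the savings (all parameter values below are illustrative only):

```python
# Illustrative cost comparison with made-up parameters: a
# 1920x1080 image, N = 100000 iterations, high precision K = 1000
# times slower than low precision, M = 90000 iterations skipped
# by the series approximation.
W, H, N, K, M = 1920, 1080, 100_000, 1000, 90_000
traditional   = K * N * W * H            # O(K N W H)
perturbed     = K * N + N * W * H        # O(K N + N W H)
series_approx = K * N + (N - M) * W * H  # O(K N + (N - M) W H)
```

With these numbers perturbation alone wins by a factor of about a million, and series approximation skips 90% of the remaining per-pixel work.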
Best Answer
As far as I know, the perturbation techniques when applied to deep zooming have only been shown to work reliably for "Mandelbrot" iterations of the form $z := F(z) + c$ where $c$ is the pixel coordinates. I don't know if they can be expected to work for "Julia" iterations where it is the initial $z$ that is the pixel coordinates and $c$ is a constant.
Your first bullet point is fine, that's generally how it is done.
Your second bullet point is mostly OK: for Mandelbrot-type iterations you have per pixel $\delta = c_\text{pixel} - c_\text{ref}$; for Julia sets I suppose you replace $c$ with $z_0$. The update formulae for $A_n,B_n,C_n$ need re-deriving because they will be different from the Mandelbrot case:
$$\begin{aligned} Z_{n+1} &= Z_n^2 + C \\ Z_{n+1}+z_{n+1} &:= (Z_n+z_n)^2 + C \\ z_{n+1} &:= 2 Z_n z_n + z_n^2 \\ \delta &= z_0 \\ z_n &= A_n \delta + B_n \delta^2 + C_n \delta^3 \\ \end{aligned}$$ Substituting the last line into the third line and equating coefficients of $\delta^k$ gives $$\begin{aligned} A_0 &= 1 & A_{n+1} &= 2 Z_n A_n \\ B_0 &= 0 & B_{n+1} &= 2 Z_n B_n + A_n^2 \\ C_0 &= 0 & C_{n+1} &= 2 Z_n C_n + 2 A_n B_n\\ \end{aligned}$$ which you can see breaks down if ever $Z_n = 0$.
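A sketch checking the Julia-case recurrences against direct perturbed iteration for a small offset ($C$, $Z_0$ and $\delta$ are arbitrary illustrative values; the truncated $\delta^4$ tail is negligible at this offset size):

```python
# Check the Julia-case recurrences A, B, C against the exact
# perturbed iteration z_{n+1} = 2 Z_n z_n + z_n^2, where the
# pixel offset is delta = z_0. All constants are illustrative.
C_julia = complex(-0.4, 0.6)   # the fixed Julia parameter C
Z0 = complex(0.3, 0.2)         # reference starting point Z_0
delta = 1e-8 + 0j              # small pixel offset in z_0

Z, z = Z0, delta               # reference orbit, exact delta orbit
A, B, C = 1.0, 0.0, 0.0        # A_0 = 1, B_0 = 0, C_0 = 0
for _ in range(10):
    A, B, C = 2*Z*A, 2*Z*B + A*A, 2*Z*C + 2*A*B
    z = 2*Z*z + z*z            # exact perturbed step
    Z = Z*Z + C_julia          # reference: Z_{n+1} = Z_n^2 + C

approx = A*delta + B*delta**2 + C*delta**3
```

For this offset the truncated series agrees with the exact perturbed orbit to many digits; note that if the reference orbit ever hit $Z_n = 0$, the coefficient $A_{n+1} = 2 Z_n A_n$ would vanish permanently, which is the breakdown mentioned above.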
Your third bullet point is very problematic - the series is only approximate, and beyond a certain iteration count (dependent on location) it goes wild and all bets are off. You can use probe points on the boundary of the image (iterated in another way, and compared with the approximation) to detect when the series is starting to diverge from reality. This will typically be significantly before the pixel escapes to infinity.
One issue with fixed point is underflow and the ensuing loss of precision. If $\delta < \frac{1}{10}$ then $\delta^3 < \frac{1}{1000} \approx 2^{-10}$, so you have already lost about $10$ bits of your precision to leading zeros. On the other hand, the coefficients $A_n, B_n, C_n$ grow increasingly large, so you will probably overflow your limited range.
One possible solution is to rescale all your numbers, so instead of multiplying big $C_n$ with small $\delta^3$ you take out a constant factor $K$ to make $\epsilon = K \delta$ have magnitude approximately equal to $1$. It becomes:
$$ \frac{A_n}{K} \times K \delta + \frac{B_n}{K^2} \times K^2 \delta^2 + \frac{C_n}{K^3} \times K^3 \delta^3 = a_n \epsilon + b_n \epsilon^2 + c_n \epsilon^3$$ with all the variables on the right hand side having magnitude near $1$, so no risk of overflow or underflow.
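In floating point the rescaled recurrences are easy to check, since working out the algebra gives $a_{n+1} = 2 X_n a_n + \frac{1}{K}$, $b_{n+1} = 2 X_n b_n + a_n^2$, $c_{n+1} = 2 X_n c_n + 2 a_n b_n$: only the "$+1$" in the $A$ update picks up a factor. A sketch ($K$, $X_0$ and $\delta$ are illustrative values):

```python
# Verify that the rescaled recurrences reproduce the unscaled
# series value. K, X0 and delta are illustrative; K is chosen as
# a power of two so the rescaling is exact in binary arithmetic.
K = 2.0**40                    # rescaling factor, roughly 1/|delta|
X0 = complex(-0.75, 0.1)
delta = 1e-12 + 0j
eps = K * delta                # epsilon = K * delta

X = X0
A, B, C = 1.0, 0.0, 0.0        # unscaled coefficients
a, b, c = 1.0/K, 0.0, 0.0      # scaled: a_0 = A_0 / K
for _ in range(8):
    A, B, C = 2*X*A + 1, 2*X*B + A*A, 2*X*C + 2*A*B
    a, b, c = 2*X*a + 1/K, 2*X*b + a*a, 2*X*c + 2*a*b
    X = X*X + X0

unscaled = A*delta + B*delta**2 + C*delta**3
rescaled = a*eps + b*eps**2 + c*eps**3
```

In floating point this is of course a no-op; the payoff is in fixed point, where the scaled quantities stay in representable range. Picking $K$ as a power of two also makes the rescaling a pure exponent shift.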
It's a matter of careful algebra to work out the update iterations for $a_n, b_n, c_n$; you will need to involve $K$ at certain points when multiplying scaled values, perhaps something like this: $$\frac{X}{K} \times_1 \frac{Y}{K} = \frac{X\times_1Y/_1K}{K} \equiv x \times_K y = x\times_1y/_1K$$
I suggest instrumenting your program to tell you loudly when overflow occurs; underflow is harder to deal with, I suppose.