Here is one approach to understanding Newton's method that generalizes easily to your situation. Let $n(x)=x-f(x)/f'(x)$. Newton's sequence approximating a root is defined recursively by $x_{k+1}=x_k-f(x_k)/f'(x_k)$ or $x_{k+1}=n(x_k)$. Thus, if $c$ is a root of $f$, we are interested in the difference $n(x)-c$, when $x$ is close to $c$. This can be estimated using a series expansion of $n$ about $c$:
$$n(x) \approx c+\frac{f''(c)}{2 f'(c)}(x-c)^2+O\left((x-c)^3\right).$$
From here, it's easy to see that the difference between $n(x)$ and $c$ is proportional to $(x-c)^2$; that is, we expect quadratic convergence of the sequence defined recursively by $x_k=n(x_{k-1})$.
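As a quick numerical sketch of that quadratic behavior (using $f(x)=x^2-2$, so $c=\sqrt{2}$, as an illustrative choice):

```python
# Newton's method n(x) = x - f(x)/f'(x) for f(x) = x**2 - 2,
# whose positive root is c = sqrt(2).
import math

f = lambda x: x * x - 2
fp = lambda x: 2 * x

c = math.sqrt(2)
x = 3.0                      # starting guess
errs = []
for _ in range(5):
    x = x - f(x) / fp(x)     # one Newton step
    errs.append(abs(x - c))

# Each error is roughly a constant times the square of the previous one.
print(errs)
```

After five steps the error has collapsed to roughly machine precision, and each error is about $\frac{f''(c)}{2f'(c)}\approx 0.354$ times the square of the previous one.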
Now, try the same thing with $n(x)=x-f(x)/f'(x_0)$, where $x_0$ is the fixed first term in your sequence. Again, expanding $n$ about $c$, where $f(c)=0$, we get
$$n(x) \approx c+\left(1-\frac{f'(c)}{f'\left(x_0\right)}\right)(x-c) - \frac{f''(c)}{2 f'\left(x_0\right)}(x-c)^2 +O\left((x-c)^3\right).$$
Quadratic convergence is now lost due to the first order term.
We can use this series to help formulate some concrete examples. The critical issue is the absolute value of $1-f'(c)/f'(x_0)$:
- If $|1-f'(c)/f'(x_0)|>1$, we have divergence,
- If $0<|1-f'(c)/f'(x_0)|<1$, we have linear convergence,
- If $1-f'(c)/f'(x_0) = 0$, we have quadratic convergence.
Of course, those statements all assume that $x_0$ is sufficiently close to $c$.
Now, consider examples of the form $f(x)=x^2-c^2$, which has a root at $x=c$. Then, our series expansion becomes
$$n(x) \approx c + \left(1-\frac{c}{x_0}\right)(x-c) - \frac{1}{2x_0}(x-c)^2+O\left((x-c)^3\right).$$
Comment: In fact, the second order approximation is exact for this family of functions, but that's not necessary for this approximation technique to work.
It's now very easy to produce specific types of behavior in this family. Whenever $0<c<x_0$, for example, we have $0<1-c/x_0<1$, so we are guaranteed linear convergence. Even more specifically, if $c=2$ and $x_0=4$, then $1-c/x_0 = 1/2$, and this modified method generates a sequence whose difference from the root $c=2$ is cut roughly in half with each iterate. On the other hand, if $c=4$ and $x_0=1$, then $1-c/x_0=-3$ and we'll generate a divergent sequence.
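Both cases are easy to verify numerically (a sketch; the iteration keeps the denominator frozen at $f'(x_0)=2x_0$):

```python
# Modified Newton n(x) = x - f(x)/f'(x0) for f(x) = x**2 - c**2,
# where the derivative is frozen at the starting point x0.
def modified_newton(c, x0, steps):
    fp0 = 2.0 * x0                       # fixed denominator f'(x0)
    x = x0
    errs = []
    for _ in range(steps):
        x = x - (x * x - c * c) / fp0
        errs.append(abs(x - c))
    return errs

# c = 2, x0 = 4: 1 - c/x0 = 1/2, so the error is roughly halved each step.
print(modified_newton(2, 4.0, 6))

# c = 4, x0 = 1: 1 - c/x0 = -3, so the iterates run away from the root.
print(modified_newton(4, 1.0, 6))
```

In the first run the ratio of successive errors settles near $1/2$, while in the second the iterates blow up after just a few steps.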
Finally, consider $f(x)=2x - x^3 + x^5$. Note that $c=0$ is a root of $f$ and that $f'(c)=2$. Furthermore, $f'(\sqrt{3/5})=2$. Thus, if we start this modified Newton's method at $x_0=\sqrt{3/5}$, we might expect quadratic convergence. In fact, we do even better: $n(x) = (x^3-x^5)/2$, and any starting value in $[0,1]$ leads to a sequence with cubic convergence.
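A quick numerical sketch of that last claim, iterating the simplified map from a starting value inside $[0,1]$:

```python
# Modified Newton for f(x) = 2*x - x**3 + x**5 with the derivative
# frozen at x0 = sqrt(3/5), where f'(x0) = 2; the iteration map
# then simplifies to n(x) = (x**3 - x**5) / 2.
x = 0.5
for _ in range(4):
    x = (x**3 - x**5) / 2
    print(x)     # each value is roughly the cube of the previous one
```

After only four steps the iterate is already below $10^{-38}$, consistent with cubic convergence.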
By definition, $$g(x)=x-\frac{f(x)}{f'(x)}.$$ So
$$
g'(x)=1-\frac{f'(x)f'(x)-f(x)f''(x)}{f'(x)^2}=\frac{f(x)f''(x)}{f'(x)^2}.
$$
Now $r$ is chosen so that $f(r)=0$, so the numerator above vanishes; thus $g'(r)=0$.
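A finite-difference sanity check of $g'(r)=0$ (a sketch, using $f(x)=x^2-2$ with root $r=\sqrt{2}$ as an illustrative choice):

```python
import math

f = lambda x: x * x - 2
fp = lambda x: 2 * x
g = lambda x: x - f(x) / fp(x)            # the Newton iteration map

r = math.sqrt(2)                          # root of f
h = 1e-6
# Central-difference estimate of g'(r); it should be close to 0.
slope = (g(r + h) - g(r - h)) / (2 * h)
print(slope)
```

The printed slope is at the level of discretization and round-off error, consistent with $g'(r)=0$.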
Your second point is totally valid, and indeed $\xi$ will depend on $n$. But if $g''$ is continuous, then, since all the $\xi$ lie inside a closed interval, $g''$ is bounded on that interval. So, while the factor $g''(\xi)$ in the error is not necessarily constant, it is bounded by a constant.
For your third question, if $r$ is a "root of multiplicity $\delta$", it means that $$ 0=f(r)=f'(r)=\cdots=f^{(\delta-1)}(r).$$ So all those terms in the Taylor expansion will be gone.
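As a concrete consequence (a sketch): at the double root $r=0$ of $f(x)=x^2$, the Newton map reduces to $g(x)=x/2$, so $g'(r)=1/2\neq 0$ and the convergence drops to linear:

```python
# Newton's method on f(x) = x**2, which has a double root at r = 0.
f = lambda x: x * x
fp = lambda x: 2 * x

x = 1.0
for _ in range(5):
    x = x - f(x) / fp(x)    # reduces to x/2: the error only halves
print(x)                    # prints 0.03125
```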
A proof of the quadratic convergence rate for Newton's method in 2 variables may be found in the book Elements of Numerical Analysis by P. Henrici (J. Wiley, 1964).
Also proved is the condition for this to apply to Bairstow's method.