In many cases we can prove that $f'(x)>0$ and $f''(x)\geq 0$ for all $x$ without knowing anything about $a$ and $b$. If we can also prove that there exist $a, b$ with $f(a)<0<f(b)$, then $f$ has a root and you can begin the iteration anywhere.
In practice a less restrictive but sufficient set of conditions is
(i). There exists $c$ with $f(c)=0.$
(ii). There exists $a>c$ (the same $c$ as in (i)) such that $f(a)>0$ and such that $f'(x)>0$ for $c<x\leq a.$
(iii). $f'$ is increasing on $(c,a).$ (That is, $c<x<y<a\implies f'(x)\leq f'(y)$.)
For a non-linear function $f,$ conditions (i), (ii) and (iii) imply that $f(x)>0$ for $c<x\leq a$ and that the sequence $a_{n+1}=a_n-f(a_n)/f'(a_n),$ with $a_1=a,$ is strictly decreasing and remains in $(c,a].$ It therefore has a limit $d\in[c,a],$ and $f(d)=0,$ so $d=c$ because $f>0$ on $(c,a].$
Example. Let $f(x)=x^4-10.$ We know there exists $c\in (0,2)$ with $f(c)=0$ because $f(0)<0<f(2)$. We have $f'(x)=4x^3 >0$ for $x>0$, hence $f'(x)>0$ for $x>c.$ We have $f''(x)=12x^2\geq 0$, hence $f'$ is increasing for $x>c.$ So we can begin an iteration with $a=a_1=2.$
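To see the monotone behaviour concretely, here is a minimal Python sketch of the iteration for this example. The helper name `newton_iterates` and the choice of five steps are mine, not part of the argument above, and floating-point rounding can affect the last digits once the iterates are within machine precision of $c=10^{1/4}$.

```python
def newton_iterates(f, fprime, a, steps):
    """Return [a_1, ..., a_{steps+1}] with a_{n+1} = a_n - f(a_n)/f'(a_n)."""
    iterates = [a]
    for _ in range(steps):
        a = a - f(a) / fprime(a)
        iterates.append(a)
    return iterates


def f(x):
    return x**4 - 10


def fprime(x):
    return 4 * x**3


# Start at a_1 = 2, as in the example; 5 steps is an arbitrary illustrative choice.
iterates = newton_iterates(f, fprime, 2.0, 5)
root = 10 ** 0.25  # the exact root c, used only to display the error

for n, a_n in enumerate(iterates, start=1):
    print(n, a_n, a_n - root)
```

The printed iterates should decrease toward $c$ from the right, with the error roughly squaring at each step.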
By definition, $$g(x)=x-\frac{f(x)}{f'(x)}.$$ So
$$
g'(x)=1-\frac{f'(x)f'(x)-f(x)f''(x)}{f'(x)^2}=\frac{f(x)f''(x)}{f'(x)^2}.
$$
Now $r$ is chosen so that $f(r)=0$, so the numerator above is zero: thus, $g'(r)=0$ (provided $f'(r)\neq 0$, i.e. $r$ is a simple root).
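If you want to check the formula numerically, the following Python sketch compares $f f''/f'^2$ with a central-difference estimate of $g'$. The test function $f(x)=x^4-10$, the step `h`, and the helper name `g_prime` are my own illustrative assumptions, not something taken from your question.

```python
def g_prime(f, f1, f2, x):
    """Closed form g'(x) = f(x) f''(x) / f'(x)^2 for g(x) = x - f(x)/f'(x)."""
    return f(x) * f2(x) / f1(x) ** 2


# Illustrative choice of f (an assumption): f(x) = x^4 - 10 with simple root r = 10^(1/4).
def f(x):
    return x**4 - 10


def f1(x):
    return 4 * x**3


def f2(x):
    return 12 * x**2


def g(x):
    return x - f(x) / f1(x)


r = 10 ** 0.25
h = 1e-5  # finite-difference step, chosen arbitrarily

# Central-difference approximation of g'(r); should be close to 0.
central = (g(r + h) - g(r - h)) / (2 * h)

print("formula at r:      ", g_prime(f, f1, f2, r))
print("finite diff at r:  ", central)
print("formula at x = 2:  ", g_prime(f, f1, f2, 2.0))
```

Away from the root the derivative is not zero (at $x=2$ the formula gives $6\cdot 48/32^2$), which is why the quadratic behaviour only shows up near $r$.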
Your second point is totally valid, and indeed $\xi$ will depend on $n$. But if you know that $g''$ is continuous, then, since all the $\xi$ lie inside a closed bounded interval, $g''$ is bounded on that interval. So, while it is not necessarily constant, the factor $g''(\xi)$ in the error is bounded by a constant.
For your third question, if $r$ is a "root of multiplicity $\delta$", it means that $$ 0=f(r)=f'(r)=\cdots=f^{(\delta-1)}(r).$$ So all those terms in the Taylor expansion will be gone.
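As a rough illustration of what changes at a multiple root, here is a small Python sketch using the assumed example $f(x)=(x-1)^3$, so $r=1$ and $\delta=3$. The example and the helper name `newton_errors` are mine. For this particular $f$ the Newton step is $x-(x-1)/3$, so each error is $2/3$ of the previous one: the convergence is linear rather than quadratic.

```python
def newton_errors(f, fprime, x0, root, steps):
    """Return the errors |x_n - root| along the Newton iteration started at x0."""
    errors = [abs(x0 - root)]
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
        errors.append(abs(x - root))
    return errors


delta = 3  # multiplicity of the root r = 1 in this illustrative example


def f(x):
    return (x - 1.0) ** delta


def fprime(x):
    return delta * (x - 1.0) ** (delta - 1)


errors = newton_errors(f, fprime, 2.0, 1.0, 10)

# Ratio of successive errors; for this f it is 1 - 1/delta = 2/3 up to rounding.
ratios = [e1 / e0 for e0, e1 in zip(errors, errors[1:])]
print(ratios)
```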
Following the theory explained in https://math.stackexchange.com/a/1653829/115115, determine over $[0,2]$ $$ m_1=\min_{x\in[0,2]} |f'(x)|=\min_{x\in[0,2]} (3x^2+1)=1 $$ and $$ M_2=\max_{x\in[0,2]} |f''(x)|=\max_{x\in[0,2]}6x=12 $$ and determine the "contraction" constant $$ C=\frac{M_2}{2m_1}=6. $$ From $$ |x_{n+1}-L|\le C\cdot|x_n-L|^2=(C\cdot|x_n-L|)\cdot|x_n-L|\\ \implies |x_n-L|\le C^{-1}\cdot (C\cdot|x_0-L|)^{2^n} $$ one sees that the method is contractive and quadratically convergent for $$ |x_0-L|<\frac16. $$
Starting with the smaller interval $[\frac12,\frac32]$ these estimates give $m_1=\frac74$, $M_2=9$, $C=18/7<3$, so that $1/C=7/18>\frac13$. This leads to the greater radius $$|x_0-L|<\frac13$$ for the initial interval of good starting points.
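The constants for both intervals can be reproduced with exact rational arithmetic. This is only a sketch: the helper name `newton_constant` is mine, and it relies on the fact that $|f'(x)|=3x^2+1$ and $|f''(x)|=6x$ are both increasing for $x\ge 0$, so the minimum and maximum sit at the left and right endpoints respectively.

```python
from fractions import Fraction


def newton_constant(lo, hi):
    """Return (m_1, M_2, C) on [lo, hi] for f'(x) = 3x^2 + 1 and f''(x) = 6x.

    Assumes 0 <= lo <= hi, where |f'| and |f''| are both increasing.
    """
    m1 = 3 * lo**2 + 1    # min of |f'| is attained at the left endpoint
    M2 = 6 * hi           # max of |f''| is attained at the right endpoint
    C = Fraction(M2) / (2 * m1)
    return m1, M2, C


for lo, hi in [(Fraction(0), Fraction(2)), (Fraction(1, 2), Fraction(3, 2))]:
    m1, M2, C = newton_constant(lo, hi)
    # 1/C is the radius within which C * |x_0 - L| < 1 holds.
    print(f"[{lo}, {hi}]: m_1 = {m1}, M_2 = {M2}, C = {C}, 1/C = {1 / C}")
```

The second interval gives $1/C=7/18$, which is slightly larger than the rounded radius $\frac13$ quoted above.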