The usual method is slow for double roots, as the following Taylor series argument shows. Assume $f$ is twice continuously differentiable and $r$ is a simple root, i.e. $f(r)=0$ and $f'(r) \neq 0$. Then
\begin{eqnarray*}\left ( x - \frac{f(x)}{f'(x)} \right ) - r & = & x - r - \left ( x - r - \frac{f''(r)}{2 f'(r)} (x-r)^2 + o((x-r)^2) \right ) \\
& = & \frac{f''(r)}{2 f'(r)} (x-r)^2 + o((x-r)^2)
\end{eqnarray*}
What I did here was Taylor expand $\frac{f(x)}{f'(x)}$ to second order about $x=r$, and then substitute in the fact that $f(r)=0$, which cancels a lot of terms. The result means that when $r$ is a simple root, the method converges quadratically. Roughly speaking, this means that the number of correct digits doubles at each step, once the error is small enough. On the other hand, if $r$ is a double root (i.e. $f(r)=f'(r)=0$ and $f''(r) \neq 0$), the situation is different. Here we cannot expand $f(x)/f'(x)$ about $x=r$, because this function is not defined there. Instead we expand the numerator and denominator separately:
\begin{eqnarray*}\left ( x - \frac{f(x)}{f'(x)} \right ) - r & = & x-r - \frac{1/2 f''(r) (x-r)^2 + o((x-r)^2)}{f''(r)(x-r) + o(x-r)} \\
& = & x-r -\frac{1/2 (x-r) + o(x-r)}{1 + o(1)} \\
& = & x-r - (1/2 (x-r) + o(x-r))(1+o(1)) \\
& = & 1/2 (x-r) + o(x-r)
\end{eqnarray*}
This means the method converges linearly with a coefficient of $1/2$: the error is approximately halved at each step, once the error is small enough, so you gain roughly one correct binary digit per step. This gets even worse for higher-order roots: for a root of order $n$ the coefficient is $1-\frac{1}{n}$.
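You can watch both rates in a few lines of Python. This is a sketch of my own (the test functions $x^2-1$ and $(x-1)^2$ and the starting point $1.5$ are my choices, not from the question); both have a root at $r=1$, simple in the first case and double in the second.

```python
def newton(f, fprime, x0, steps):
    """Run Newton's method and return the list of iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs

# Simple root: f(x) = x^2 - 1 has a simple root at r = 1.
simple = newton(lambda x: x**2 - 1, lambda x: 2 * x, 1.5, 6)

# Double root: f(x) = (x - 1)^2 has a double root at r = 1.
double = newton(lambda x: (x - 1)**2, lambda x: 2 * (x - 1), 1.5, 6)

# Print the errors |x_k - 1| at each step.
for name, xs in [("simple", simple), ("double", double)]:
    print(name, ["%.2e" % abs(x - 1) for x in xs])
```

In the simple case the error roughly squares at each step; in the double case it is cut exactly in half each step, matching the coefficient $1/2$ above.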
There is a modified Newton method which corrects for a non-simple root in such a way that quadratic convergence is recovered. If you know the order of the root is $n$, then you can use
$$x_{k+1} = x_k - n \frac{f(x_k)}{f'(x_k)}.$$
A similar argument to the first one shows that this converges quadratically. If you don't know the order of the root, there is a method which can estimate it: see http://mathfaculty.fullerton.edu/mathews/n2003/NewtonAccelerateMod.html
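Here is a quick numerical sketch of the corrected iteration (my own example: $f(x) = (x-1)^2 e^x$, which has a double root at $r=1$, so $n=2$):

```python
import math

def modified_newton(f, fprime, x0, n, steps):
    """Newton's method with the multiplicity correction factor n."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - n * f(x) / fprime(x))
    return xs

# f(x) = (x - 1)^2 e^x has a double root at r = 1, so take n = 2.
f  = lambda x: (x - 1)**2 * math.exp(x)
fp = lambda x: math.exp(x) * (2 * (x - 1) + (x - 1)**2)

xs = modified_newton(f, fp, 1.5, 2, 5)
print(["%.2e" % abs(x - 1) for x in xs])
```

The errors shrink quadratically again, whereas plain Newton on the same $f$ would only halve the error each step.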
Edit: the symbol $o(f(x))$, called "little oh notation", is a standard shorthand. It means an unspecified function $g(x)$ such that $g(x)/f(x) \to 0$ as $x$ tends to something specified by context (usually $0$ or $\infty$, in this case $0$). So for example, calculus tells us that
$$\lim_{h \to 0} \frac{f(x+h) - f(x) - h f'(x)}{h} = 0$$
which is the same as
$$f(x+h) - f(x) - h f'(x) = o(h)$$
Little oh notation has a weaker counterpart, called "big oh notation": the symbol $O(f(x))$ means an unspecified function $g(x)$ such that $g(x)/f(x)$ stays bounded as $x$ tends to something specified by context.
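As a concrete check of the little oh statement $f(x+h) - f(x) - h f'(x) = o(h)$ above, here is a tiny sketch (my own example, using $f(x)=x^2$ at $x=1$, where $f'(1)=2$): the ratio of the remainder to $h$ visibly goes to $0$ with $h$.

```python
f = lambda x: x**2
x, fp = 1.0, 2.0          # f'(1) = 2
ratios = []
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = f(x + h) - f(x) - h * fp   # here equals h^2 exactly
    ratios.append(remainder / h)
    print(h, remainder / h)                # ratio ~ h, tending to 0
```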
Best Answer
Let me try my best to answer your question. First, I am going to lay down some math FACTs / THMs / DEFs before I start the proof, with a Wikipedia link attached to each. After this, I will complete the proof.
FACTs / THMs / DEFs:
FACT 1: The real numbers are a complete metric space under the usual absolute value. Complete metric spaces are defined here: https://en.wikipedia.org/wiki/Complete_metric_space.
THM 1: The Banach Fixed Point Theorem. The theorem can be found here: https://en.wikipedia.org/wiki/Banach_fixed-point_theorem
DEF 1: A Contraction Mapping. The definition can be found here: https://en.wikipedia.org/wiki/Contraction_mapping
PROOF:
Step 1:
Let $$T:X\rightarrow X$$ $$x_{n+1}=T(x_n)=x_n-\frac{f(x_n)}{f'(x_n)}.$$
Our primary goal is to find conditions on $f$ such that the Banach fixed point theorem (THM 1) applies. If it does, then $\forall x_0 \in X, \exists x^* \in X : \lim_{n\rightarrow\infty}x_n=x^*$; in other words, the NR-Method is guaranteed to converge.
Step 2:
First, THM 1 requires that $T$ be a contraction mapping (DEF 1). For this to be true we need to restrict the domain $X$ so that it contains only one fixed point; otherwise $T$ cannot be a contraction mapping, since a contraction has a unique fixed point (see the last line of the proof on THM 1's Wikipedia page). Second, we need $X$ to be a complete metric space. This holds by FACT 1 when $X\subset \mathbb{R}$ is closed and the metric is $|\cdot|$.
With $X$ satisfying the conditions discussed above, let's assume $T$ is a contraction mapping. Then $$\exists q \in [0,1) : \forall x_2, x_1 \in X, |T(x_2)-T(x_1)| \leq q|x_2-x_1|.$$
Step 3:
Dividing by $|x_2-x_1|$ and letting $x_2 \to x_1$, we find $$|T'(x)|=\lim_{x_2\rightarrow x_1}\left|\frac{T(x_2)-T(x_1)}{x_2-x_1}\right| \leq q < 1, \forall x \in X.$$
Step 4:
Of course $T$ must be differentiable; since $f'$ cannot vanish on $X$, this amounts to saying $f$ is twice differentiable. Using standard differentiation one can show $$T'(x) = \frac{f(x)f''(x)}{f'(x)^2}, \forall x \in X.$$ Conversely, if $X$ is an interval with $\sup_{x\in X}|T'(x)| = q < 1$ and $T$ maps $X$ into itself, then the mean value theorem shows $T$ is a contraction with constant $q$. In other words: $|f(x)f''(x)| \leq q\, f'(x)^2$ on $X$ for some $q < 1$ $\implies$ $T$ is a contraction mapping on $X$ $\implies$ $\forall x_0 \in X, \exists x^* \in X : \lim_{n\rightarrow\infty}x_n=x^*$ $\implies$ the NR-Method converges.
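As a sanity check of this condition, here is a small Python sketch (the function $f(x)=x^2-2$, the interval $X=[1,2]$, and the grid are my own choices): it estimates $\sup_X |T'| = \sup_X |f f''|/f'^2$ on a grid and then runs the iteration from an endpoint of $X$.

```python
import math

f   = lambda x: x**2 - 2        # root: sqrt(2), which lies in X = [1, 2]
fp  = lambda x: 2 * x
fpp = lambda x: 2.0

# Estimate sup |T'(x)| = sup |f(x) f''(x)| / f'(x)^2 on a grid over X.
X = [1 + k * 0.01 for k in range(101)]
q = max(abs(f(t) * fpp(t)) / fp(t)**2 for t in X)
print("sup |T'| on X =", q)     # here 0.5 < 1, so T contracts on X

# Run the Newton iteration x <- x - f(x)/f'(x) starting at x0 = 1.
x = 1.0
for _ in range(8):
    x = x - f(x) / fp(x)
print(x, math.sqrt(2))
```

Note that $T$ also maps $[1,2]$ into itself here ($T(x) = x/2 + 1/x$ takes values in $[\sqrt 2, 3/2]$), so both hypotheses of THM 1 hold and the iterates converge to $\sqrt 2$.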