Barbara MacCluer has just finished a beginner's text called Elementary Functional Analysis that looks outstanding. It's reasonably short, has minimal prerequisites, and covers all the basics. One warning: it assumes fairly thorough knowledge of measure theory, so you're going to need to learn that first. My favorite source for that material is Angus Taylor's General Theory of Functions and Integration, available for a song from Dover. These two books together should fill your needs very nicely. And cheaply, too!
Here's a review of the MacCluer book: http://page.mi.fu-berlin.de/werner99/preprints/maccluer.pdf
This is an answer based on my comments above. There is indeed an integral version of Newton's method for algebraic equations. Say you have an equation:
$$
f(x) = 0,
$$
then we can set up an initial value problem:
$$
\begin{cases}
x'(t) = \frac{\alpha}{1+t^{\beta}}f(x),
\\[3pt]
x(0) = x_0.
\end{cases}
$$
As you can see, the equilibrium of the above ODE occurs exactly where $f(x) = 0$; i.e.,
$$
\lim_{t\to \infty} x(t) = r,
$$
where $r$ is one real root of $f(x)$.
In that ODE, $\alpha$ can be positive or negative depending on the sign of $f(x_0)$, and we would like the solution to decay to the equilibrium quickly, e.g., by choosing $\beta = 1$.
This method has its advantages. For one, it can escape the cycling that traps the classical iteration. Let's use that infamous $x^3 - 2x + 2 = 0$ as an example: if your initial guess is $1$ or $0$, then you will end up oscillating forever between $1$ and $0$ (see the Wikipedia entry for Newton's method).
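A quick sanity check of that cycling behavior, sketched in plain Python (the function names are my own, not from the original answer):

```python
# Classical Newton iteration on f(x) = x^3 - 2x + 2,
# started at x0 = 0, illustrating the 2-cycle between 0 and 1.

def f(x):
    return x**3 - 2*x + 2

def fprime(x):
    return 3*x**2 - 2

x = 0.0
iterates = [x]
for _ in range(6):
    x = x - f(x) / fprime(x)   # Newton step
    iterates.append(x)

print(iterates)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0] -- never settles
```

From $x = 0$: $f(0) = 2$, $f'(0) = -2$, so the next iterate is $1$; from $x = 1$: $f(1) = 1$, $f'(1) = 1$, so we land back at $0$.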
Using the ODE approach, set up the following initial value problem with initial guess $0$, $\alpha = -1$, and $\beta = -1$:
$$
\begin{cases}
x'(t) = -\frac{t}{1+t }(x^3 -2x +2),
\\[3pt]
x(0) = 0.
\end{cases}
$$
Choosing time step $h = 0.05$, we can see that the solution $x(t)$ converges quickly to the equilibrium $x_e \approx -1.7692923542386314152$, which is the real root of $x^3 - 2x + 2 = 0$.
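To make the claim concrete, here is a minimal forward Euler sketch of this exact initial value problem (the scheme and final time are my choices; the original answer does not specify the integrator):

```python
# Forward Euler on x'(t) = -t/(1+t) * f(x), x(0) = 0, with step h = 0.05.
# Any fixed point of the update must satisfy f(x) = 0, i.e. be a root.

def f(x):
    return x**3 - 2*x + 2

h = 0.05
x, t = 0.0, 0.0
for _ in range(2000):               # integrate up to t = 100
    x += h * (-t / (1.0 + t)) * f(x)
    t += h

print(x)  # close to the root -1.7692923542386314...
```

Near the root $r$, the coefficient $-t/(1+t) \to -1$ and $f'(r) \approx 7.39$, so the Euler map contracts by roughly $|1 - h f'(r)| \approx 0.63$ per step, which explains the fast convergence.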
The philosophy behind this: whether we iterate via integration or differentiation, we just need a contraction on the space where the solution lies, and this contraction must be "good" enough that a few iterations yield a good approximation.
Here's a classic motivating example. Let $H$ be the Heaviside unit step function $$ H(x) = \begin{cases} 1 & x > 0 \\ 0 & x \leq 0 \end{cases}, $$ and suppose we want to solve the differential equation $$u' = H.$$ We attack each part of $H$ separately: \begin{align*} x > 0 &\implies u(x) = x + c_0 &\mbox{ for some } c_0 \in \mathbb R, \\ x \leq 0 &\implies u(x) = c_1 &\mbox{ for some } c_1 \in \mathbb R. \end{align*} Now our differential equation has $u'$ in it; this means that $u$ is differentiable, and hence $u$ is continuous. Continuity of $u$ at $0$ enforces the constraint $c_0 = c_1$. Hence, letting $c := c_0 = c_1$, we get $$ u(x) = \begin{cases} x + c & x > 0 \\ c & x \leq 0 \end{cases}. $$
We've run into a problem: we cannot differentiate $u$ at $0$, since the difference quotients give $1$ from above and $0$ from below. Hence there is no differentiable function $u$ satisfying $u' = H$. Uh oh.
Let's see the magic that happens if we turn our differential equation into an integral equation by first multiplying by a "nice" test function $\varphi$ and integrating. By nice, we require that we can take derivatives of $\varphi$, and that $\varphi$ vanishes at infinity. Our integral equation is $$ \int_{\mathbb R} \varphi u' = \int_{\mathbb R} \varphi H. $$ Let's see if $u$ works in this equation. Starting with the LHS, and integrating by parts \begin{align*} \varphi u \big|_{-\infty}^\infty - \int_{\mathbb R} u \varphi' &= - \int_{\mathbb R} u \varphi ' \\ &= - \int_{-\infty}^0 c \varphi' - \int_0^\infty (x + c) \varphi' \\ &=- \int_0^\infty x \varphi' \\ &=- x \varphi \big|_{0}^\infty + \int_0^\infty \varphi \\ &= \int_0^\infty \varphi \\ &= \int_{\mathbb R} \varphi H, \end{align*} where we've integrated by parts again and used our "nice" properties of $\varphi$.
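We can also check the final identity numerically with a concrete test function, say $\varphi(x) = e^{-x^2}$ (my choice, with $c = 0$ so that $u(x) = \max(x, 0)$). Both sides should equal $\int_0^\infty e^{-x^2}\,dx = \sqrt{\pi}/2$:

```python
import math

# Numerical check of  -∫ u φ'  =  ∫ φ H  for φ(x) = exp(-x^2), u(x) = max(x, 0).
# Both sides should be close to sqrt(pi)/2 ≈ 0.8862.

def phi(x):       return math.exp(-x * x)
def phi_prime(x): return -2.0 * x * math.exp(-x * x)
def u(x):         return max(x, 0.0)
def H(x):         return 1.0 if x > 0 else 0.0

def integrate(g, a=-10.0, b=10.0, n=200_000):
    # plain trapezoidal rule; exp(-x^2) is negligible outside [-10, 10]
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        s += g(a + i * h)
    return s * h

lhs = -integrate(lambda x: u(x) * phi_prime(x))   # weak-derivative side
rhs = integrate(lambda x: phi(x) * H(x))          # right-hand side

print(lhs, rhs, math.sqrt(math.pi) / 2)
```

The jump of $H$ at $0$ costs the trapezoidal rule a little accuracy there, but both integrals agree to several digits, matching the integration-by-parts computation above.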
The upshot: $u$ did not work for the differential equation, but it did work for the integral equation. This is the point of using integral equations instead: they admit a wider class of solutions, because candidates aren't required to satisfy such stringent regularity conditions (like differentiability and continuity). We "shift" those requirements onto the "nice" test function $\varphi$.