This question would be better suited for MO, IMHO.
Here is a quick and dirty way to do it at the engineering level of rigor. There are two questions here: one is how to compute the integral with decent precision and another one is how to justify that we are not talking nonsense from the very beginning (It is obvious that $f\in L^2$ but there seems to be no easy way to show that it is in $L^1$).
Let's write the equation as $f+Tf=\varphi$ where $(Tf)(u)=\int_0^\infty K(u-v)f(v)\,dv$, $K(t)=\frac 2\pi\frac 1{4+t^2}$ (so $\int_\mathbb R K=1$), and $\varphi(t)=\frac 2\pi \frac 1{1+t^2}$.
The first observation is that $T$ is self-adjoint ($K$ is even) positive definite $\widehat K>0$ operator of norm $\le 1$ ($\|K\|_{L^1}=1$) in $L^2=L^2(0,+\infty)$. Thus its spectrum lies on $[0,1]$, so $I+T$ is, indeed, invertible, and $\|(I+T)^{-1}\|_{L^2\to L^2}\le 1$.
We shall now solve the auxiliary equation $g+Tg=1$. That seems to make little sense from the $L^2$ point of view, but if we write $g=\frac 12+h$, then it is equivalent to $h+Th=\frac 12(1-T1)=\psi$ and $\psi$ is now in $L^2$ (it is $\psi(u)=\frac 12\int_u^{\infty}K(v)\,dv$, so it decays like $1/u$ at infinity.
Thus, we can write
$$
\langle f,1\rangle=\langle f, g+Tg\rangle=
\langle f+Tf,g \rangle=\langle \varphi, g\rangle
\\
=
\frac 12\langle\varphi,1\rangle+\langle \varphi,h\rangle
=\frac 12+\langle \varphi,h\rangle
$$
and since $\varphi\in L^2$, it suffices to know $h$ with high precision in $L^2$ to find $\langle f, 1\rangle$ with high precision regardless of how large $\|f\|_{L^1}$ is as long as it is finite.
Now we are just solving $(I+T)h=\psi$. We want to approximate $(I+T)^{-1}$ by a polynomial $P(T)$ of low degree. The $L^2\to L^2$ norm of the difference will be just $E(P)=\max_{t\in[0,1]}\left|P(t)-\frac 1{1+t}\right|$. What you tried to do was to take $P(t)=1-t+t^2-\dots\pm t^n$ which makes as much sense as trying to compute $\pi$ directly from the Leibniz formula with the alternating harmonic series. You can play a bit yourself with finding the best polynomial approximation, but just using the interpolation polynomial of degree $4$ with Chebyshev nodes gives you the trivial guaranteed bound of $\frac 1{256}$. Thus, using that approximation, you'll be off by at most $\frac 1{256}\|\psi\|_{L^2}\|\varphi\|_{L^2}$.
Since $\|\varphi\|_{L^2}=\frac 2\pi\sqrt{\frac\pi 4}=\frac{1}{\sqrt{\pi}}$
and
$$
\psi(u)=\frac 1\pi\int_u^\infty\frac{dv}{4+v^2}\le\frac 1\pi\int_u^{\infty}\frac{2\,dv}{(2+v)^2}=\frac 2\pi\frac{1}{2+u}\,
$$
we get $\|\psi\|_{L^2}^2\le \frac 2{\pi^2}$, so
$\|\psi\|_{L^2}\|\varphi\|_{L^2}\le \sqrt{\frac 2{\pi^3}}$ yielding the garanteed upper bound for the error of $0.001$ plus all the errors you accumulate when computing $T$ and $T^2$ of various functions numerically (there is no real need to try to compute $T^4$ because $\langle T^4 \alpha, \beta\rangle=\langle T^2\alpha, T^2\beta\rangle$.
Now comes the second issue: the functions involved are all decaying slowly, so to find a decent numeric integration scheme is not so simple. I hope Farina's book has something useful to say about that, but if not, then we'll have to discuss this point too, though at the moment I do not have any much brighter idea than using Simpson on a reasonably fine lattice stretching far enough. I'll try some numerics to see what makes sense and what doesn't but it will take some time :-)
Hope this helps a bit.
Edit Here is the missing piece: the proof that the integral, indeed, exists. We shall show that $I+T$ is invertible in $L^2$ with the weight $\alpha(u)=\beta(u)^2$ where $\beta(u)=1+cu^p$ with any $\frac 12<p< 1$ and sufficiently small positive $c<c(p)$. To this end, we just need to show that the self-adjoint part $S$ of $T$ in that space satisfies $\delta I+S\ge 0$ with some $\delta<1$. It is equivalent to showing that the self-adjoint operator
$$
Sf(u)=\int_0^\infty \widetilde K(u,v)f(v)\,dv
$$
with
$$
\widetilde K(u,v)=\frac 12\left[\frac{\beta(u)}{\beta(v)}
\frac{\beta(v)}{\beta(u)}\right] K(u,v)
$$
is almost positive definite in the usual non-weighted $L^2$. We shall just compare it with $K$ and write the difference as the integral operator with the kernel
$$
\widetilde K(u,v)=\frac 12\left[\frac{\beta(u)}{\beta(v)}-
\frac{\beta(v)^2}{\beta(u)^2}\right]^2 K(u,v)
$$
All we want is to estimate the norm of this operator by a small constant when $c$ is small. Invoking the Schur test with the function $\psi(u)=\frac{1}{\beta(u)}$, we see that it suffices to show that
$$
\frac 12\int_0^{\infty} \frac{\beta(u)}{\beta(v)}\frac 12\left[\frac{\beta(u)}{\beta(v)}+
\frac{\beta(v)}{\beta(u)}-2\right] K(u,v)\,dv
$$
is small uniformly in $u>0$.
Note that $\frac 12(Z+Z^{-1}-2)\le\max(Z,Z^{-1})-1$, so we need just to estimate
$$
I_1=\int_u^{\infty} \frac{\beta(u)}{\beta(v)}\frac 12\left[
\frac{\beta(v)}{\beta(u)}-1\right] K(u,v)\,dv
$$
and
$$
I_2=\int_0^{u} \frac{\beta(u)}{\beta(v)}\frac 12\left[
\frac{\beta(u)}{\beta(v)}-1\right] K(u,v)\,dv\,.
$$
The estimate of $I_1$ is easy: just note that $\frac{\beta(x)}{\beta(x+y)}$ is an increasing function of $x$ for fixed $y>0$, so
$$
I_1\le\int_u^\infty\left(1-\frac{\beta(0)}{\beta(v-u)}\right)K(u,v)\,dv
=\frac 2\pi\int_0^\infty\left(1-\frac 1{1+ct^p}\right)\frac{1}{4+t^2}\,dt
$$
and the dominated convergence theorem implies that this bound tends to $0$ as $c\to 0$.
$I_2$ is trickier. We'll split it further into $I_2'+I_2''=\int_0^{u/2}+\int_{u/2}^u$.
For $I_2''$, we have $\frac{\beta(u)}{\beta(v)}\le 2^p$, so
$$
I_2'\le 4^p\int_{u/2}^u\left(1-\frac{\beta(v)}{\beta(u)}\right)K(u,v)\,dv\le
4^p\int_{u/2}^u\left(1-\frac{\beta(0)}{\beta(u-v)}\right)K(u,v)\,dv
$$
and we finish just as for $I_1$.
For $I_2'$, we have $K(u,v)\le \frac 1{4+\frac{u^2}4}$. Now we first notice that $\beta(u)/\beta(v)$ is increasing with $c$. However, for $c=1$, we have
$$
I_2'\le\frac{(1+u^p)^2}{{4+\frac{u^2}4}}\int_0^\infty \frac{dv}{(1+v^p)^2}
$$
and since the integral converges and $2p<2$, we conclude that $I_2'$ is small for sufficiently large $u>u_0$ where $u_0$ does not depend on $c$. But for $0<v<u<u_0$ the integrand tends to $0$ uniformly as $c\to 0$, so we are done.
The challenge now becomes to figure out if $f$ is, indeed, non-negative.
Best Answer
Quoting from Wikipedia: In mathematics, an elementary function is a function of a single variable (typically real or complex) that is defined as taking sums, products, and compositions of finitely many polynomial, rational, trigonometric, hyperbolic, and exponential functions, including possibly their inverse functions (e.g., $\arcsin$, $\log$, or $x^{1/n}$).
As you can see, the exponential function is on the list. Now, the modified Bessel function $I_{0}$ is a non-elementary function. You can express it in terms of elementary functions but such an expression will always involve an infinite process, e.g., infinite series, integration, etc. It may be hard to accept that adding an extra factorial to the sum changes things so dramatically, but that is the case here. You have to accept that what you are asking for is not possible, not because mathematicians are incompetent but because of the nature of mathematics.
It was shown by Hankel that $$ I_0 (2\sqrt x ) \sim\frac{{\mathrm{e}^{2\sqrt x } }}{{2\sqrt{\pi} x^{1/4} }}\left( {1 + \frac{1}{{16\sqrt x }} + \frac{9}{{512x}} + \cdots } \right), $$ as $x\to +\infty$. This is a divergent asymptotic expansion in the sense of Poincaré. For the full expansion, see $(10.40.1)$ in the NIST DLMF. The series is divergent for all $x>0$ but any truncated version of it yields a good approximation for $I_0$ for large positive values of $x$. This asymptotics also shows you how the function deviates from $\mathrm{e}^x$.
When $x<0$, your sum can be written in terms of the $J_0$ Bessel function as $J_0 (2\sqrt { - x} )$. The behaviour for large negative $x$ is given by another result of Hankel and it shows an oscillatory behaviour. At leading order $$ J_0 (2\sqrt { - x} ) \approx \frac{1}{{\sqrt \pi ( - x)^{1/4} }}\cos \left( {2\sqrt { - x} - \frac{\pi }{4}} \right), $$ when $x$ is large and negative.
The above two asymptotics shows that the behaviour of your series is rather different when $x>0$ or $x<0$ and may also convince you that it is not possible to capture two such different behaviours by a single approximation.