[Math] How to calculate the errors of single and double precision

approximation-theorycomputational mathematicsnumerical methodsordinary differential equations

We consider the initial value problem

$$\left\{\begin{matrix}
y'=y &, 0 \leq t \leq 1 \\
y(0)=1 &
\end{matrix}\right.$$

We apply the Euler method with $h=\frac{1}{N}$ and huge number of steps $N$ in order to calculate the approximation $y^N$ of the value of the solution $y$ at $t^N, \ y(t^N)=y(1)=e$.
At the following table there are, for all $N$, the errors $|\epsilon^N|=|e-y^N|$, when the calculations are done with single and double precision.

$$\begin{matrix}
N & |\epsilon^N|\text{ Single-precision } & |\epsilon^N| \text{ Double-precision } \\
– & – & – \\
100 & 0.13468 \cdot 10^{-1} & 0.13468 \cdot 10^{-1} \\
200 & 0.67661 \cdot 10^{-2} & 0.67647 \cdot 10^{-2}\\
400 & 0.33917 \cdot 10^{-2} & 0.33901 \cdot 10^{-2}\\
800 & 0.16971 \cdot 10^{-2} & 0.16970 \cdot 10^{-2}\\
1600 & 0.85568 \cdot 10^{-3} & 0.84898 \cdot 10^{-3} \\
\cdots & & \\
102400 & 0.65088 \cdot 10^{-4} & 0.13273 \cdot 10^{-4} \\
204800 & 0.21720 \cdot 10^{-3} & 0.66363 \cdot 10^{-5} \\
409600 & 0.78464 \cdot 10^{-3} & 0.33181 \cdot 10^{-5} \\
819200 & 0.20955 \cdot 10^{-2} & 0.16590 \cdot 10^{-5} \\
\dots
\end{matrix}$$

We notice that the errors of the calculations of double-precision get approximately half. However, in the case of single-precision, for $N>10^5$ the errors increase!
Indeed, for a big enough $N$, the errors in our case tend to $1.71828 \dots$.

Could you explain me why the errors, when the calculations are done in single-precision, increase for $N>10^5$ and why they get approximately half when the calculations are done in double-precision?

Also, how can we calculate the error for a given $N$?
For example, if we have $N=10^5$ then $\epsilon^N=|e-y^{10^5}|=\left |e- \left( 1+ \frac{1}{10^5} \right)^{10^5} \right |$.
How can we calculate the latter, knowing that the zero of the machine is $10^{-6}$ when we have single precision but $10^{-12}$ when we have double precision?

EDIT: It holds that: $$\ln{\left( 1+ \frac{1}{N}\right)}=\sum_{n=1}^{\infty} (-1)^{n+1} \frac{\left( \frac{1}{N}\right)^n}{n}=\frac{1}{N}- \frac{1}{2N^2}+O(N^{-3})$$
Right? If so, then $N \ln{\left( 1+ \frac{1}{N}\right)}=1-\frac{1}{2N}+O(N^{-2})$, right?
$$$$
If so, then how can we find the difference of the real solution with the approximation when we take into consideration that we have single precision and how when we take into consideration that we have double precision?

Best Answer

When $N$ gets large, the size of each step, and thus the size of the change in function value, gets small. You start out with $y(0)=1$, and then you get $$ y(h)=1+\epsilon $$ and if $\epsilon$ is small enough, the computer won't be able to handle the difference between $1$ and $1+\epsilon$ with very good precision. This is the source of the increasing error. Since double precision handles this problem better, you get less error.

As for why the error tends to $1.71828\ldots$, if $\epsilon$ is really small, the computer thinks that $y$ doesn't change at all from time step to time step, and therefore thinks that $y$ is a constant function. You're supposed to get $e$ as the final value, so the error is therefore $e-1$.

Related Question