[Math] Rounding unit vs Machine precision

floating-point, machine-precision, numerical-methods, rounding-error, rounding-unit

I'm not sure if this question should be asked here…

For a general floating-point system defined by the tuple $(\beta, t, L, U)$, where $\beta$ is the base, $t$ is the number of digits in the significand (mantissa), and $L$ and $U$ are respectively the lower and upper bounds for the exponent, the rounding unit is defined as $$r = \frac{1}{2}\beta^{1-t}$$

If I try to calculate the rounding unit for a single precision IEEE floating-point number which has 24 bits (23 explicit and 1 implicit), I obtain:

$$r = \frac{1}{2}2^{1-24} = \frac{1}{2}2^{-23} = 2^{-24}$$

which happens to be (using Matlab)

$$5.960464477539062 \times 10^{-8}$$

which seems to be half of

eps('single')

that is, Matlab's machine precision for single-precision floating-point numbers. From my understanding, the machine precision should be the distance from one floating-point number to the next (specifically, from $1$ to the next larger representable number).
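The comparison can be checked with a short Python sketch (my own construction, using the standard `struct` module to manipulate single-precision bit patterns, since Python has no built-in `float32`):

```python
import struct

t = 24                      # significand bits in IEEE single precision
r = 0.5 * 2.0 ** (1 - t)    # rounding unit: 2**-24 ~ 5.96e-08

# Machine epsilon: the gap from 1.0 to the next representable single.
# Nudge the bit pattern of 1.0 up by one unit in the last place.
bits = struct.unpack('<I', struct.pack('<f', 1.0))[0]
next_up = struct.unpack('<f', struct.pack('<I', bits + 1))[0]
eps_single = next_up - 1.0  # 2**-23, Matlab's eps('single')

print(eps_single == 2 * r)  # True
```

So the factor of two shows up directly: `eps('single')` is $2^{-23}$, while the rounding unit computed from the formula is $2^{-24}$.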

If I do the same thing for double precision, the rounding unit again turns out to be half of the machine precision, which is as follows:

eps = 2.220446049250313e-16
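The same relationship can be verified for double precision using only Python's standard library (a sketch; `sys.float_info.epsilon` is Python's counterpart of Matlab's `eps`):

```python
import sys

t = 53                        # significand bits in IEEE double precision
r = 0.5 * 2.0 ** (1 - t)      # rounding unit: 2**-53
eps = sys.float_info.epsilon  # 2**-52 = 2.220446049250313e-16

print(eps == 2 * r)  # True
```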

Why is that?

What's the relation between machine precision and rounding unit?

I think I understand what the rounding unit is: given a real number $x$, it guarantees that $fl(x)$ (the floating-point representation of $x$) is no further away from $x$ than this unit, in relative terms. Correct?

But then what's this machine precision or epsilon?
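One way to convince yourself of the rounding-unit interpretation is to round a batch of doubles to single precision and check that the relative error never exceeds $2^{-24}$ (a Python sketch; `struct`'s `'f'` format performs the round-to-nearest conversion to single precision):

```python
import random
import struct

def fl(x):
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

u = 2.0 ** -24  # rounding unit for single precision
random.seed(0)
worst = max(abs(fl(x) - x) / abs(x)
            for x in (random.uniform(0.5, 4.0) for _ in range(100000)))
print(worst <= u)  # True
```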

Edit

If you look at the table in this Wikipedia article, there are two columns named "machine epsilon", and the values in one column (the rounding unit) seem to be half of the corresponding values in the other column (the machine precision).

https://en.wikipedia.org/wiki/Machine_epsilon

Best Answer

To illustrate what happens, I constructed a toy example with $\beta=2$ and $t=3$ (a significand of the form $1.ab$, i.e. two bits after the binary point, so the rounding unit is $\tfrac12\beta^{1-t}=\tfrac12\times 2^{-2}$) and plotted the relative error (due to rounding) $$ \frac{|fl(x)-x|}{x}=\frac{|2^e\times 1.ab-x|}{x} $$ where $e=\lfloor \log_2 x\rfloor$ and $1.ab$ is the binary number obtained by rounding $x/2^e$ to two fractional binary digits.

Here is the graph of the relative error along with the horizontal line $y=\tfrac12\times 2^{-2}=0.125$:

[Figure: relative rounding error for the toy format, with the horizontal line $y=0.125$]

We can see that the relative rounding error is maximal just a little after $0.5$, $1$ and $2$. In fact the pattern repeats: each time the exponent $e$ shifts by one, the number $x/2^e$ runs through the exact same values in $[1,2)$ again.


The maximal relative error is attained exactly where $x=2^e\times (1+\tfrac12 \times 2^{-2})$, or simplified $x=2^e\times 1.125$. In base $2$ the number $1.125$ is $1.001_2$ and is rounded up to $1.01_2=1.25$. Therefore $$ \frac{|fl(1.125)-1.125|}{1.125}=\frac{|2^0\times 1.25-1.125|}{1.125}=\frac{0.125}{1.125} $$ which is just slightly less than $0.125$.


So why is the rounding unit (an upper bound on the relative error) only half the size of the machine precision? You can actually see it in the graph between $1$ and $2$: the distance between the valleys (zero error / full precision) is $2^{-2}=0.25$, which is roughly twice the height of the maximal error peak.
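The toy example can be reproduced numerically (a Python sketch of the construction above; `fl_toy` is my own helper, and I round half up at midpoints, matching the direction used for $1.125$ above):

```python
import math

def fl_toy(x):
    """Round positive x to a toy format: beta = 2, significand 1.ab."""
    e = math.floor(math.log2(x))
    m = x / 2.0 ** e                                 # significand in [1, 2)
    return 2.0 ** e * (math.floor(m * 4 + 0.5) / 4)  # two fractional bits

r = 0.5 * 2.0 ** -2                                  # rounding unit 0.125
xs = [0.5 + i / 100000 for i in range(1, 350001)]    # scan (0.5, 4]
worst = max(abs(fl_toy(x) - x) / x for x in xs)
print(worst)       # ~0.1111 = 0.125/1.125, attained at x = 2^e * 1.125
print(worst < r)   # True
```

The scanned maximum lands exactly at the midpoints $2^e\times 1.125$, confirming that the worst relative error stays just below the rounding unit $0.125$.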
