[Math] Absolute and relative error of a binary number

binary, number-systems, numerical-methods

Given is the following system $A:=\left\{ \pm 1.a_1 a_2 a_3 a_4 \cdot 2^e \right\}$, $a_{i}\in \left \{ 0,1 \right \}$, $i\in \left \{ 1,2,3,4 \right \}$, $e\in \left \{ -8,\ldots,8 \right \}$. Find the absolute and the relative error of $x\in \left \{ \frac{7}{2}, \frac{7}{2}\cdot 2^9 \right \}$.

Good, the formula for the absolute error is $\left | x-x_{0} \right |$, where $x$ is the correct value of the number and $x_{0}$ is the stored value. Of course, the relative error can be found with the formula $\frac{\left | x-x_{0} \right |}{\left | x \right |}$. My question is how to find $x_{0}$ in the number system above. For example, $\frac{7}{2}$ I can write as $\frac{7}{2}\cdot 2^{0}$, but then $\frac{7}{2}\cdot 2^0 = 7\cdot2^{-1}$; $7$ in binary is $111$, so I can write it as $1.11\cdot2^1$, and indeed $1.11\cdot2^1=1\cdot 2^{1}\cdot 2^{0}+1\cdot 2^{1}\cdot 2^{-1}+1\cdot 2^{1}\cdot 2^{-2}=\frac{7}{2}$. But that doesn't make any sense to me, because then the absolute error would be $0$…
Can anybody help me with this question, please? Thank you in advance!

Best Answer

Binary isn't a code; it's a number system, like decimal. You can look up hexadecimal as another common example (probably more common in practice than binary).

Ignoring the exponent part for a second: in your example above there is a limit of four digits after the binary point (it isn't called a decimal point in binary). In that case the number can only hold precision down to sixteenths. So the number $1/32$, represented in binary as $0.00001$, would have to be rounded up and stored as $1/16$, or $0.0001$. In that case the absolute error is $\left|1/32 - 1/16\right| = 1/32$, and the relative error is $(1/32)/(1/32) = 1$. With $7/2$, which is $11.1$ in binary, the stored value would come out to $11.1000$, so in that case there is no error.
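The rounding step above can be sketched in Python using exact fractions (a minimal illustration; `round_to_fixed` is a made-up helper name, and rounding halfway cases upward is an assumption chosen to match the text):

```python
from fractions import Fraction
import math

def round_to_fixed(x, frac_bits=4):
    """Round x to the nearest multiple of 2**-frac_bits (halfway cases round up)."""
    step = Fraction(1, 2 ** frac_bits)
    n = math.floor(x / step + Fraction(1, 2))  # nearest whole number of steps
    return n * step

x = Fraction(1, 32)
x0 = round_to_fixed(x)        # 0.00001 rounds up to 0.0001, i.e. 1/16
abs_err = abs(x - x0)         # 1/32
rel_err = abs_err / abs(x)    # 1

y = Fraction(7, 2)
y0 = round_to_fixed(y)        # 11.1000 fits in four fraction bits, so no error
```

Using `Fraction` keeps every intermediate value exact, so the computed errors are the true ones rather than floating-point approximations.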

Now considering the whole thing: the exponent on $2$ can go between $-8$ and $+8$ (it is somewhat confusing since all of the numbers above are written in decimal), which is $-1000$ and $+1000$ in binary. In that case $1/32$ can still be stored exactly, as $1.0000\cdot2^{-5}$ with the exponent in decimal, or $1.0000\cdot10^{-101}$ written entirely in binary, at which point it becomes obvious that this is the binary version of scientific notation. The error works the same as in decimal scientific notation: the digits that do not fit among the significant digits are the error. For the second value in the question, $\frac{7}{2}\cdot 2^9 = 1.11\cdot 2^{10}$, the required exponent $10$ lies outside the allowed range $\{-8,\ldots,8\}$, so the number overflows; the largest representable value is $1.1111\cdot 2^{8} = 496$.

On a side note, I don't understand where those $2^1$s are coming in; I would have written $1.11$ as $1\cdot2^0 + 1\cdot2^{-1} + 1\cdot2^{-2}$, similar to how place values work in decimal, and then multiplied the whole thing by the exponent factor $2^1$.
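That place-value expansion can be checked directly (a small sketch; exact fractions avoid any floating-point noise):

```python
from fractions import Fraction

# digits of 1.11 in binary, with place values 2^0, 2^-1, 2^-2
digits = [1, 1, 1]
value = sum(d * Fraction(2) ** -i for i, d in enumerate(digits))
print(value)        # 7/4, the mantissa 1.11
print(value * 2)    # applying the exponent factor 2^1 gives 7/2
```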