[Math] Adding two IEEE754 floating-point representations and interpreting the result.

binary, floating-point

This isn't for any class or homework. As part of my personal study, I'm trying to better understand the IEEE754 representation of decimal floating-point numbers in binary. I'd like to add two numbers: $1.111$ and $2.222$, then compare the result by converting the IEEE754 representation of the sum back to decimal.

Per this online tool:

$1.111 = 00111111100011100011010100111111$
$2.222 = 01000000000011100011010100111111$
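The online tool is not named, so here is a minimal stand-in sketch that uses Python's standard `struct` module. It packs a value as an IEEE 754 binary32 float, rounding to the nearest single-precision number, and prints the resulting bit pattern. The helper name `float32_bits` is hypothetical.

```python
import struct

def float32_bits(x: float) -> str:
    """Return the IEEE 754 single-precision bit pattern of x as a 32-char string.

    Hypothetical helper: '>f' packs x as a big-endian binary32 (rounding it to
    the nearest float32), and '>I' reads the same 4 bytes back as an unsigned int.
    """
    (n,) = struct.unpack(">I", struct.pack(">f", x))
    return format(n, "032b")

print(float32_bits(1.111))  # 00111111100011100011010100111111
print(float32_bits(2.222))  # 01000000000011100011010100111111
```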

Summing these two together using signed binary addition, I get:

$0111 1111 1001 1100 0110 1010 0111 1110$

In hexadecimal, this is:

$7F9C6A7E$

And according to this other version of the tool, that corresponds to $NaN$.
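The integer addition described above can be reproduced with a short sketch. It assumes Python's `struct` module as the float32 encoder and decoder, and the helper names `bits_of` and `float_of` are hypothetical. The sketch adds the two bit patterns as plain unsigned integers and then inspects the exponent and fraction fields of the result.

```python
import math
import struct

def bits_of(x: float) -> int:
    """Hypothetical helper: bit pattern of x rounded to binary32, as an unsigned int."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def float_of(n: int) -> float:
    """Hypothetical helper: reinterpret a 32-bit unsigned int as a binary32 value."""
    return struct.unpack(">f", struct.pack(">I", n))[0]

a, b = bits_of(1.111), bits_of(2.222)
s = (a + b) & 0xFFFFFFFF      # plain integer addition of the two patterns
print(f"{s:08X}")             # 7F9C6A7E

exponent = (s >> 23) & 0xFF   # biased exponent field: 255 means all ones
fraction = s & 0x7FFFFF       # nonzero fraction with an all-ones exponent means NaN
print(exponent, fraction != 0, math.isnan(float_of(s)))  # 255 True True
```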

What's going on here?

Best Answer

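You cannot expect to use integer binary addition on two floating-point representations and get a meaningful result. Integer addition just adds the sign, exponent and mantissa fields as if they were one 32-bit number. In your case the two fraction fields add without a carry, and the biased exponent fields $127$ and $128$ add up to $255$ (all ones). That exponent is reserved for infinities and NaNs, and because the resulting fraction field is nonzero, the pattern is a NaN.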

First, $1.111$ cannot be represented exactly in binary floating point. Your 00111111100011100011010100111111 is actually the IEEE-754 single precision representation of the number $$ 1.11099994182586669921875 $$ which is the closest representable number to $1.111$. This breaks up as

  0      01111111        00011100011010100111111
sign  biased exponent  fractional part of mantissa

and stands for the number $$ 1.00011100011010100111111_2 \times 2^{127-127} $$
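The field split and the exact value can be checked with a small sketch. It uses only Python's standard `fractions` and `decimal` modules, and `decode_binary32` is a hypothetical helper. It handles normal numbers only and returns the exact rational value $1.f_2 \times 2^{e-127}$ encoded by a pattern.

```python
from decimal import Decimal
from fractions import Fraction

def decode_binary32(bits: str) -> Fraction:
    """Hypothetical helper: split a 32-bit pattern into sign, biased exponent and
    fraction, and return the exact value it encodes (normal numbers only)."""
    n = int(bits, 2)
    sign = n >> 31
    biased_exp = (n >> 23) & 0xFF
    fraction = n & 0x7FFFFF
    if not 0 < biased_exp < 255:
        raise ValueError("this sketch only decodes normal numbers")
    significand = Fraction((1 << 23) | fraction, 1 << 23)  # 1.fraction in binary
    value = significand * Fraction(2) ** (biased_exp - 127)
    return -value if sign else value

v = decode_binary32("00111111100011100011010100111111")
# Converting to float is exact here, and Decimal(float) prints every digit.
print(Decimal(float(v)))  # 1.11099994182586669921875
```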

Since $2.222$ is exactly twice $1.111$, its representation has the same mantissa with the exponent one higher. To add them, we must first align the mantissas so that their binary points line up:

   1.00011100011010100111111
+ 10.0011100011010100111111
----------------------------
= 11.01010101001111110111101
  11.0101010100111111011110   <-- rounded to 1+23 bits mantissa using round-to-even

 0    10000000   10101010100111111011110
sign biased exp    fractional mantissa
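The alignment and rounding above can be sketched on integer significands. This is an illustrative model, not how any particular FPU is implemented. `add_aligned` is a hypothetical helper that treats each value as a 24-bit significand `sig` (implicit 1 included) times $2^{\text{exp}-23}$. It shifts the operand with the larger exponent, drops the excess low bits, and applies round-half-to-even.

```python
def add_aligned(sig_a: int, exp_a: int, sig_b: int, exp_b: int):
    """Hypothetical helper: add two positive normal binary32 values, each given
    as a 24-bit significand (implicit 1 included) and unbiased exponent with
    value = sig * 2**(exp - 23). Returns (24-bit significand, exponent) after
    round-half-to-even. Assumes exp_a <= exp_b; no overflow or subnormals."""
    total = sig_a + (sig_b << (exp_b - exp_a))  # line up the binary points
    # Two positive normal inputs always give at least 25 bits here,
    # so at least one low bit has to be dropped to get back to 24.
    extra = total.bit_length() - 24
    kept = total >> extra
    dropped = total & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1                 # round up; exact ties go to the even neighbor
    if kept.bit_length() > 24:    # rounding carried into a new top bit
        kept >>= 1
        extra += 1
    return kept, exp_a + extra

# 1.111 -> 1.00011100011010100111111 x 2^0; 2.222 -> same significand x 2^1
sig = int("1" + "00011100011010100111111", 2)
s, e = add_aligned(sig, 0, sig, 1)
print(format(s, "024b"), e)                # 110101010100111111011110 1
bits = ((e + 127) << 23) | (s & 0x7FFFFF)  # sign 0, biased exponent, fraction
print(format(bits, "032b"))                # 01000000010101010100111111011110
```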

And the representation 01000000010101010100111111011110 corresponds to the number $$ 3.332999706268310546875 $$ Note that this is not the closest representable number to $3.333$, which would be the next one, $$ 3.3329999446868896484375 $$ The exact sum of the two rounded inputs, $11.01010101001111110111101_2$, lies exactly halfway between these two neighbors, so the round-to-even rule rounded it down. That compounded the error inherent in the two inputs each being slightly smaller than $1.111$ and $2.222$.
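As a cross-check, here is a sketch that lets the platform do the rounding. It assumes that packing with Python's `struct` format `'f'` rounds a binary64 value to the nearest binary32 with ties to even, which holds on common IEEE 754 platforms. The sum of the two rounded inputs is exact in binary64 because it needs only 25 significant bits. So rounding it once to binary32 reproduces the result above and shows that the exact sum was a tie. `to_f32` is a hypothetical helper.

```python
import struct
from decimal import Decimal

def to_f32(x: float) -> float:
    """Hypothetical helper: round a Python float (binary64) to the nearest binary32."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

a, b = to_f32(1.111), to_f32(2.222)
exact = a + b            # exact in binary64: only 25 significant bits are needed
s = to_f32(exact)        # one rounding to binary32
ulp = 2.0 ** -22         # spacing of binary32 values in [2, 4)

print(Decimal(s))                 # 3.332999706268310546875
print(Decimal(s + ulp))           # 3.3329999446868896484375
print((exact - s) / ulp)          # 0.5, so the exact sum sat exactly halfway
print(to_f32(3.333) == s + ulp)   # True: the upper neighbor is closest to 3.333
```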
