[Math] Convert the following decimal number into 32-bit IEEE floating-point form.

Tags: computer science, discrete mathematics, elementary-number-theory, number-systems

I am given a negative decimal -1234.875.

I understand the normal process of solving a question like this, except I am uncertain about handling the negative.

What I do is find the binary form of 1234 + 0.875

1234/2 = 617 R: 0
617/2 = 308 R: 1
etc.
gives me
1234(base 10) = 1001 1010 010(base 2)

0.875 * 2 = 1.75
0.75 * 2 = 1.5
0.5 * 2 = 1.0
gives me
0.875(base 10) = 0.111(base 2)

Thus, 1234.875(base 10) = 10011010010.111(base 2)
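The two hand procedures above (repeated division by 2 for the integer part, repeated multiplication by 2 for the fractional part) can be sketched in C++ roughly as follows. The function names `int_to_binary` and `frac_to_binary` and the `max_bits` cap are made up for this illustration; they are not from any library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Integer part: divide by 2 repeatedly; the remainders, read from last
// to first, are the binary digits.
std::string int_to_binary(std::uint32_t n) {
    if (n == 0) return "0";
    std::string bits;
    while (n > 0) {
        bits.insert(bits.begin(), static_cast<char>('0' + n % 2));
        n /= 2;
    }
    return bits;
}

// Fractional part: multiply by 2 repeatedly; the integer carries, read
// from first to last, are the digits after the binary point.
// max_bits (an assumption of this sketch) caps the loop, because most
// decimal fractions never terminate in binary.
std::string frac_to_binary(double f, int max_bits = 23) {
    assert(f >= 0.0 && f < 1.0);
    std::string bits;
    while (f > 0.0 && static_cast<int>(bits.size()) < max_bits) {
        f *= 2.0;
        if (f >= 1.0) {
            bits += '1';
            f -= 1.0;
        } else {
            bits += '0';
        }
    }
    return bits;
}
```

Calling `int_to_binary(1234)` and `frac_to_binary(0.875)` reproduces the two digit strings worked out by hand, and joining them with a binary point gives the unnormalized form.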

I then normalize it, find sign bit, biased exp, significand, but where do I account for that negative?

Best Answer

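Accounting for negatives is done via the sign bit, which is the most significant bit of the 32-bit word: 0 for positive numbers, 1 for negative ones.

[Figure from Wikipedia: single-precision layout, 1 sign bit, 8 exponent bits, 23 fraction bits]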

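Since $-1234.875_{10} = -10011010010.111_{2} = (-1)^{1}\cdot 1.0011010010111_{2} \cdot 2^{10}$, the sign bit is $1$ and the unbiased exponent is $10$. The stored exponent carries a bias of $127$, so we write $2^{10} = 2^{137 - 127}$ and store $137_{10} = 10001001_{2}$. The fraction field holds the bits after the leading $1$, padded with zeros to 23 bits, so we have the bit pattern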

\begin{align} 1\, \big|\, 10001001 \, \big| \, 00110100101110000000000 \end{align}
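As a rough sketch of how the three fields combine into one 32-bit word, the helper below shifts each field into place and ORs them together. The names `pack_float_bits` and `float_bits` are invented for this illustration, and the code assumes `float` is a 32-bit IEEE 754 type (true on mainstream platforms, but not guaranteed by the C++ standard).

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Pack sign (1 bit), unbiased exponent, and 23-bit fraction into the
// single-precision layout: sign | biased exponent | fraction.
std::uint32_t pack_float_bits(std::uint32_t sign, int exponent,
                              std::uint32_t fraction) {
    // Only normalized numbers are handled in this sketch.
    assert(sign <= 1 && exponent >= -126 && exponent <= 127);
    assert(fraction < (1u << 23));
    std::uint32_t biased = static_cast<std::uint32_t>(exponent + 127);
    return (sign << 31) | (biased << 23) | fraction;
}

// Copy the bytes of a float into an integer so the stored pattern can
// be compared against the hand-built one.
std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}
```

With sign 1, exponent 10, and fraction `0x1A5C00` (the 23 fraction bits written in hexadecimal), `pack_float_bits` should return the same word as `float_bits(-1234.875f)`.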

There are easier ways of verifying that this is correct, but the following code will do it:

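#include <cstdint>
#include <cstring>
#include <iostream>

int main()
{
  float x = -1234.875f;
  std::uint32_t i;

  // Copy the float's bytes into an integer. Reading the inactive member
  // of a union is undefined behavior in C++, so memcpy is used instead.
  static_assert(sizeof x == sizeof i, "float must be 32 bits");
  std::memcpy(&i, &x, sizeof i);

  std::cout << std::hex << i << std::endl;  // prints c49a5c00
}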

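Of course you'll have to convert the hexadecimal back to binary: the program prints c49a5c00, and expanding each hex digit to four bits gives 1100 0100 1001 1010 0101 1100 0000 0000, which matches the bit pattern above.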
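If you would rather skip the hex-to-binary step, one option is to format the bits directly with `std::bitset`. This is only a sketch: `float_to_fields` is a hypothetical helper name, and it again assumes a 32-bit IEEE 754 `float`.

```cpp
#include <bitset>
#include <cstdint>
#include <cstring>
#include <string>

// Return the stored bits of x as "sign | exponent | fraction".
std::string float_to_fields(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);

    // to_string() yields 32 characters, most significant bit first.
    std::string s = std::bitset<32>(u).to_string();
    return s.substr(0, 1) + " | " + s.substr(1, 8) + " | " + s.substr(9, 23);
}
```

Printing `float_to_fields(-1234.875f)` shows the three fields already separated, so the result can be compared field by field with the hand computation.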
