How to Work with IEEE 754 32-bit Floating Point Format

arithmetic, binary, floating point

I'm having trouble completing a question that deals with the IEEE 754 32-bit floating point format, primarily because I don't know how to use it. I was hoping someone here could clarify for me using the following example (or link to relevant sources that may reveal how I can complete the problem).

1. Convert 4.625 to floating point representation.
2. Convert 1100 0001 0001 1100 0000 0000 0000 0000 to decimal.
3. Add the results of 1 and 2 together, and represent the result as a float.

Thank you!

Best Answer

IEEE 754 single precision is a standard format for representing floating-point numbers in base 2 using 32 bits. Every representable floating-point number has the form $$ \underbrace{\fbox{$c_1$}}_{\pm} \ \underbrace{\fbox{$c_2 c_3 c_4 c_5 c_6 c_7 c_8 c_9$}}_{E} \ \underbrace{\fbox{$c_{10} c_{11} c_{12} \cdots c_{31} c_{32}$}}_{m-1} $$ where each $c_i$ is either 0 or 1. The first bit is the sign bit, the next 8 bits are the exponent part, and the remaining 23 bits are the mantissa (in fact, the mantissa minus 1, as explained below). The bit pattern above is interpreted as $$ \underbrace{(-1)^{c_1}}_{\text{sign}} \cdot \underbrace{\vphantom{(}2^{(c_2 c_3 \cdots c_9)_2}}_{\text{exponent}} \cdot \underbrace{\vphantom{(}2^{-127}}_{\text{excess}} \cdot \underbrace{(1.c_{10} c_{11} c_{12} \cdots c_{31} c_{32})_2}_{\text{mantissa}}. $$ Here are a few key points about this representation:

- The exponent part doesn't quite give you the exponent; it stores the desired exponent + 127. This way the stored values run in increasing order from 0 to 255, with no need for a sign bit. Because the stored values 0 and 255 are reserved (see the last point), normalized numbers use stored values 1 to 254, so once you subtract 127 the actual exponent ranges from -126 to 127.
- Typically, you normalize your representation. This means that you always express your numbers with a mantissa of the form $1.c_{10}c_{11}\ldots c_{32}$. This is similar to writing numbers in "scientific format": you normalize them so they look like $1.234 \cdot 10^5$, not $0.001234 \cdot 10^8$. In base 2, this means the mantissa always starts with a 1, so you gain one bit by not storing this 1 and just remembering that it should be added back in. (Technically, not all numbers are normalized, and there is such a thing as denormalized numbers, but that's not important right now.)
- Not all exponent parts represent ordinary numbers. An exponent part of all zeros is used for zero and the denormalized numbers, and an exponent part of all ones is used for $\pm \infty$ and NaN.
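The interpretation formula and the points above can be sketched in Python. This is only a minimal illustration, not part of the original answer: the function name `decode_float32` is made up for the example, and it deliberately rejects the reserved exponent parts instead of handling them.

```python
def decode_float32(bits: str) -> float:
    """Decode a normalized IEEE 754 single-precision bit string by hand."""
    bits = bits.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly 32 binary digits")

    sign = -1 if bits[0] == "1" else 1      # c_1
    stored_exponent = int(bits[1:9], 2)     # (c_2 ... c_9)_2, still including the +127
    fraction = int(bits[9:], 2) / 2**23     # 0.c_10 ... c_32 in base 2

    if stored_exponent in (0, 255):
        # Assumption of this sketch: special values are out of scope.
        raise ValueError("reserved exponent: zero, denormalized, infinity or NaN")

    # Subtract the excess of 127 and add back the implicit leading 1.
    return sign * 2.0 ** (stored_exponent - 127) * (1 + fraction)


print(decode_float32("0 01111111 " + "0" * 23))  # 1.0
```

Spaces in the input are ignored, so a bit pattern can be pasted in groups of four exactly as it is written in the question.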

So for instance, to represent the number 1, you set the sign bit to 0 (which means +). You need an exponent of zero, so you set the exponent part to $127 = (01111111)_2$. The mantissa should be 1, so you fill the $m-1$ part with all zeros.
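One way to check an example like this is to assemble the three fields into a 32-bit word and let Python reinterpret that word as a float. The helper name `assemble_float32` is hypothetical; `struct.pack` and `struct.unpack` come from the standard library, and big-endian order is chosen here only so the bits read left to right.

```python
import struct


def assemble_float32(sign: int, stored_exponent: int, fraction: int) -> float:
    """Pack the three fields into one 32-bit word and read it back as a float."""
    if sign not in (0, 1) or not 0 <= stored_exponent <= 255 or not 0 <= fraction < 2**23:
        raise ValueError("field out of range")
    # 1 sign bit, then 8 exponent bits, then 23 fraction bits.
    word = (sign << 31) | (stored_exponent << 23) | fraction
    (value,) = struct.unpack(">f", struct.pack(">I", word))
    return value


# Sign 0, exponent part 127, fraction field all zeros.
print(assemble_float32(0, 127, 0))  # 1.0
```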

In your question, you're asked to play a bit with this representation. To convert 4.625 to base 2, convert the integer and fractional parts separately, and then add them. Once you have the base-2 representation, normalize it so that it starts with a 1 before the point, and filling in the bits of the IEEE representation is not difficult.
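The base-2 conversion can be sketched as follows, assuming a non-negative input. The function name `to_binary` and the 23-digit cut-off are choices made for this illustration: the integer part is converted directly, and the fractional part is converted by repeated doubling, where each doubling produces one binary digit.

```python
def to_binary(x: float, max_fraction_bits: int = 23) -> str:
    """Binary expansion of a non-negative number, integer and fractional parts separately."""
    if x < 0:
        raise ValueError("expected a non-negative number")
    integer_part = int(x)
    fractional_part = x - integer_part

    digits = []
    # Doubling shifts the binary point one place to the right;
    # the digit that crosses the point is the next binary digit.
    while fractional_part > 0 and len(digits) < max_fraction_bits:
        fractional_part *= 2
        digit = int(fractional_part)
        digits.append(str(digit))
        fractional_part -= digit

    return f"{integer_part:b}." + ("".join(digits) or "0")


print(to_binary(4.625))  # 100.101
```

Here $(100.101)_2 = (1.00101)_2 \cdot 2^2$, so the actual exponent is 2 and the exponent part stores $2 + 127 = 129$. For inputs such as 0.1 whose binary expansion does not terminate, the sketch simply stops after `max_fraction_bits` digits, without rounding.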

By Googling a bit, you'll find online decimal/IEEE converters so you can check your answers.
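If you would rather check your answers locally, Python's standard `struct` module can convert between a float and its 32-bit pattern. The helper names `float_to_bits` and `bits_to_float` are made up for this sketch, and the outputs are left for you to compare against your own work.

```python
import struct


def float_to_bits(x: float) -> str:
    """Bit pattern of x stored as a 32-bit float, split as sign, exponent, fraction."""
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = f"{raw:032b}"
    return f"{bits[0]} {bits[1:9]} {bits[9:]}"


def bits_to_float(bits: str) -> float:
    """Value of a 32-bit pattern given as a string of 0s and 1s (spaces ignored)."""
    raw = int(bits.replace(" ", ""), 2)
    (value,) = struct.unpack(">f", struct.pack(">I", raw))
    return value


a = 4.625
b = bits_to_float("1100 0001 0001 1100 0000 0000 0000 0000")
print(float_to_bits(a))      # part 1
print(b)                     # part 2
print(float_to_bits(a + b))  # part 3
```

The sum `a + b` is computed in Python's double precision and only rounded to 32 bits when packed; for the values in this question that makes no difference, since each of them is exactly representable in single precision.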
