[Math] Encoding a floating point value

binary, binary operations, floating point

Sincere salutations everyone. I would like to encode 7/8 in binary floating point. I know that 7/8 is .111 in binary. However, how would I go about finding the exponent and the mantissa for the value? I know the sign bit would be 0 (as it is positive). How many places would I move the decimal point? Thanks!

Best Answer

Assuming the standard IEEE 754 32-bit (single-precision) format, we have:

  • 8 bits of exponent
  • 23 bits of mantissa
  • 1 bit of sign

Your expansion $0.111$ is correct; however, we must normalize it so that there is a $1$ in front of the binary point. In this case we multiply it by $2$, which (like multiplying by $10$ in base $10$) shifts the digits one place to the left.

Therefore $0.111 = 0.111 \cdot 2^1 \cdot 2^{-1} = (0.111 \cdot 2^1) \cdot 2^{-1} = 1.11 \cdot 2^{-1}$.
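The normalization step can be checked numerically. The following is a minimal sketch, assuming Python and its standard `math.frexp`; the helper name `normalize` is made up for this illustration and only handles positive finite values.

```python
import math

def normalize(x):
    """Return (significand, exponent) with 1 <= significand < 2
    and x == significand * 2**exponent (positive finite x only)."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # math.frexp gives x == m * 2**e with 0.5 <= m < 1,
    # so shift one more place to get a leading 1.
    m, e = math.frexp(x)
    return m * 2, e - 1

significand, exponent = normalize(7 / 8)
print(significand, exponent)  # 1.75 -1  (1.75 is 1.11 in binary)
```

The significand $1.75$ is $1.11$ in binary, which agrees with $1.11 \cdot 2^{-1}$.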

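Now you can extract the three fields. The sign bit is $0$. The exponent is $-1$, but it is stored with a bias of $+127$, so the exponent field holds $-1 + 127 = 126$, which is $01111110$ in binary. The mantissa is obtained by discarding the $1$ in front of the binary point and keeping only the fractional digits, padded with zeros to 23 bits. In this case the mantissa is $11000000000000000000000$. Putting the fields together in the order sign, exponent, mantissa gives $0\;01111110\;11000000000000000000000$.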
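If you want to verify the result, one option is to let the machine encode the value and then pull the fields apart. This is a rough sketch assuming Python's standard `struct` module; `float32_fields` is a hypothetical helper name, not part of any library.

```python
import struct

def float32_fields(x):
    """Split the IEEE 754 single-precision encoding of x into
    (sign, biased exponent, mantissa) as integers."""
    # Pack as a big-endian 32-bit float, then reread the same
    # 4 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # top bit
    exponent = (bits >> 23) & 0xFF   # next 8 bits, bias included
    mantissa = bits & 0x7FFFFF       # low 23 bits
    return sign, exponent, mantissa

sign, exponent, mantissa = float32_fields(7 / 8)
print(sign)                      # 0
print(format(exponent, "08b"))   # 01111110
print(format(mantissa, "023b"))  # 11000000000000000000000
```

The exponent field prints as $126$ in binary, i.e. the true exponent $-1$ plus the bias $127$.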