I am trying to understand the limits of the floating point representation.
On a 32-bit computer with 7 bits for the exponent and 24 bits for the mantissa, I want to know the biggest and smallest numbers.
My calculation:
Base 2
Biggest positive number =
$$
+ 1 \times 2^{127}
$$
Smallest positive number =
$$
+ 2^{-24} \times 2^{-127}
$$
Biggest negative number =
$$
- 2^{-24} \times 2^{-127}
$$
Smallest negative number =
$$
-1 \times 2^{127}
$$
Decimal
Biggest positive number =
$$
+1 \times 10^{38}
$$
Smallest positive number =
$$
+ 10^{-7} \times 10^{-38}
$$
Biggest negative number =
$$
- 10^{-7} \times 10^{-38}
$$
Smallest negative number =
$$
-1 \times 10^{38}
$$
Is this a correct calculation?
Best Answer
Your format differs a bit from the IEEE standard, but the Wikipedia page gives explicit calculations for binary. Before you get a final answer, you need to settle two details: the offset (bias) in the exponent, and whether the leading 1 of the mantissa is suppressed. The IEEE single-precision range is about 1.18 E-38 to 3.4 E38. With one bit less in the exponent the exponent range is halved, so assuming an IEEE-style bias of 63 you should have a range of about 2.2 E-19 to 1.8 E19 (roughly $2^{-62}$ to $2^{64}$).
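To see how the exponent width drives the range, here is a minimal Python sketch. It assumes IEEE-style conventions that the question does not actually state: a bias of $2^{k-1}-1$ for a $k$-bit exponent, an implicit leading 1, and the all-zeros and all-ones exponent fields reserved for zero/subnormals and infinity/NaN. The name `float_range` is made up for this illustration.

```python
def float_range(exp_bits, frac_bits):
    """Smallest positive normal and largest finite value of a binary format.

    Assumptions (IEEE-style, not taken from the question):
    bias = 2**(exp_bits - 1) - 1, implicit leading 1, and the all-zeros and
    all-ones exponent fields reserved for special values.
    """
    bias = 2 ** (exp_bits - 1) - 1
    e_min = 1 - bias                      # smallest normal exponent
    e_max = (2 ** exp_bits - 2) - bias    # largest normal exponent
    smallest_normal = 2.0 ** e_min
    # Largest significand is 1.111...1 in binary, i.e. 2 - 2**(-frac_bits).
    largest = (2.0 - 2.0 ** (-frac_bits)) * 2.0 ** e_max
    return smallest_normal, largest


if __name__ == "__main__":
    for exp_bits, frac_bits in [(8, 23), (7, 24)]:
        lo, hi = float_range(exp_bits, frac_bits)
        print(f"{exp_bits}-bit exponent, {frac_bits}-bit fraction: {lo:.3e} to {hi:.3e}")
```

Under these assumptions `float_range(8, 23)` reproduces the IEEE single-precision figures (roughly 1.18e-38 to 3.40e38), while `float_range(7, 24)` comes out near 2.17e-19 to 1.84e19, the same order of magnitude as the estimate quoted above. A different bias, or an explicit leading bit, would shift these numbers.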
For the base 10 case you need to specify how the base 10 numbers are stored for a clean answer. Added: it looks like your base 10 numbers are approximately the decimal equivalents of the binary. I hadn't noticed and thought you were storing values somehow in base 10.
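If the decimal figures are meant as equivalents of the binary ones, the conversion can be checked with logarithms, since $2^{n} = 10^{n \log_{10} 2}$. The sketch below does this for the two binary extremes from the question; the helper name `as_decimal` is made up for this illustration.

```python
import math


def as_decimal(power_of_two):
    """Write 2**power_of_two as mantissa * 10**exponent with 1 <= mantissa < 10."""
    log10_value = power_of_two * math.log10(2)
    exponent = math.floor(log10_value)
    mantissa = 10 ** (log10_value - exponent)
    return mantissa, exponent


if __name__ == "__main__":
    # Exponents of the binary extremes written in the question.
    for n in (127, -24 - 127):
        mantissa, exponent = as_decimal(n)
        print(f"2^{n} is about {mantissa:.3f} x 10^{exponent}")
```

This gives about $1.7 \times 10^{38}$ for $2^{127}$ and about $3.5 \times 10^{-46}$ for $2^{-24} \times 2^{-127}$, so the decimal values in the question are order-of-magnitude approximations rather than exact equivalents.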