[Math] What are the biggest and smallest representable numbers with single precision floating point

numerical methods

I am trying to understand the limits of the floating point representation.

On a 32-bit computer with 7 bits for the exponent and 24 bits for the mantissa (plus a sign bit), I want to know the biggest and smallest representable numbers.

My calculation:

Base 2

Biggest positive number =
$$
+ 1 \times 2^{127}
$$

Smallest positive number =
$$
+ 2^{-24} \times 2^{-127}
$$

Biggest negative number =
$$
- 2^{-24} \times 2^{-127}
$$

Smallest negative number =
$$
-1 \times 2^{127}
$$

Decimal

Biggest positive number =
$$
+1 \times 10^{38}
$$

Smallest positive number =
$$
+ 10^{-7} \times 10^{-38}
$$

Biggest negative number =
$$
- 10^{-7} \times 10^{-38}
$$

Smallest negative number =
$$
-1 \times 10^{38}
$$

Is this a correct calculation?

Best Answer

Your format varies a bit from the IEEE standard, but the Wikipedia page on single-precision floating point gives explicit calculations for the binary case. There are details to settle before you get a final answer: the offset (bias) added to the stored exponent, and whether the leading 1 of the mantissa is implicit rather than stored. With those conventions, the IEEE single-precision range is about $1.18 \times 10^{-38}$ to $3.4 \times 10^{38}$, and with one bit less in the exponent you should get a range of roughly $10^{-19}$ to $10^{19}$.
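To see how the bias and the implicit leading bit enter the calculation, here is a small sketch that computes the largest and smallest positive *normal* values for an IEEE-754-style format, parameterized by field widths. (The 7-bit-exponent variant below assumes the same bias convention, bias $= 2^{e-1} - 1$, which the question doesn't actually specify, so those particular numbers are illustrative only.)

```python
def ieee_extremes(exp_bits, frac_bits):
    """Smallest and largest positive normal values for an
    IEEE-754-style format: implicit leading 1, bias = 2**(exp_bits-1) - 1,
    all-ones exponent reserved for infinities/NaNs."""
    bias = 2**(exp_bits - 1) - 1
    e_max = (2**exp_bits - 2) - bias   # largest usable biased exponent
    e_min = 1 - bias                   # smallest normal exponent
    largest = (2 - 2.0**-frac_bits) * 2.0**e_max   # mantissa all ones
    smallest = 2.0**e_min                          # mantissa 1.000...0
    return smallest, largest

# Standard binary32: 8 exponent bits, 23 stored fraction bits
print(ieee_extremes(8, 23))
# -> (1.1754943508222875e-38, 3.4028234663852886e+38)

# The question's layout: 7 exponent bits, 24 mantissa bits
# (assuming the same bias convention)
print(ieee_extremes(7, 24))
```

The second call lands near $2.2 \times 10^{-19}$ and $1.8 \times 10^{19}$, confirming that dropping one exponent bit roughly halves the decimal exponent range.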

For the base-10 case you would need to specify how the base-10 numbers are stored to get a clean answer. Added: it looks like your base-10 numbers are just approximate decimal equivalents of the binary ones. I hadn't noticed that and thought you were somehow storing values in base 10.
