[Math] The upper and lower limits of IEEE-754 standard

floating point

There's something about IEEE-754 that I just can't understand.

The specific questions are:

Which range of numbers can be represented by the IEEE-754 standard using base 2 in single (double) precision?

Which range of numbers can be represented by the IEEE-754 standard using base 10 in single (double) precision?

Which range of numbers can be represented by the IEEE-754 standard using base 16 in single (double) precision?

(The textbook is not in English, so I might not have translated this well, but I hope you get the point.)

The only information the textbook gives is the ranges themselves, with no explanation of how they were calculated. For example:

binary32:

The largest normalized number: $(1-2^{-24})\times 2^{128}$

The smallest normalized number: $1.0\times 2^{-126}$

The smallest subnormal number: $1.0\times 2^{-149}$
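The three binary32 values listed above can be checked directly by building the corresponding 32-bit patterns (1 sign bit, 8 exponent bits, 23 fraction bits) and letting Python's `struct` module decode them. This is a minimal sketch; the helper name `binary32_from_bits` is hypothetical, and the comparison uses `fractions.Fraction` so it is exact rather than approximate.

```python
import struct
from fractions import Fraction

def binary32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 binary32 value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]

# Exponent field 11111110, fraction all ones: the largest finite value.
largest_normal = binary32_from_bits(0x7F7FFFFF)
# Exponent field 00000001, fraction all zeros: the smallest normalized value.
smallest_normal = binary32_from_bits(0x00800000)
# Exponent field 00000000, only the last fraction bit set: the smallest subnormal.
smallest_subnormal = binary32_from_bits(0x00000001)

# Compare exactly against the textbook formulas using rational arithmetic.
assert Fraction(largest_normal) == (1 - Fraction(1, 2**24)) * 2**128
assert Fraction(smallest_normal) == Fraction(1, 2**126)
assert Fraction(smallest_subnormal) == Fraction(1, 2**149)

print(largest_normal, smallest_normal, smallest_subnormal)
```

The printed values are roughly $3.4\times 10^{38}$, $1.2\times 10^{-38}$ and $1.4\times 10^{-45}$.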

I have a test coming up where this kind of question will appear, and I really don't want to learn all of this by heart. On the other hand, there must be a method for calculating these values, but they seem so arbitrary, and that's what confuses me.

Best Answer

In IEEE-754 single precision, the exponent of a normalised number lies in the range $-126$ ... $127$. The mantissa has the form $1.xxxxxxxxxxxxxxxxxxxxxxx_2$ (23 binary digits ($x$'s), each $x$ being $0$ or $1$) for normalised numbers, and the form $0.xxxxxxxxxxxxxxxxxxxxxxx_2$ for subnormal numbers, which always use the exponent $-126$. Thus:

  • The biggest number takes the biggest mantissa and the biggest exponent: $1.11111111111111111111111_2\times 2^{127}=(2-2^{-23})\times 2^{127}=(1-2^{-24})\times 2^{128}$
  • The smallest normalised number takes the smallest normalised mantissa and the smallest exponent: $1.00000000000000000000000_2\times 2^{-126}=1.0\times 2^{-126}$
  • The smallest subnormal number takes the smallest subnormal mantissa and the (smallest) exponent $-126$: $0.00000000000000000000001_2\times 2^{-126}=2^{-23}\times2^{-126}=1.0\times 2^{-149}$
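The three rules above depend only on two format parameters: the width of the exponent field and the number of stored fraction bits. The following is a minimal sketch, assuming the standard IEEE-754 binary layout in which the biased exponent field uses bias $2^{w-1}-1$ and the all-zeros and all-ones exponent fields are reserved (zero/subnormals and infinities/NaN respectively). The function name `binary_format_limits` and its dictionary keys are hypothetical; the arithmetic is exact via `Fraction`.

```python
from fractions import Fraction

def binary_format_limits(exp_bits: int, frac_bits: int) -> dict:
    """Exact extreme positive values of an IEEE-754 binary format.

    exp_bits:  width of the biased exponent field (8 for binary32, 11 for binary64)
    frac_bits: number of stored fraction bits (23 for binary32, 52 for binary64)
    """
    bias = 2 ** (exp_bits - 1) - 1
    # Largest usable exponent field is all ones minus one, i.e. 2*bias.
    e_max = 2 * bias - bias
    # Smallest usable exponent field for normal numbers is 1.
    e_min = 1 - bias
    # Weight of the last fraction bit: 0.00...01_2.
    last_bit = Fraction(1, 2 ** frac_bits)
    return {
        # 1.11...1_2 x 2^e_max
        "largest_normal": (2 - last_bit) * Fraction(2) ** e_max,
        # 1.00...0_2 x 2^e_min
        "smallest_normal": Fraction(2) ** e_min,
        # 0.00...1_2 x 2^e_min (subnormals reuse the exponent e_min)
        "smallest_subnormal": last_bit * Fraction(2) ** e_min,
    }

single = binary_format_limits(8, 23)
double = binary_format_limits(11, 52)
for name in single:
    print(f"{name}: binary32 ~ {float(single[name]):.6g}, "
          f"binary64 ~ {float(double[name]):.6g}")
```

For binary32 this reproduces $(1-2^{-24})\times 2^{128}$, $2^{-126}$ and $2^{-149}$; passing `(11, 52)` gives the double-precision limits by the same reasoning.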

I've used the subscript $_2$ to denote a number written in binary (base $2$); all the other numbers are written in base $10$.
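Applying the same three rules to double precision (11 exponent bits, so normal exponents $-1022$ ... $1023$, and 52 fraction bits) gives $(2-2^{-52})\times 2^{1023}$, $1.0\times 2^{-1022}$ and $2^{-52}\times 2^{-1022}=1.0\times 2^{-1074}$. A quick sketch for checking this, assuming Python's `float` is IEEE-754 binary64 (true on essentially all current platforms), compares these expressions against `sys.float_info` and `math.ulp`:

```python
import math
import sys

# Assumption: Python's float is IEEE-754 binary64 (52 stored fraction bits,
# normal exponents -1022 ... 1023).
largest_normal = (2 - 2.0**-52) * 2.0**1023    # 1.11...1_2 (52 ones) x 2^1023
smallest_normal = 2.0**-1022                   # 1.00...0_2 x 2^-1022
smallest_subnormal = 2.0**-52 * 2.0**-1022     # 0.00...1_2 x 2^-1022 = 2^-1074

assert largest_normal == sys.float_info.max
assert smallest_normal == sys.float_info.min   # smallest positive normalized float
assert smallest_subnormal == math.ulp(0.0)     # math.ulp requires Python 3.9+

print(largest_normal, smallest_normal, smallest_subnormal)
```

All three products are exactly representable, so the floating-point arithmetic here introduces no rounding.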
