How many bits to represent these numbers precisely

computational mathematics, computer science, floating point

Consider the following numbers:

$$19=10011_b, 12.75=1100.11_b, 7.125=111.001_b$$

What is the minimum number of bits necessary to represent the above three numbers precisely?

A system like the IEEE floating point standard should look like

$$\pm1.b_1\dots b_5 \times 2^p$$

where $2\le p\le 4$, so that there is one sign bit, one "free" bit (the leading $1$, which is not stored), and five bits for $b_1\dots b_5$. Then we need eight values for $p$, which gives three additional bits. So does this mean that representing the above numbers requires at least a 9-bit floating point system? That seems dubious to me: it looks like too many bits for such simple numbers. Where's my mistake?
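The decomposition into $\pm1.b_1\dots b_5 \times 2^p$ can be checked with a small sketch. The helper name `normalize` is hypothetical, and the code assumes positive inputs that are exact binary fractions (otherwise the bit-peeling loop would not terminate):

```python
from fractions import Fraction

def normalize(x):
    """Return (p, bits) such that x == 1.bits * 2**p.

    Assumes x > 0 and that x is an exact binary fraction.
    """
    x = Fraction(x)
    p = 0
    # Scale x into [1, 2), tracking the power of two that was removed.
    while x >= 2:
        x /= 2
        p += 1
    while x < 1:
        x *= 2
        p -= 1
    # Drop the implicit leading 1, then peel off the fraction bits one by one.
    x -= 1
    bits = ""
    while x != 0:
        x *= 2
        if x >= 1:
            bits += "1"
            x -= 1
        else:
            bits += "0"
    return p, bits

for value in (19, 12.75, 7.125):
    p, bits = normalize(value)
    print(value, "= 1." + bits, "x 2^" + str(p))
```

For the three numbers in question this gives exponents 4, 3, and 2 with fraction strings of length 4, 5, and 5, which matches the five fraction bits and the range $2\le p\le 4$ used above.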

Best Answer

The answer depends on what you want to represent and the data format.

  • To represent them precisely requires zero bits, under the data format that "this format represents 19, 12.75, and 7.125".

  • If you need a variable that holds a value that is one of these three numbers, then two bits are enough, using the data format "00 means 19, 01 means 12.75, and 10 means 7.125".

  • If you need an array of five such values, you can make do with 8 bits, by reinterpreting the entries as a base-3 numeral for a number $0 \leq n < 243$, which can be stored in binary using 8 bits.

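  • If you need an IEEE-like floating point format capable of exactly representing these three values under the usual interpretation of such formats, then you actually need 10 bits: you forgot the sign bit for the exponent. (IEEE formats store the exponent with a bias rather than a literal sign bit, but the cost is the same: under IEEE conventions a 3-bit exponent field has bias 3 and a largest normal exponent of 3, which falls short of $p=4$. Four exponent bits are therefore needed, giving $1+4+5=10$.)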

  • You could modify the IEEE format in various ways as well. For example, you might choose a format where "the sign is always positive and the exponents are always in the range $1 \leq e \leq 4$". Then seven bits would suffice: two for the exponent and five for $b_1\dots b_5$.
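Two of the encodings listed above can be sketched in a few lines: the base-3 packing of five values into 8 bits, and the restricted 7-bit format. All names here (`VALUES`, `pack`, `unpack`, `encode7`, `decode7`) are hypothetical, and the field layout of the 7-bit format (exponent bits first, storing $e-1$) is an assumption made for illustration, not something the answer specifies.

```python
from fractions import Fraction

VALUES = (19, 12.75, 7.125)  # assumed code table: digit 0, 1, 2

def pack(items):
    """Pack a sequence of entries from VALUES into one base-3 integer."""
    n = 0
    for item in items:
        n = n * 3 + VALUES.index(item)
    return n

def unpack(n, count):
    """Recover `count` entries from an integer produced by pack."""
    out = []
    for _ in range(count):
        n, digit = divmod(n, 3)
        out.append(VALUES[digit])
    return out[::-1]  # digits come out least significant first

def encode7(x):
    """Encode x as 7 bits: 2 bits holding e - 1, then 5 fraction bits."""
    x = Fraction(x)
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    frac = (x - 1) * 32  # the five bits after the implicit leading 1
    if not 1 <= e <= 4 or frac.denominator != 1:
        raise ValueError("not representable in this 7-bit format")
    return ((e - 1) << 5) | int(frac)

def decode7(code):
    """Inverse of encode7: rebuild 1.b1...b5 * 2**e as an exact Fraction."""
    e = (code >> 5) + 1
    frac = code & 0b11111
    return (1 + Fraction(frac, 32)) * 2**e

data = [7.125, 19, 12.75, 12.75, 7.125]
packed = pack(data)
assert packed < 3**5 <= 2**8          # five entries fit in 8 bits
assert unpack(packed, 5) == data
for v in VALUES:
    assert encode7(v) < 2**7 and decode7(encode7(v)) == v
```

Both round trips succeed because each format only has to distinguish the values it was designed for. The bit count measures the size of the set of representable values, not how "simple" the individual numbers are.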