[Math] How to find the number of normalised floating point numbers in a system

computer sciencediscrete mathematicsfloating pointnumerical methods

I'm trying to make floating point number systems a bit more intuitive for myself. There are a few things I am confused about, and I think the best way to clear up my confusions would be for someone to guide me through one question:

How many numbers are there in a floating point number system given the base ($B$), precision ($P$), maximum exponent ($e_{max}$) and minimum exponents ($e_{min}$)?

Wikipedia provides the following formula to obtain the number of normalized floating-point number in a system:

$$2*(B − 1)*(B^{P − 1})*(e_{max} − e_{min} + 1) + 1$$

If someone could explain each term's significance that would be extremely helpful. The $+1$'s are the most confusing…

Best Answer

First we consider the significand. The significand is a string of $P$ digits in the base $B$, but except for the representation of the number $0$ it is normalized so that the first digit is not $0$. Since there are $B$ base-$B$ digits, $0,1,\dots,B-1$, there are $B-1$ possible first digits, $1,\dots,B-1$. Each of the remaining $P-1$ slots in the significand can be filled with any of the $B$ digits, so these $P-1$ slots can be filled in $B^{P-1}$ different ways. There are therefore altogether $(B-1)B^{P-1}$ ways to fill all $P$ slots of the significand.

Example: If $B=10$ and $P=3$, you’re just counting the possible $3$-digit numbers in base $10$, i.e., the integers from $100$ through $999$: there are $9$ choices for the first digit, since $0$ is not permitted, and $10$ for each of the other two digits, for a total of $9\cdot 10^2=900$ choices. As a check on this, observe that there are certainly $999$ integers between $1$ and $999$ inclusive, and we’re throwing out the $99$ integers from $1$ through $99$, so we must be left with $999-99=900$ integers altogether.

The significand is signed, so each of these $(B-1)B^{P-1}$ different values can appear either positively or negatively; this doubles the actual number of choices for the significand, making it $2(B-1)B^{P-1}$.

Now we consider the exponent. It can take any integer value between $e_{\min}$ and $e_{\max}$ inclusive. Let $a=e_{\min}-1$; then $e_{\min}=a+1$, $e_{\min}+1=a+2$, $e_{\min}+2=a+3$, and in general $e_{\min}+k=a+(k+1)$ for any $k$. Let $d=e_{\max}-e_{\min}$; then $e_{\max}=e_{\min}+d$, so $e_{\max}=a+(d+1)$. The integers from $e_{\min}$ through $e_{\max}$ are therefore $$a+1,a+2,\dots,a+d,a+(d+1)\;,\tag{1}$$ and it’s clear from looking at the second term of each of these sums that there are $d+1$ integers in the list $(1)$. But $d+1=e_{\max}-e_{\min}+1$, so there are $e_{\max}-e_{\min}+1$ possible values of the exponent.

Example: If $e_{\max}=8$ and $e_{\min}=-7$, $a=-7-1=-8$, and $d=8-(-7)=15$, the possible exponents are the integers from $a+1=-7$ through $a+(d+1)=-8+16=8$, and there are $d+1=16$ of them. As a check, there are clearly $7$ possible negative exponents, $8$ possible positive exponents, and $0$, for a total of $7+8+1=16$ possible exponents.

Each of the $2(B-1)B^{P-1}$ possible values of the signed significand can be combined with any of the $e_{\max}-e_{\min}+1$ possible values of the exponent, so there are altogether $$2(B-1)B^{P-1}(e_{\max}-e_{\min}+1)\tag{2}$$ possible combinations, each representing a different number. However, $(2)$ counts only the non-zero numbers that can be represented; the system can also represent the number $0$, so the grand total is $$2(B-1)B^{P-1}(e_{\max}-e_{\min}+1)+1$$ representable numbers.

Related Question