Due to the finite precision of the computer, numbers used in calculations must conform to the format imposed by the machine. So only real numbers with a finite number of digits can be represented. A normalized floating point system $\mathbb{F}=F(\beta,p,e_{\text{min}},e_{\text{max}})$ consists of a set of real numbers written in normalized floating point form $x=\pm m \times \beta^{e}$, where $m$ is the mantissa of $x$ and $e$ is the exponent.
If $x \neq 0$ then the mantissa $m$ can be written as:
\begin{equation}
m = a_N +a_{N-1} \beta^{-1}+...+a_{-p} \beta^{-p-N}
\end{equation}
with $a_N \neq 0$ and $e_{\text{min}} \leq e \leq e_{\text{max}}$. If $x=0$ then the mantissa $m=0$ while the exponent $e$ can take any value.
In the above expressions, $p$ is the precision of the system, $\beta$ the base, and $[e_{\text{min}},e_{\text{max}}]$ the exponent range, with $e_{\text{min}}<0$, and $e_{\text{max}}=|e_{\text{min}}|+1$.
According to the definition the mantissa $m$ belongs to the range $[1,\beta)$. The machine epsilon is $\beta^{1-p}$ and represents the difference between the
mantissae of two successive positive numbers.
A nonzero number $x \in \mathbb{F}$ satisfies $x_{\text{min}} \leq |x| \leq x_{\text{max}}$, where:
\begin{equation}
x_{\text{min}} = \beta^{e_{\text{min}}}
\end{equation}
and
\begin{equation}
x_{\text{max}} = (\beta-1)(1+\beta^{-1}+\beta^{-2}+... + \beta^{-(p-1)}) \beta^{e_{\text{max}}}< \beta^{e_{\text{max}}+1}
\end{equation}
We now prove the statement above. The general representation of $x \in \mathbb{R}$ in base $\beta$ is:
\begin{equation}
x=\pm (a_N \beta^N+a_{N-1} \beta^{N-1}+...+a_1 \beta+a_0+a_{-1} \beta^{-1}+...+a_{-p} \beta^{-p})= \pm m \times \beta^{e}
\end{equation}
Factoring out $\beta^N$ gives:
\begin{equation}
x=\pm (a_N +a_{N-1} \beta^{-1}+...+a_1 \beta^{-N+1}+a_0 \beta^{-N}+a_{-1} \beta^{-1-N}+...+a_{-p} \beta^{-p-N}) \times \beta^N= \pm m \times \beta^{e}
\end{equation}
We can identify $N$ with $e$ ($N=e$). Then:
\begin{equation}
m=\sum_{i=-p}^N a_i \beta^{i-N}
\end{equation}
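As an illustration (the number below is chosen here only as an example and does not come from the derivation itself), take $\beta=10$ and $x=314.15$, whose expansion has $N=2$:
\begin{equation}
x = 3\times 10^{2}+1\times 10^{1}+4\times 10^{0}+1\times 10^{-1}+5\times 10^{-2} = (3+1\times 10^{-1}+4\times 10^{-2}+1\times 10^{-3}+5\times 10^{-4})\times 10^{2}
\end{equation}
so that $m=3.1415 \in [1,10)$ and $e=N=2$.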
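Since the system keeps $p$ significant digits, relabel the mantissa digits as $d_i=a_{N-i}$, so that $m=\sum_{i=0}^{p-1} d_i \beta^{-i}$ with $d_0 \neq 0$. The minimum value of $m$ is reached when $d_0=1$ and $d_i=0$ for $1 \leq i \leq p-1$. In this case $m=1$, and taking $e=e_{\text{min}}$ gives $x_{\text{min}} = \beta^{e_{\text{min}}}$.
The maximum value of $m$ is obtained when $d_i=\beta-1$ for all $0 \leq i \leq p-1$, which gives $m=(\beta-1)(1+\beta^{-1}+...+\beta^{-(p-1)})=\beta-\beta^{1-p}<\beta$. Taking $e=e_{\text{max}}$ then yields $x_{\text{max}}$ together with the bound $x_{\text{max}}<\beta^{e_{\text{max}}+1}$.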
The machine epsilon is defined as $\epsilon_M=\beta^{1-p}$. It is a measure of the precision of the system, since it is an upper bound on the relative distance between two consecutive numbers. It also represents the difference between the mantissae of two successive positive numbers. A real number that does not fit the finite format imposed by the computer cannot be represented exactly in a normalized floating point system.
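The properties of $\epsilon_M$ can be checked numerically. The following is a minimal sketch, assuming the host uses IEEE double precision, i.e. $F(2,53,-1022,+1023)$, and Python 3.9 or later (for `math.nextafter`). The names `eps`, `gap_at_one` and `relative_gap` are illustrative only.

```python
import math
import sys

# Assumption: Python floats are IEEE double precision, F(2, 53, -1022, 1023).
BETA, P = 2, 53

# Machine epsilon as defined in the text: beta**(1 - p).
eps = float(BETA) ** (1 - P)

# Difference between the mantissa 1.0 and the next representable mantissa.
gap_at_one = math.nextafter(1.0, 2.0) - 1.0


def relative_gap(x: float) -> float:
    """Relative distance from x > 0 to the next larger representable number."""
    return (math.nextafter(x, math.inf) - x) / x


if __name__ == "__main__":
    print("eps equals sys.float_info.epsilon:", eps == sys.float_info.epsilon)
    print("gap above 1.0 equals eps:", gap_at_one == eps)
    for x in (1.0, 1.5, 3.7, 1.0e10):
        print(x, "relative gap <= eps:", relative_gap(x) <= eps)
```

The relative gap equals $\epsilon_M$ only when the mantissa is exactly $1$; for larger mantissae it is smaller, which is why $\epsilon_M$ acts as an upper bound.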
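The total number of elements in $\mathbb{F}$, regarded as a set of real numbers, is given by the following expression:
\begin{equation}
2 (\beta-1) \beta^{p-1} (e_{\text{max}}-e_{\text{min}}+1)+1
\end{equation}
Here the factor $2$ accounts for the sign, $(\beta-1)\beta^{p-1}$ is the number of normalized mantissae, $e_{\text{max}}-e_{\text{min}}+1$ is the number of exponents, and the final $1$ counts zero. If $+0$ and $-0$ are counted as distinct encodings, as in the IEEE formats, the additive constant becomes $2$.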
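The leading term of the expression above, $2(\beta-1)\beta^{p-1}(e_{\text{max}}-e_{\text{min}}+1)$, counts the nonzero elements, and it can be checked by brute-force enumeration. The sketch below uses a small toy system $F(2,3,-1,2)$ chosen here purely for illustration. The function name `nonzero_elements` is hypothetical, and the additive constant for zero is deliberately left out of the check.

```python
from fractions import Fraction


def nonzero_elements(beta, p, e_min, e_max):
    """Enumerate every nonzero element of F(beta, p, e_min, e_max) exactly."""
    values = set()
    for e in range(e_min, e_max + 1):
        scale = Fraction(beta) ** e
        # Normalized p-digit mantissae: k / beta**(p-1) with leading digit != 0.
        for k in range(beta ** (p - 1), beta ** p):
            m = Fraction(k, beta ** (p - 1))
            values.add(m * scale)
            values.add(-m * scale)
    return values


# Toy parameters (an assumption for illustration, not an IEEE format).
BETA, P, E_MIN, E_MAX = 2, 3, -1, 2

elements = nonzero_elements(BETA, P, E_MIN, E_MAX)
expected = 2 * (BETA - 1) * BETA ** (P - 1) * (E_MAX - E_MIN + 1)

if __name__ == "__main__":
    print("enumerated:", len(elements), "formula:", expected)
    print("smallest positive:", min(v for v in elements if v > 0))
    print("largest:", max(elements))
```

Exact rational arithmetic is used so that no two distinct elements collapse through rounding during the enumeration.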
Computers can work in single or double precision. IEEE standard single-precision floating point numbers belong to the normalized floating point system $F(2, 24, -126, +127)$, while IEEE standard double-precision floating point numbers belong to the normalized floating point system $F(2, 53, -1022, +1023)$.
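The bounds $x_{\text{min}}$ and $x_{\text{max}}$ derived earlier can be evaluated for these two systems and compared with what the machine reports. This is a minimal sketch, assuming Python floats are IEEE doubles. The helpers `range_bounds` and `as_float32` are illustrative names, not part of any library.

```python
from fractions import Fraction
import struct
import sys


def range_bounds(beta, p, e_min, e_max):
    """Return (x_min, x_max) of F(beta, p, e_min, e_max) as exact fractions."""
    x_min = Fraction(beta) ** e_min
    # Largest mantissa: (beta - 1) * (1 + beta**-1 + ... + beta**-(p-1)).
    m_max = (beta - 1) * sum(Fraction(1, beta ** i) for i in range(p))
    x_max = m_max * Fraction(beta) ** e_max
    return x_min, x_max


def as_float32(x):
    """Round-trip a Python float through the IEEE single-precision format."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


single_min, single_max = range_bounds(2, 24, -126, 127)
double_min, double_max = range_bounds(2, 53, -1022, 1023)

if __name__ == "__main__":
    print("double x_min matches:", float(double_min) == sys.float_info.min)
    print("double x_max matches:", float(double_max) == sys.float_info.max)
    print("single x_max survives float32:",
          as_float32(float(single_max)) == float(single_max))
    print("single x_max:", float(single_max))
```

Note that `sys.float_info.min` is the smallest positive normalized double, which is the quantity $x_{\text{min}}$ describes; subnormal numbers lie outside the normalized system considered here.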
Best Answer
The \texttt{[16:31]} and \texttt{[15:0]} refer to locations in the binary representation of a $32$-bit integer. You have interpreted this correctly. When in doubt about technical problems, always consult an expert.

In your case the number is \texttt{0 || 00001101 || 101 1001 1111 1110 1101 0011}.

The sign is positive. The biased exponent is $(1101)_2 = 13$, so the actual exponent is $13 - 127 = -114$, assuming single precision.

So the answer you have is correct: $$2^{-114} \times (1.101\,1001\,1111\,1110\,1101\,0011)_2$$
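The decoding above can be reproduced mechanically. The following is a minimal sketch, assuming the $32$-bit word is an IEEE single-precision number with a normal exponent field (neither all zeros nor all ones). The function name `decode_single` is illustrative, and the word is assembled directly from the three bit fields quoted above.

```python
import struct

# Sign, biased exponent and fraction fields as quoted in the answer.
BITS = "0" + "00001101" + "10110011111111011010011"
WORD = int(BITS, 2)


def decode_single(word):
    """Split a 32-bit word into IEEE single-precision fields (normal numbers only)."""
    sign = word >> 31
    biased_exponent = (word >> 23) & 0xFF
    fraction = word & 0x7FFFFF
    exponent = biased_exponent - 127      # remove the single-precision bias
    mantissa = 1 + fraction / 2 ** 23     # restore the implicit leading 1
    return sign, biased_exponent, exponent, mantissa


sign, biased, exponent, mantissa = decode_single(WORD)
value = (-1) ** sign * mantissa * 2.0 ** exponent

# Cross-check against the interpretation made by the struct module.
reference = struct.unpack(">f", WORD.to_bytes(4, "big"))[0]

if __name__ == "__main__":
    print("word:", hex(WORD))
    print("upper half:", hex(WORD >> 16), "lower half:", hex(WORD & 0xFFFF))
    print("sign:", sign, "biased exponent:", biased, "exponent:", exponent)
    print("agrees with struct:", value == reference)
```

Both halves of the word are printed because the sign and exponent sit in the upper $16$ bits together with the top of the fraction, while the lower $16$ bits hold the rest of the fraction.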