[Math] Converting 0.1 to binary 64 bit double

I want to convert the decimal number 0.1 to a 64-bit binary double. Here is how I do it:

$$ 0.1_{10} = 0.00011001100110011001100110011001100110011001100110011001100110… \times 2^0 $$
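The expansion above can be reproduced by repeated doubling. This is a minimal sketch using exact rational arithmetic; the helper name `binary_fraction_digits` is made up for illustration.

```python
from fractions import Fraction

def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the point of 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)        # the integer part is the next binary digit
        digits.append(str(bit))
        value -= bit            # keep only the fractional part
    return "".join(digits)

# Exact 1/10, so no floating-point error enters the expansion itself.
expansion = binary_fraction_digits(Fraction(1, 10), 60)
print("0." + expansion)
```

After the leading `0001`, the group `1001` repeats indefinitely, which is why the expansion has to be rounded.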

Represent it in normalized scientific form:

$$ 1.1001100110011001100110011001100110011001100110011001100110… \times 2^{-4} $$

Now a 64-bit IEEE 754 float stores 52 bits of the mantissa explicitly (the leading 1 is implicit), so I need to round the fraction to 52 bits.

$$ 1.\underbrace{1001100110011001100110011001100110011001100110011001}_{52\ \text{bits}}100110… \times 2^{-4} $$

So I have to round to either:

smaller number (truncated)

$$ 1.1001100110011001100110011001100110011001100110011001 $$

larger number (truncated value plus 1 in the last place)

$$ 1.1001100110011001100110011001100110011001100110011010 $$
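One way to decide between the two candidates is to compare each against the exact significand $0.1 / 2^{-4} = 1.6$ with rational arithmetic. The sketch below assumes round-to-nearest; the variable names are illustrative only.

```python
from fractions import Fraction

# Exact significand of 0.1 after normalization: 0.1 / 2**-4 = 1.6
target = Fraction(1, 10) * 16

# The 52 fraction bits of the truncated candidate, read as an integer.
truncated_bits = int("1001" * 13, 2)
ulp = Fraction(1, 2**52)            # weight of the last stored bit

lower = 1 + truncated_bits * ulp    # truncated candidate
upper = lower + ulp                 # one unit in the last place higher

error_lower = target - lower        # distance from the smaller candidate
error_upper = upper - target        # distance from the larger candidate
print(error_lower / ulp, error_upper / ulp)
print(error_lower > error_upper)    # True means the larger candidate is nearer
```

Expressing both errors in units of the last place makes it easy to see which side of the halfway point the exact value falls on.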

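Since the 53rd fraction bit (the first discarded bit) is 1 and the bits after it are not all zero, the value lies above the halfway point, so I'm rounding up to the larger number. So I have the mantissa part ready. Then I calculate the biased exponent (11 bits for the exponent):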

$$ \begin{aligned}
2^{11-1} - 1 &= 1023\\
1023 - 4 &= 1019\\
1019_{10} &= 1111111011_2
\end{aligned} $$
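The same exponent arithmetic can be written out as a short sketch, which also pads the result to the full 11-bit field width. The constant name `EXPONENT_BITS` is chosen here for illustration.

```python
EXPONENT_BITS = 11                      # width of the exponent field in a double

bias = 2 ** (EXPONENT_BITS - 1) - 1     # exponent bias
biased = bias + (-4)                    # true exponent is -4
exponent_field = format(biased, "0{}b".format(EXPONENT_BITS))  # zero-padded binary
print(bias, biased, exponent_field)
```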

So the final representation should be:
$$ \underbrace{0}_{\text{sign}}\underbrace{01111111011}_{\text{exponent}}\underbrace{1001100110011001100110011001100110011001100110011010}_{\text{mantissa}} $$
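A quick way to cross-check a hand-derived bit pattern is to ask the machine for the bits of `0.1` directly. This sketch assumes the platform's Python `float` is an IEEE 754 binary64 double, which is the usual case; `double_bits` is a hypothetical helper name.

```python
import struct

def double_bits(x):
    """Return the 64-bit pattern of the double x as a string of 0s and 1s."""
    # Pack as a big-endian double, then reinterpret the 8 bytes as an unsigned integer.
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return format(as_int, "064b")

bits = double_bits(0.1)
sign, exponent, mantissa = bits[0], bits[1:12], bits[12:]
print(sign, exponent, mantissa)
```

The three printed fields can be compared one by one with the sign, exponent and mantissa derived by hand.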

Is this correct?
