MATLAB: Cumsum gives different results for double and single


Hey,

Does anyone have an explanation for this?

a = single(rand(1000,1000));
b = cumsum(cumsum(a));
c = single(cumsum(cumsum(double(a))));
sum(b(:)~=c(:))
ans =
      864947

Thanks!

Yuval

Best Answer

Arithmetic on single and double values is carried out by hardware instructions that follow the IEEE 754 standard for floating point arithmetic.

Suppose you add

1/100000 + 1/99999 + 1/99998 + ... + 1/3 + 1/2 + 1

and you use a particular number of bits in the mantissa. As you add terms, the cumulative sum becomes large relative to the terms being added, and some terms may effectively make no contribution. As the sum grows, its exponent grows with it, so the least significant bit of the sum represents a larger and larger value. Any term smaller than half of that least significant bit is rounded away and contributes nothing to the sum. It might as well not be there.
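A minimal sketch of that effect in single precision. The values 2^24 and 0.75 are chosen only for illustration; eps(x) returns the spacing between x and the next larger number of the same type.

```matlab
% Absorption: a term smaller than half the spacing at the running sum
% is rounded away entirely.
s = single(2^24);            % 16777216, exactly representable in single
gap = eps(s);                % spacing between s and the next larger single
t = s + single(0.75);        % 0.75 is less than half the gap, so t rounds back to s
absorbed = (t == s);         % true when the added term contributed nothing
d = double(s) + 0.75;        % the same addition in double keeps the 0.75
fprintf('gap = %g, absorbed = %d, double result = %.2f\n', gap, absorbed, d);
```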

Now run the same sum again, with the same original approximations, but with more bits in the mantissa. A 53-bit accumulator takes far longer to run out of precision than a 24-bit one, so the low-order bits that were formerly lost start to contribute, and they add up! For example, the terms 1/k from 1/10000 to 1/1001 add up to about 2.3. The terms 1/k between 2^(-53) and 2^(-24) add up to about 20.1, and that is a contribution that can meaningfully be made at 53 bits but would be lost at 24 bits.
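The two figures quoted above can be checked roughly as follows. This is only a sketch: the second range is far too long to sum term by term, so it uses the approximation that the sum of 1/k from m to n is close to log(n/m). The last part accumulates the same series in single and in double; N is an arbitrary choice.

```matlab
% Partial sum of 1/k for k = 1001..10000, summed directly in double.
k = 1001:10000;
partial = sum(1 ./ k);

% Terms 1/k between 2^(-53) and 2^(-24): approximate with log(2^53 / 2^24).
approxTail = (53 - 24) * log(2);

fprintf('sum of 1/1001 .. 1/10000 = %.4f\n', partial);
fprintf('log(2^53 / 2^24)         = %.4f\n', approxTail);

% The same harmonic series accumulated in single and in double.
N = 1e5;
terms = 1 ./ (N:-1:1);                % 1/N, ..., 1/3, 1/2, 1
sSingle = cumsum(single(terms));      % running sum kept in single
sDouble = cumsum(terms);              % running sum kept in double
fprintf('final difference = %g\n', double(sSingle(end)) - sDouble(end));
```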

You are mentally picturing an addition scheme in which no bits are lost as you add those single precision numbers, rather like adding integers and using as many bits as you need. But that is not how floating point addition works: any part of the next value that does not fit within the accumulating value is simply thrown away. The parts you throw away can turn out to be important in aggregate, and more of them would be retained with a wider format.
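Counting elements that are not bit-for-bit equal mostly measures how often rounding differed. One possible way to see how large the disagreement actually is (a sketch on the same kind of random data as the question, not a recommendation from the original answer) is to look at the relative difference:

```matlab
a = single(rand(1000, 1000));
b = cumsum(cumsum(a));                    % accumulated in single
c = cumsum(cumsum(double(a)));            % accumulated in double, left as double
relErr = abs(double(b) - c) ./ abs(c);    % elementwise relative difference
fprintf('max relative difference = %g\n', max(relErr(:)));
fprintf('eps(''single'')           = %g\n', eps('single'));
```

Comparing the maximum relative difference with eps('single') gives a feel for how many units in the last place the single precision result has drifted.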

Related Solutions

MATLAB: Matlab floating point precision.

Never trust the least significant bits of a floating point number unless you know how far they can be trusted. In general, don't.

A double precision floating point number carries roughly 16 decimal digits, or more exactly 53 binary bits of precision (52 stored explicitly plus one implicit leading bit).

What happens when you add two numbers that are of different magnitude?

X0 = 1/3
X0 =
         0.333333333333333

So X0 is 1/3, though not exactly, because the fraction 1/3 cannot be represented exactly in binary. But that is not really pertinent here.

sprintf('%0.55f',X0)
ans =
    '0.3333333333333333148296162562473909929394721984863281250'

Suppose we add and subtract delta?

delta = 100000.1
delta =
                  100000.1
X1 = X0 + delta - delta
X1 =
       0.333333333328483

What happened? We added a number that was more than 1e5 times as large as X0. Stored in a double, the sum has no room for roughly the bottom 5 decimal digits of X0, so they are lost. Subtracting delta again does not help, because that information is already gone. Once information is lost, it cannot be magically regained.
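The size of the loss can be estimated with eps, which returns the spacing of doubles near a given value. A small sketch using the same X0 and delta; the variable X2 is added here only to show that the order of operations matters:

```matlab
X0 = 1/3;
delta = 100000.1;
gap = eps(X0 + delta);       % spacing of doubles near 1e5: 2^-36, about 1.4552e-11
X1 = X0 + delta - delta;     % X0 is rounded to that spacing when added to delta
err = abs(X1 - X0);          % the part of X0 that was lost
fprintf('gap = %g, err = %g\n', gap, err);

% Evaluated left to right, delta - delta is exactly 0, so nothing is lost here.
X2 = delta - delta + X0;
fprintf('X2 == X0: %d\n', X2 == X0);
```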

MATLAB: Summing up a large single precision array

When the cumulative sum of your array has reached, say, the halfway point around 2.5e9, the least significant bit of the 24-bit significand (mantissa) of single precision must have a value of 2^32/2^24 = 2^8 = 256 in each further addition. That follows directly from the floating point format. The round-off error therefore ranges from a whopping -128 to +128 with each further addition, which is a large fraction of the average value of each addend, and you still have another twelve million additions to make. Work it out for yourself: twelve million additions yet to perform, with an error of up to plus or minus 128 at each step. The difference between your two results is certainly due to round-off error in the single precision format, and you are lucky to have only the amount of error you quote.
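The spacing near 2.5e9 can be checked directly with eps. The array v below is a hypothetical stand-in for the data in the question (its size and values are assumptions), used only to compare a single precision sum with one accumulated in double through the 'double' option of sum:

```matlab
x = single(2.5e9);
ulp = eps(x);                   % spacing of singles near 2.5e9: 2^8 = 256
y = x + single(100);            % 100 is under half the spacing, so y equals x
fprintf('ulp = %g, x + 100 == x: %d\n', ulp, y == x);

% Hypothetical data: one million single precision values between 0 and 200.
v = single(200 * rand(1, 1e6));
sNative = sum(v);               % result returned in single
sDouble = sum(v, 'double');     % accumulated and returned in double
fprintf('single: %.1f   double: %.1f\n', sNative, sDouble);
```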

It is wildly inappropriate to attempt to perform twenty-five million additions with only single precision floating point numbers.
