MATLAB: Possible bug? Or how to check whether overflow happened

Tags: bug, MATLAB, not a bug, overflow, underflow

I have a matrix of size (6472908 x 67), all single values. Different columns have different max/min (they are different variables).
So I calculate the mean of each column using
avgData=mean(Data);
I am expecting the first value in avgData to be the mean of the first column. However, when I issue
avgData(1) - mean(Data(:,1))
ans =
-100.9785
As you can see, the output is not zero. So what has changed? The same thing happens if I convert everything to double.
If I do this with sum(), the difference is even larger. So I was wondering whether overflow is happening, and how I should check for it. lastwarn() returns nothing.
I am afraid the matrix is about 800MB, so I can't upload it here.

Best Answer

No. It is not a bug, but an artifact of operations that may be done in a different order due to the BLAS, or whatever scheme is used internally. NEVER assume that two distinct operations will do a given computation in the same order. It might conceivably reflect an issue of whether a double precision accumulator might be employed for vector input to sum (again, a choice probably made in the BLAS), but not for array input.
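To make the ordering point concrete, here is a minimal sketch with made-up values (they are not from the question's data). It shows that single-precision addition is not associative, and that sum can be asked for a double result through its documented 'double' output-type option. Whether sum and mean actually take different internal paths for vector and matrix input is not something this sketch can show.

```matlab
% Illustrative values only (not taken from the question's data).
a = single(1e8);
b = single(-1e8);
c = single(1);

left  = (a + b) + c;   % a and b cancel exactly first, then c is added
right = a + (b + c);   % c is lost here: adjacent singles near 1e8 are 8 apart

fprintf('(a + b) + c = %g\n', left);
fprintf('a + (b + c) = %g\n', right);

% The same three values summed by sum(), once with the default output
% type for single input and once with a double result requested.
x = single([1e8, 1, -1e8]);
sSingle = sum(x);             % result is single
sDouble = sum(x, 'double');   % result is double

fprintf('sum(x)           = %g (%s)\n', sSingle, class(sSingle));
fprintf('sum(x, ''double'') = %g (%s)\n', sDouble, class(sDouble));
```

The two groupings use exactly the same three numbers, yet they disagree by 1. With millions of elements of widely varying size, small discrepancies like this accumulate.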
If the min and max values vary by such a large amount, that difference is trivial, essentially down in the least significant bits of the result, especially when you are summing millions of such elements.
You have also not said what the mean itself was, so we cannot know how significant the difference is.
As for this being an overflow, that is not at all reasonable to assume. The numbers you have described are simply not large enough to cause overflow, AND if they did overflow, overflows in floating point result in inf, NOT a loss of precision.
realmax('single')
ans =
3.4028e+38
realmax('single')*2
ans =
Inf
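Since overflow shows up as Inf, a direct check is to see whether the result is infinite while every input is finite. The sketch below uses a small made-up vector chosen to overflow on purpose; the variable names are hypothetical. On the real data the same test would be applied to sum(Data) or mean(Data).

```matlab
% Made-up values, deliberately large enough to overflow single precision.
x = single([3e38, 3e38, 1]);

s = sum(x);   % 3e38 + 3e38 exceeds realmax('single')

% Overflow in the reduction: the result is infinite although no input is.
inputsFinite = all(isfinite(x));
overflowed   = isinf(s) && inputsFinite;

fprintf('sum = %g, inputs finite = %d, overflowed = %d\n', ...
        s, inputsFinite, overflowed);
```

If the result is finite, as it appears to be in the question, overflow is ruled out and the discrepancy has to come from rounding.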
The problem here is clearly an issue of bits lost at the low end, due to variation in the sequence of adds in these numbers.
You can test that claim by computing the mean over different orderings of your vector. For example, try this test several times:
mean(Data(randperm(size(Data,1)),1))
then look at the differences.
As well, compare those differences to the size of the actual mean. How does that difference compare to eps for that same number?
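Putting the permutation test and the eps comparison together, a rough sketch might look like this. It uses synthetic data as a stand-in for Data(:,1); the size, the distribution, and the number of trials are all assumptions made only for illustration.

```matlab
rng(0);                                  % fixed seed so the run is repeatable
n   = 1e6;                               % assumed size, smaller than the real column
col = single(1e4 * randn(n, 1) + 5e3);   % synthetic stand-in for Data(:,1)

nTrials = 5;                             % assumed number of repetitions
m = zeros(nTrials, 1, 'single');
for k = 1:nTrials
    % Same values, different order of accumulation.
    m(k) = mean(col(randperm(n)));
end

spread = max(m) - min(m);                % variation caused by ordering alone
ulp    = eps(mean(col));                 % spacing of singles near the mean
ref    = mean(double(col));              % double-precision reference

fprintf('means over permutations: %s\n', mat2str(m', 9));
fprintf('spread = %g, eps(mean) = %g, double reference = %.10g\n', ...
        spread, ulp, ref);
```

A spread of a few multiples of eps(mean) would be consistent with ordinary rounding noise. A spread that is large relative to the mean itself would suggest something else is going on.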