MATLAB: Computing mean by group (need help with speed!)


Hi everyone,
I need to demean the columns of matrix Xw (200000 x 24) by group. To do this, I compute the mean of each column of Xw within each group, where the groups are identified by the vector id (200000 x 1). The following code is the fastest I could come up with after reading this very similar post:
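for i = 1:size(Xw, 2)
    % Mean of column i within each group (id holds positive integer group labels)
    bb = accumarray(id, Xw(:, i), [], @mean);
    % Subtract each row's own group mean
    Xw(:, i) = Xw(:, i) - bb(id);
end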
This is faster than the code suggested in that post. Importantly, Xw is rather sparse and contains only zeros and ones (it is a matrix of dummy variables).
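To make "demean by group" concrete, here is a small self-contained sketch of the per-column accumarray loop on synthetic data. The sizes, the 20 groups, the 5% density of ones, the random seed and the names Xd and check are assumptions for illustration only; the data is stored as a full double matrix. The check at the end confirms that every group's column means are (numerically) zero after demeaning.

```matlab
% Synthetic stand-in for the real data (assumed sizes, density and seed).
rng(0);                                   % reproducible random numbers
n = 200000; k = 24; nGroups = 20;
id = randi(nGroups, n, 1);                % positive integer group labels
Xw = double(rand(n, k) < 0.05);           % 0/1 dummy matrix, stored full

Xd = Xw;                                  % demeaned copy
for i = 1:k
    % Mean of column i within each group (one entry per group label).
    bb = accumarray(id, Xd(:, i), [], @mean);
    % Subtract each row's own group mean.
    Xd(:, i) = Xd(:, i) - bb(id);
end

% After demeaning, every group/column mean should be ~0.
check = zeros(nGroups, k);
for i = 1:k
    check(:, i) = accumarray(id, Xd(:, i), [], @mean);
end
fprintf('max |group mean| after demeaning: %g\n', max(abs(check(:))));
```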
My question: is there any way to speed this up further? It is quite time-consuming as it stands, and since it sits inside another loop, it slows everything else down. Please help! Creative solutions welcome.

Best Answer

Along the lines of Andrei's and Matt's answers, and based on my comment under Andrei's answer:
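% ii(r,c) = id(r) and jj(r,c) = c: one (group, column) subscript pair per element of Xw
[ii,jj] = ndgrid( id, 1:size( Xw, 2 )) ;
iijj = [ii(:), jj(:)] ;
sums = accumarray( iijj, Xw(:) ) ;                 % per-group column sums
cnts = accumarray( iijj, ones( numel( Xw ), 1 )) ; % per-group row counts
means = sums ./ cnts ;
Xw = Xw - means(id, :) ;                           % subtract each row's group means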
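Since every column has the same group sizes, the counts only need to be computed once. The sketch below is a further variant that is not from this thread: it replaces the 2-D accumarray with a sparse group-indicator matrix G (G(r, g) = 1 when row r belongs to group g), so that G' * Xw yields all group sums in a single sparse-times-full product. The synthetic data, the seed and the names G, means1, means2, X1, X2 are assumptions; whether this is faster than the accumarray variant on a given machine has not been measured here. The script checks that both variants give the same result.

```matlab
% Assumed synthetic data, same shape as in the question.
rng(1);
n = 200000; k = 24; nGroups = 1000;
id = randi(nGroups, n, 1);
Xw = double(rand(n, k) < 0.05);

% 2-D accumarray variant (sums and counts per group and column).
[ii, jj] = ndgrid(id, 1:k);
iijj = [ii(:), jj(:)];
means1 = accumarray(iijj, Xw(:)) ./ accumarray(iijj, ones(numel(Xw), 1));
X1 = Xw - means1(id, :);

% Sparse indicator-matrix variant (illustrative, not from the thread).
G = sparse((1:n)', id, 1, n, nGroups);   % n-by-nGroups, a single 1 per row
cnts = full(sum(G, 1))';                  % group sizes, computed once
sums = full(G' * Xw);                     % nGroups-by-k group sums
means2 = sums ./ cnts;                    % implicit expansion (R2016b+); use bsxfun(@rdivide, ...) on older releases
X2 = Xw - means2(id, :);

fprintf('max difference between variants: %g\n', max(abs(X1(:) - X2(:))));
```

Groups that never occur get a count of zero and a NaN mean, but those rows of means2 are never indexed by id, so they do not affect the result.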
Kelly's solution is still the most efficient on my (old) laptop for small numbers of groups (here 20):
Time OP = 0.562232s
Time KK = 0.239906s
Time AB = 0.664438s
Time ME = 0.602663s
Time CW = 0.258647s
My variant of Andrei's and Matt's solutions is slightly faster with a larger number of groups (here 1000):
Time OP = 1.318130s
Time KK = 0.342277s
Time AB = 1.426827s
Time ME = 1.355189s
Time CW = 0.279261s
Actually, here is the profile as a function of the number of groups, still on my rather old laptop:
The run time of my solution is nearly flat in the number of groups, and its curve crosses Kelly's somewhere between 100 and 300 groups.
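A rough way to reproduce a profile like this: the sketch below times the question's per-column loop against the 2-D accumarray variant for several group counts. The group counts, the 5% density of ones, the seed and the variable names are assumptions. The code for Kelly's and the other answers is not shown here, so they are left out. Each configuration is timed once with tic/toc, so expect some noise; timeit gives steadier numbers.

```matlab
% Hypothetical timing sweep over the number of groups.
rng(2);
n = 200000; k = 24;
groupCounts = [20 100 300 1000];          % assumed sweep points
tLoop = zeros(size(groupCounts));
tGrid = zeros(size(groupCounts));
for g = 1:numel(groupCounts)
    id = randi(groupCounts(g), n, 1);
    Xw = double(rand(n, k) < 0.05);

    % Question's approach: one accumarray(@mean) call per column.
    tic;
    X1 = Xw;
    for i = 1:k
        bb = accumarray(id, X1(:, i), [], @mean);
        X1(:, i) = X1(:, i) - bb(id);
    end
    tLoop(g) = toc;

    % Answer's approach: one 2-D accumarray for sums, one for counts.
    tic;
    [ii, jj] = ndgrid(id, 1:k);
    iijj = [ii(:), jj(:)];
    means = accumarray(iijj, Xw(:)) ./ accumarray(iijj, ones(numel(Xw), 1));
    X2 = Xw - means(id, :);
    tGrid(g) = toc;

    fprintf('%5d groups: loop %.3f s, 2-D accumarray %.3f s\n', ...
        groupCounts(g), tLoop(g), tGrid(g));
end
```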