MATLAB: Group mean for repeated samples

accumarray, arrayfun

I have a sample that cycles through ids 1-5 repeatedly over time, and I need the averages for each period and id (not a simple group-wise mean over ids 1-5). Two things complicate this: the number of samples per id is not constant (usually there are 2 data points for each id, but occasionally 3), and the hh and mm time stamps do not change in step with id, i.e. the same minute can occur for several ids. The data look like this (the actual dataset is larger and more complicated):
yyyy mm dd hh mm rec id data1 data2
2015 5 16 12 54 65433 1 402.2262 3.598117
2015 5 16 12 54 65434 1 401.9153 3.596588
2015 5 16 12 54 65435 2 401.9349 3.597093
2015 5 16 12 54 65436 2 402.1344 3.598488
2015 5 16 12 54 65437 3 401.6603 3.599219
2015 5 16 12 55 65438 3 402.3899 3.596839
2015 5 16 13 3 65439 4 401.6629 3.574918
2015 5 16 13 3 65440 4 401.9409 3.575383
2015 5 16 13 3 65441 5 402.3803 3.574625
2015 5 16 13 3 65442 5 402.174 3.571742
2015 5 16 13 24 65443 1 401.9677 3.568859
2015 5 16 3 24 65444 1 401.7614 3.565976
I could use a moving-window approach:
for i = 1:floor(n/period)
    means1(i) = mean(T1((i*period-(period-1)):(i*period)));
    means2(i) = mean(T2((i*period-(period-1)):(i*period)));
end
...but occasionally there are fewer samples than expected in a period.
This would be a nice solution if I wanted the means over all samples of each id (all of id 1, etc.):
[ii,jj] = ndgrid(id,1:size(data,2));
bb = accumarray([ii(:),jj(:)], data(:), [], @mean);
meanout = data - bb(id,:);
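To see why this pools the repeated visits to an id, here is a toy sketch. The numbers are made up for illustration and are not from the dataset above; id is assumed to be a column vector and data a matrix with one row per sample.

```matlab
% Hypothetical toy data: id 1 is visited twice (rows 1-2 and rows 5-6).
id   = [1 1 2 2 1 1]';
data = [10 1; 20 2; 30 3; 50 5; 30 3; 40 4];

% Pair every sample's id with every column index of data.
[ii, jj] = ndgrid(id, 1:size(data, 2));

% bb(k, c) is the mean of column c over ALL rows with id == k.
bb = accumarray([ii(:), jj(:)], data(:), [], @mean);
disp(bb)

% Each sample minus the overall mean of its id.
meanout = data - bb(id, :);
```

Here bb has one row per id, so both visits to id 1 end up in the same mean, which is exactly what the per-period averages need to avoid.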
The result I want is:
id n mean1 mean2
1 2 402.07075 3.5973525
2 2 402.03465 3.5977905
3 2 402.0251 3.598029
4 2 401.8019 3.5751505
5 2 402.27715 3.5731835
1 2 401.86455 3.5674175
Any advice is appreciated; I could not find information on this specific issue. Some clever use of accumarray would probably work, but I could not figure it out.

Best Answer

It looks like you can't use the date to generate a key for accumarray (with unique, as the cyclist suggested). But since your data is sorted, all you really need is to find where each period starts. That's easily done: a new period starts wherever the diff of id is negative. So:
period = cumsum([0; diff(id)<0]); %assuming id is a column vector
[pid, ~, subs] = unique([period id], 'rows');
mean1 = accumarray(subs, data1, [], @mean);
mean2 = accumarray(subs, data2, [], @mean);
count = accumarray(subs, ones(size(subs)));
result = array2table([pid, count, mean1, mean2], 'VariableNames', {'period', 'id', 'count', 'mean1', 'mean2'})
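As a check, the same steps can be run on the twelve sample rows from the question. This is a minimal sketch: the values are typed in from the question's table, only the id and data columns are used, and id is assumed to be a column vector in recording order.

```matlab
% Sample rows from the question (id and the two data columns only).
id    = [1 1 2 2 3 3 4 4 5 5 1 1]';
data1 = [402.2262 401.9153 401.9349 402.1344 401.6603 402.3899 ...
         401.6629 401.9409 402.3803 402.1740 401.9677 401.7614]';
data2 = [3.598117 3.596588 3.597093 3.598488 3.599219 3.596839 ...
         3.574918 3.575383 3.574625 3.571742 3.568859 3.565976]';

% A new period starts wherever id drops (here from 5 back to 1).
period = cumsum([0; diff(id) < 0]);

% One group per distinct (period, id) pair; subs maps each row to its group.
[pid, ~, subs] = unique([period id], 'rows');

mean1 = accumarray(subs, data1, [], @mean);
mean2 = accumarray(subs, data2, [], @mean);
count = accumarray(subs, 1);   % number of samples in each group

result = array2table([pid, count, mean1, mean2], ...
    'VariableNames', {'period', 'id', 'count', 'mean1', 'mean2'});
disp(result)
```

On this sample, period is 0 for the first ten rows and 1 for the last two, so the second visit to id 1 gets its own row in the result. Note that the approach relies on each new period starting at a lower id than the one the previous period ended on; if that can fail in the full dataset, the period boundaries would need to be found another way.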