MATLAB: Sum of some elements in two separate vectors


I have two different vectors (1000 numbers each), A and B for instance. Each element of A has a corresponding value in B (A(i,1) corresponds to B(i,1)).

Some of the values in A are repeated. I want to sum the B values that belong to each repeated value of A, and then plot A vs. B. For example:

A=[ 1 ; 2 ; 1 ; 5 ; 10 ; 5 ]

B=[0.1 ; 0.5 ; 0.2 ; 0.3; 0.8 ; 0.9]

For A=1, B=0.1 & 0.2 >>>> so when A=1, B=0.3 (sum of the values)

For A=2, B=0.5 >>>> so when A=2, B=0.5 (It has the single value)

For A=5, B=0.3 & 0.9 >>>> so when A=5, B=1.2

I have 1000 numbers for each A and B. Can you please help?

Best Answer

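 [uA, ~, b] = unique(A);   % uA: sorted unique values of A; b(i) is the index of A(i) in uA
 sB = arrayfun(@(x) sum(B(b == x)), 1:numel(uA));   % sum of B for each unique value of A
 X = [uA, sB'];   % column 1: unique A, column 2: summed B (assumes A is a column vector)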
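
As an alternative sketch (not part of the answer above), the same grouping can be written with accumarray instead of arrayfun, which avoids one logical comparison per unique value. The data are the six example values from the question; the variable name idx is my own choice.

```matlab
% Example data from the question (column vectors of equal length).
A = [1; 2; 1; 5; 10; 5];
B = [0.1; 0.5; 0.2; 0.3; 0.8; 0.9];

% idx(i) is the row of uA that A(i) belongs to.
[uA, ~, idx] = unique(A);

% Sum all B values that share the same group index.
sB = accumarray(idx(:), B(:));

% One row per unique A value: [A, summed B].
X = [uA(:), sB];
```

Here uA is [1; 2; 5; 10] and sB is approximately [0.3; 0.5; 1.2; 0.8], up to floating-point rounding. Calling plot(uA, sB, 'o-') then gives the requested plot of the summed B against A.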

Related Solutions

MATLAB: Replace duplicate value by 0 in matrix or vector

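 b = [1 2 1 3];
 
 [a,c] = unique(b,'first');   % a: unique values; c: index of the first occurrence of each
 out = zeros(size(b));        % start from all zeros
 out(c) = a;                  % keep first occurrences only; duplicates stay 0, giving [1 2 0 3]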

MATLAB: How to make average of set of data

[Edit to fix typo in image.]

This answer references your previous question.

I was able to extract the data from the curve.fig plot by renaming it to curve.mat and loading it:

data = load('curve.mat');
x = data.hgS_070000.children(1).children.properties.XData;
y = data.hgS_070000.children(1).children.properties.YData;

Plotting the data shows that, over the first 10% of the x-data, there can be up to four duplicate y-values. However, because of the discreteness of the data and the fact that x isn't uniformly sampled, some of the "duplicate" values appear to be missing.

Just for grins, I plotted the x and y data vs. their index, as well as diff(x) and diff(y): it appears that y comes from an analytical Gaussian curve, centered at 500, while x exhibits some odd discretization artifacts.

figure;
subplot(3, 1, 1);
hp1 = plot(x, y);
title('original data');
xlabel('x');
ylabel('y');
xlim([0, 2325]);
ylim([0, 12]);
subplot(3, 1, 2);
hp2 = plotyy(1:numel(x), x, 1:numel(y), y);
title('x and y vs. index')
xlabel('ind')
ylabel(hp2(1), 'x data');
ylabel(hp2(2), 'y data');
xlim(hp2(1), [0, 1000]);
xlim(hp2(2), [0, 1000]);
subplot(3, 1, 3);
hp3 = plotyy(2:numel(x), diff(x), 2:numel(y), diff(y));
title('diff(x) and diff(y) vs. index')
xlabel('ind')
ylabel(hp3(1), 'diff(x)');
ylabel(hp3(2), 'diff(y)');
xlim(hp3(1), [0, 1000]);
xlim(hp3(2), [0, 1000]);

In order to apply a summation or averaging filter to the y data, it's useful to fill in the gaps in x so that it is uniformly spaced, as addressed in your previous question. The difficulty lies in that x is non-monotonic, as evidenced by the 0 and negative values in the diff(x) plot. However, if we can get x to be uniformly spaced integers (with potentially repeated values), then we can convert x and y to parametric equations based on their common index, and interpolate that way, i.e. convert the ideal (non-Matlab form) y(x) to x(k) & y(k), and interpolate based on k, as in the previous question:

diffx = diff(x);
val = sign(diffx); 
len = abs(diffx) + 1; % add one to include diff==0 as new phase; need to subtract off cumsum in ind
ind = [0, cumsum(len) - cumsum(abs(val))] + 1; %  add one for 1-based indexing; note: x == X(ind);
n = ind(end); % note: n == numel(X) == sum(max(abs(diffx), 1)) + 1 (a repeated x, diff == 0, still takes one slot)
mask = false(1, n-1); 
mask(ind(1:end-1)) = true; % ind(end) == numel(X), not start of new phase
diffX = val(cumsum(mask)); % cumsum(mask) gives the rle phase number, i.e. index into val
X = x(1) + cumsum([0, diffX]);
K = 1:numel(X);
Y = interp1(K(ind), y, K);
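
To see what the expansion does, here is a sketch of the same steps on a tiny made-up input (the four values below are toy data, not taken from curve.fig). It assumes, as the code above does, that x is an integer-valued row vector.

```matlab
% Toy input (made-up values): a repeated value followed by a step back.
x = [1 3 3 2];
y = [10 30 35 20];

diffx = diff(x);                 % [2 0 -1]
val = sign(diffx);               % step direction per segment: [1 0 -1]
len = abs(diffx) + 1;            % [3 1 2]
ind = [0, cumsum(len) - cumsum(abs(val))] + 1;  % positions of the original samples in X
n = ind(end);                    % length of the expanded vector
mask = false(1, n - 1);
mask(ind(1:end-1)) = true;       % marks where each segment starts
diffX = val(cumsum(mask));       % unit steps: [1 1 0 -1]
X = x(1) + cumsum([0, diffX]);   % [1 2 3 3 2]
K = 1:numel(X);
Y = interp1(K(ind), y, K);       % [10 20 30 35 20]
```

The jump from 1 to 3 gains an intermediate sample at X = 2 with a linearly interpolated Y of 20, while the repeated 3 and the step back to 2 are kept as they are, so X(ind) still reproduces x.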

We're now free to play with a few different smoothing techniques. As an example, I'll show the results of summing, averaging, and then smoothing the averaged data using a moving-average filter:

subs = (X - min(X)) + 1;
Ysum = accumarray(subs(:), Y).';
Ymean = accumarray(subs(:), Y, [], @mean).';
windowSize = 10;
Yfilt = filter(ones(1, windowSize), windowSize, Ymean);
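
As a rough illustration of what these calls compute, here is a sketch on made-up values (toy data chosen by me, with a window of 2 instead of 10 so the numbers are easy to follow):

```matlab
% Toy data (made-up): X takes the values 1..3, with 2 and 3 each visited twice.
X = [1 2 3 3 2];
Y = [10 20 30 35 20];

subs = (X - min(X)) + 1;                          % shift X so the smallest value maps to bin 1
Ysum = accumarray(subs(:), Y(:)).';               % per-bin sums:  [10 40 65]
Ymean = accumarray(subs(:), Y(:), [], @mean).';   % per-bin means: [10 20 32.5]

% Causal moving average with window w. filter divides the numerator by a(1),
% so the two calls below produce the same output.
w = 2;
YfiltA = filter(ones(1, w), w, Ymean);
YfiltB = filter(ones(1, w) / w, 1, Ymean);        % [5 15 26.25]
```

Note the first filtered value, 5 rather than 10: filter treats the samples before the start as zeros, so the first windowSize - 1 outputs are pulled toward zero.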

Plotting the results,

figure;
subplot(3, 1, 1);
hp1 = plot(x, y, '-', X, Y, '--');
title('original and interpolated data');
xlabel('x');
ylabel('y');
legend({'orig x-y', 'interp X-Y'});
xlim([0, 2325]);
ylim([0, 12]);
subplot(3, 1, 2);
hp2 = plot(x, y);
hold all;
plot(X, Y, '--');
plot(min(X):max(X), Ysum, '--');
plot(min(X):max(X), Ymean, '--');
plot(min(X):max(X), Yfilt, '--');
title('various Y schemes')
xlabel('X');
ylabel('{y, Y, Ysum, Ymean, Yfilt}');
xlim([0, 2325]);
ylim([0, 12]);
legend({'orig', 'interp', 'sum', 'mean', 'moving avg. (filt)'});
subplot(3, 1, 3);
hp3 = plotyy(2:numel(X), diff(X), 2:numel(Y), diff(Y));
% hp3 = plotyy(2:numel(x), diff(x), 2:numel(y), diff(y));
title('diff(X) and diff(Y) vs. index')
xlabel('ind')
ylabel(hp3(1), 'diff(X)');
ylim(hp3(1), [-10, 10]);
ylabel(hp3(2), 'diff(Y) ');

In the first subplot, we see that the interpolated data (dashed line) exactly matches the original data. In the second, we see that none of the smoothing schemes are terribly pretty, though which is "best" will depend on your application. Note that all have a jump where the data becomes monotonic. (You could use a larger running average filter to further smooth the mean data; using filtfilt might be wise to prevent a phase lag. However, since y seems to be analytical, I wonder if there is an analytical method you should be considering...) In the third subplot, we see that the step size in x is now exactly +1, 0, or -1, as desired in the previous question; because of the interpolation, however, the step size in y is no longer the analytical derivative of the Gaussian, but some discretely filtered variation of it.
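
On the phase-lag point: filtfilt is part of the Signal Processing Toolbox, so if that isn't available, one possible substitute (my suggestion, not something tested on the curve.fig data) is a centered moving average via conv with the 'same' option. The sketch below uses a made-up impulse to show the difference in lag; an odd window length is assumed so that the kernel is symmetric.

```matlab
% Made-up test signal: a single impulse at index 4.
y = [0 0 0 1 0 0 0];

w = 3;                         % odd window length (assumption, keeps the kernel symmetric)
kernel = ones(1, w) / w;

% Causal moving average: the response is nonzero at indices 4..6,
% i.e. delayed by (w - 1)/2 = 1 sample relative to the impulse.
yCausal = filter(kernel, 1, y);

% Centered moving average: the response is nonzero at indices 3..5,
% symmetric about the impulse, so there is no lag.
yCentered = conv(y, kernel, 'same');
```

Keep in mind that conv pads with zeros, so the first and last (w - 1)/2 samples of the centered result are attenuated.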

I hope this helps you better analyze your problem. In general, once you've applied the interpolation scheme to get X and Y, you should find it trivial to apply any desired smoothing algorithm, though again, thinking about an analytical solution might be wise.

Please accept this answer if it helps, or let me know in the comments if I've missed something. (Note: I might not get back to you for a couple of days.)
