Hi all – I have a dataset with the first vector a set of strings (call this data.a) and the second vector a set of numbers (call this data.b).
I'd like to generate a second dataset where the first vector holds the unique strings (i.e. data2.c = unique(data.a,'rows')) and the second vector holds, for each unique value of data.a, the average of all corresponding values in data.b.
This will be run on a large dataset where the number of unique values in data.a is quite large and each unique value of data.a has an unknown number of corresponding values in data.b. data.a and data.b always have the same number of entries.
So if I had:
data =
a      b
'AA'   1
'AA'   2
'BB'   3
'BB'   4
'CC'   5
'CC'   6
I'd like to return
data2 =
c      d
'AA'   1.5
'BB'   3.5
'CC'   5.5
Any thoughts on how to do this?
Best Answer
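One possible approach, sketched here on the example data from the question, is to combine unique with accumarray. This assumes data.a is stored as a cell array of character vectors and data.b as a numeric vector; if data.a is a char matrix instead, unique(data.a,'rows') would be needed in place of the plain unique call. The intermediate names c, d and idx are only illustrative.

```matlab
% Example data from the question (assumed layout: cell array + numeric vector).
data.a = {'AA'; 'AA'; 'BB'; 'BB'; 'CC'; 'CC'};
data.b = [1; 2; 3; 4; 5; 6];

% unique returns the distinct strings (sorted) and, in idx, the index of the
% unique string that each entry of data.a maps to.
[c, ~, idx] = unique(data.a);

% accumarray groups data.b by idx and applies @mean to each group.
% The (:) forces column vectors, which accumarray requires for subscripts.
d = accumarray(idx(:), data.b(:), [], @mean);

data2.c = c;
data2.d = d;
```

Because the grouping is done in a single accumarray call rather than a loop over the unique values, this should scale reasonably to a large number of groups of uneven size.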