MATLAB: How to separate a data set into multiple tables sorted by a specific variable

datasort

I have datathat is 63351 x 7 cell. The data is different prices of commodities/assets at specific times. At first, all data for AUD/USD is listed at the specific times and dates. Then, the asset switches to EUR/USD and restarts with the same date and times as before. The name of the asset is listed in one of the columns. I need to separate this data into multiple tables based on the specific asset. Any idea how to do this?

Best Answer

If you are using R2016b or later, you probably want to be using timetables instead of cell arrays. Especially if your data are at all large. It's not entirely clear to me what your data look like, or what you'll do with them after you split by asset, but I will also argue that with timetables, you may not need to split the data at all. In any case, here's an example of one simple way to do it.
>> price = rand(20,1);
>> volume = rand(20,1);
>> asset = categorical([1;2;1;2;1;2;1;2;1;2;1;2;1;2;1;2;1;2;1;2],1:2,{'a' 'b'});
>> currency = categorical([1;1;1;1;1;1;1;1;1;1;2;2;2;2;2;2;2;2;2;2],1:2,{'AUD/USD' 'EUR/USD'});
>> date = datetime(2018,1,[1;1;2;2;3;3;4;4;5;5;1;1;2;2;3;3;4;4;5;5]);
>> tt = timetable(date,currency,asset,price,volume)
tt =
20×4 timetable
date currency asset price volume
___________ ________ _____ ________ _______
01-Jan-2018 AUD/USD a 0.69103 0.56856
01-Jan-2018 AUD/USD b 0.091485 0.84191
02-Jan-2018 AUD/USD a 0.30828 0.0533
02-Jan-2018 AUD/USD b 0.62075 0.55293
03-Jan-2018 AUD/USD a 0.59696 0.70035
03-Jan-2018 AUD/USD b 0.19802 0.39625
04-Jan-2018 AUD/USD a 0.39491 0.9954
04-Jan-2018 AUD/USD b 0.75725 0.87266
05-Jan-2018 AUD/USD a 0.11658 0.99732
05-Jan-2018 AUD/USD b 0.46778 0.13998
01-Jan-2018 EUR/USD a 0.45711 0.92185
01-Jan-2018 EUR/USD b 0.30848 0.35911
02-Jan-2018 EUR/USD a 0.21144 0.91209
02-Jan-2018 EUR/USD b 0.8504 0.29512
03-Jan-2018 EUR/USD a 0.71659 0.2831
03-Jan-2018 EUR/USD b 0.57582 0.24675
04-Jan-2018 EUR/USD a 0.54203 0.172
04-Jan-2018 EUR/USD b 0.74173 0.38838
05-Jan-2018 EUR/USD a 0.5002 0.51731
05-Jan-2018 EUR/USD b 0.087709 0.50291
>> tt = sortrows(tt,'asset');
>> [~,start] = unique(tt.asset,'first');
>> stop = [start(2:end)-1; height(tt)];
>> nassets = length(categories(tt.asset));
>> c = cell(1,nassets);
>> for i = 1:nassets, c{i} = sortrows(tt(start(i):stop(i),:)); end
>> c{:}
ans =
10×4 timetable
date currency asset price volume
___________ ________ _____ _______ _______
01-Jan-2018 AUD/USD a 0.69103 0.56856
01-Jan-2018 EUR/USD a 0.45711 0.92185
02-Jan-2018 AUD/USD a 0.30828 0.0533
02-Jan-2018 EUR/USD a 0.21144 0.91209
03-Jan-2018 AUD/USD a 0.59696 0.70035
03-Jan-2018 EUR/USD a 0.71659 0.2831
04-Jan-2018 AUD/USD a 0.39491 0.9954
04-Jan-2018 EUR/USD a 0.54203 0.172
05-Jan-2018 AUD/USD a 0.11658 0.99732
05-Jan-2018 EUR/USD a 0.5002 0.51731
ans =
10×4 timetable
date currency asset price volume
___________ ________ _____ ________ _______
01-Jan-2018 AUD/USD b 0.091485 0.84191
01-Jan-2018 EUR/USD b 0.30848 0.35911
02-Jan-2018 AUD/USD b 0.62075 0.55293
02-Jan-2018 EUR/USD b 0.8504 0.29512
03-Jan-2018 AUD/USD b 0.19802 0.39625
03-Jan-2018 EUR/USD b 0.57582 0.24675
04-Jan-2018 AUD/USD b 0.75725 0.87266
04-Jan-2018 EUR/USD b 0.74173 0.38838
05-Jan-2018 AUD/USD b 0.46778 0.13998
05-Jan-2018 EUR/USD b 0.087709 0.50291
Prior to R2016b, you can do the same thing with tables, but timetables have useful time-based functionality built into them for things like synchronizing multiple data sets and interpolating data.