MATLAB: Splitting a table into smaller ones based on two columns

datafindindexingsorttable

Hello,
The code I use produces a master table with 8 columns and around 800 rows. I would like to sort the data in the following steps:
  1. Take first line of data from master table and look at the values in column six and eight.
  2. Find all other rows in the table with a column six value within 0.5 of the line one column six value and a column eight value within 2 of the line one column eight value.
  3. Create new table with this data.
  4. Delete sorted data from master table.
  5. Repeat 1-> 4 until no data left.
  6. Be left with multiple tables.
Is there a way to do this?
Thank you for your help.

Best Answer

That can easily be done in a loop but it doesn't sound like a good idea. Breaking apart well-organized tabular data into sub-tables is like moving into a new house by unpacking each box in the driveway and carrying in each item from the box individually rather than just carying in the box. Keep the data together whenever possible.
Instead, each row of the table can be assigned a subgroup number and then you can use those row numbers to pull out data as needed.
Here's a functional demo with comments to illustrate this method. 'rowGroup' is used to identify subtable rows.
% Create demo data
T = array2table(rand(20,8).*2);
T{:,8} = T{:,8} * 5;
% Identify the group number of each row based on
% col 6 & 8 values and their given tolerance levels.
rowGroup = zeros(size(T,1),1); % This will store the group number for each row
while any(rowGroup==0)
% find next unassigned row, starting at the top
rowNum = find(rowGroup==0,1,'first');
% find all rows in col 6 that are within tolerance
group1idx = abs(T{rowNum,6} - T{:,6}) <= 0.5;
% find all rows in col 8 that are within tolerance
group2idx = abs(T{rowNum,8} - T{:,8}) <= 2.0;
% identify the rows that fit into this group
rowGroup(group1idx & group2idx & rowGroup==0) = max(rowGroup)+1;
end
% rowGroup is a column vector of row numbers that identify the subgroups.
% Your values will differ due to using random data
% >> rowGroup(1:5)
% ans =
% 1
% 2
% 3
% 4

% 4
% now you can access sub-groups of data like this
T(rowGroup==1,:) % for group 1
To see the number of subgroups and the number of rows within each subgroup,
subgroupSummary = table((min(rowGroup):max(rowGroup))', ...
histcounts(rowGroup,min(rowGroup):max(rowGroup)+1)', ...
'VariableNames', {'Group', 'nRows'})