MATLAB: How to split a sequence based on values from one variable


Good evening,
I can't figure out how to solve the following problem.
Assuming that I have a dataset as in the picture, I would like to divide it into many smaller datasets based on the variable "State", keeping the original order of the observations. The real dataset has more than 200000 observations, so I can't manually find where State changes from NORMAL to RECOVERY and vice versa. I would like to split the dataset into mini-sequences in which every observation has the same State value.
Then, I need to divide the variables into a Predictors set (variables Sensor 1, Sensor 2, Sensor 3) and a Response set (variable State).
Taking the image as an example, at the end I would like the Predictors to be an N×1 cell array (N being the number of mini-sequences) whose first cell is 3×2 (the three features and the first two observations), the second 3×2, the third 3×1, and so on. Correspondingly, the Response should be an N×1 cell array whose first cell is 1×2, the second 1×2, the third 1×1, and so on.
The problem is that with a dataset of 200000 observations I don't know what kind of loop to use or how to use it.
Thank you!

Best Answer

See the following example.
First, create an example table:
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm';
2, 3, 1, 'rec' ; 4, 4, 2, 'rec';
1, 2, 3, 'norm'; 2, 3, 4, 'rec';
2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', ...
{'sen1', 'sen2', 'sen3', 'state'}); % an example table
Result
t =
8×4 table
sen1 sen2 sen3 state
____ ____ ____ ________
1.00 2.00 3.00 {'norm'}
2.00 3.00 4.00 {'norm'}
2.00 3.00 1.00 {'rec' }
4.00 4.00 2.00 {'rec' }
1.00 2.00 3.00 {'norm'}
2.00 3.00 4.00 {'rec' }
2.00 3.00 1.00 {'rec' }
4.00 4.00 2.00 {'rec' }
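The splitting rests on one observation: a new mini-sequence starts at every row whose state differs from the row before it. A minimal sketch of that boundary detection on the example State column, using only strcmp, find, diff and cumsum (the names changed, starts, runLengths and runId are illustrative, not from the original answer):

```matlab
% State column copied from the example table above
state = {'norm'; 'norm'; 'rec'; 'rec'; 'norm'; 'rec'; 'rec'; 'rec'};

% true where a row's state differs from the previous row
changed = ~strcmp(state(2:end), state(1:end-1));

% First row of each run, and the number of rows in each run
starts = [1; find(changed) + 1];              % [1; 3; 5; 6]
runLengths = diff([starts; numel(state) + 1]); % [2; 2; 1; 3]

% Run label for every row; it increments at each state change
runId = cumsum([1; changed]);                 % [1; 1; 2; 2; 3; 4; 4; 4]
```

Note that rows 1 and 5 are both 'norm' but get different run labels, because a 'rec' run separates them.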
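Then run the following code to split the data:
idx = findgroups(t.state);                             % numeric ID for each distinct state
runId = cumsum([1; diff(idx) ~= 0]);                   % run number, increments wherever the state changes
sensor_val = splitapply(@(x) {x.'}, t{:, 1:3}, runId); % one cell per run, features x observations
state_val = splitapply(@(x) {x.'}, t.state, runId);    % one cell per run, 1 x observations
sensor_val and state_val are N×1 cell arrays, one cell per run of identical consecutive states. For the example table, sensor_val holds numeric matrices of size 3×2, 3×2, 3×1 and 3×3, and state_val holds cell arrays of size 1×2, 1×2, 1×1 and 1×3, which is the layout asked for in the question. No explicit loop is needed.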
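As an alternative sketch, assuming the sensor readings are already in a numeric matrix with one row per observation and the states in a cell array of character vectors (the variable names sensors, state, predictors and responses are hypothetical), mat2cell can cut the data directly by run length without building a table:

```matlab
% Sensor readings (rows = observations) and states, same values as the example table
sensors = [1 2 3; 2 3 4; 2 3 1; 4 4 2; 1 2 3; 2 3 4; 2 3 1; 4 4 2];
state = {'norm'; 'norm'; 'rec'; 'rec'; 'norm'; 'rec'; 'rec'; 'rec'};

% Length of each run of identical consecutive states
changed = ~strcmp(state(2:end), state(1:end-1));
runLengths = diff([0; find(changed); numel(state)]);

% Split the transposed matrix along its columns (observations), so each cell is
% features x observations: 3x2, 3x2, 3x1, 3x3 for this example
predictors = mat2cell(sensors.', size(sensors, 2), runLengths.').';

% Same split for the response: each cell is a 1 x observations cell array
responses = mat2cell(state.', 1, runLengths.').';
```

Transposing before the split is what makes each predictor cell features × observations rather than observations × features; the final transpose turns the 1×N result into an N×1 cell array.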