MATLAB: Filtering max values within every 3 hours of data

cell arraysfiltermaximum

Hey everyone,

I attached the first 50 rows of a sample dataset. My time data is in the format of '2000-01-01 01:00' where the last 5 characters hold the hour (HH) and minute (MM) values respectively. I would like to have a code that goes through the time column, and assigns the maximum value of the second column between the 01,02,03 hours to the 03 hour timestep. I need to then have this done for the 04 – 06 time period and so forth until 22 – 00.

At the end the cell array would not need to keep data from other columns and should just be XXXX x 2 cell array and just reassigns the maximums to the 03,06,09 …… 00 timesteps. There may also be missing hours, so the script would need to accommodate for that.

example:

'1984-01-01 00:00'  10.80  240  20.06  231  0.8
'1984-01-01 01:00'  11.83  230  20.06  6  0.6
'1984-01-01 02:00'  11.31  230  18.51  100  0.6
'1984-01-01 03:00'  10.80  230  18.51  212  0.7
'1984-01-01 04:00'  9.77  240  18.65  348  0.8
'1984-01-01 05:00'  9.25  240  16.97  406  0.8
'1984-01-01 06:00'  9.77  230  17.48  524  0.7
'1984-01-01 07:00'  9.77  230  16.97  612  0.7

would then become:

'1984-01-01 03:00'  11.83  
'1984-01-01 06:00'  9.77

I am completely lost in writing this and any help would be appreciated!

Best Answer

You could build an approach along the following line:

 % - Build array of time components. 


 dv = datevec( datatoupload(:,1), 'yyyy-mm-dd hh:MM' ) ;
 % - Build logical index of relevant/boundary hours.
 hrId      = rem( dv(:,4), 3 ) == 0 ;
 hrId(end) = true ;                     % Last entry always relevant.
 % - Build vector of (time-) group IDs.


 hrGrId = cumsum( [true; hrId(1:end-1)] ) ;
 % - Get statistics per group.


 grMax = accumarray( hrGrId, cell2mat(datatoupload(:,2)), [], @max ) ;
 % - Build output.


 output = [datatoupload(hrId,1), num2cell(grMax)] ;

Applied to your MAT file, this gives:

 >> output
 output = 
    '1984-01-01 00:00'    [10.8024]
    '1984-01-01 03:00'    [11.8312]
    '1984-01-01 06:00'    [ 9.7736]
    '1984-01-01 09:00'    [ 9.7736]
    '1984-01-01 12:00'    [ 8.7448]
    '1984-01-01 15:00'    [ 8.7448]
    '1984-01-01 18:00'    [ 9.2592]
    '1984-01-01 21:00'    [ 7.2016]
    '1984-01-02 00:00'    [ 5.1440]
    '1984-01-02 03:00'    [ 4.6296]
    '1984-01-02 06:00'    [ 3.6008]
    '1984-01-02 09:00'    [ 2.5720]
    '1984-01-02 12:00'    [ 7.2016]
    '1984-01-02 15:00'    [10.2880]
    '1984-01-02 18:00'    [11.8312]
    '1984-01-02 21:00'    [12.3456]
    '1984-01-03 00:00'    [10.2880]
    '1984-01-03 01:00'    [10.8024]

EDIT 1: the following solution should work with missing data, but you'll have to understand it and fully test it. In the beginning, I am building a test data set with missing data.

 load( 'masao_1.mat' ) ;
 data = datatoupload ;
 % - Remove rows so for day 1:

 %  > Hours 4,5,6 have only data for hours 5,6.

 %  > Hours 10,11,12 have only data for hours 10, 11. 

 %  > Hours 13,14,15 have no data. 

 data([16,15,14,13,5], :) = [] ;
 % - Build array of time components. 
 dv = datevec( data(:,1), 'yyyy-mm-dd hh:MM' ) ;
 % - Build vector of (time-) group IDs.
 %  1. Absolute hour ID.

 hrId = dv(:,4:-1:1) * cumprod( [1; 24; 32; 12] ) - 1 ;
 %  2. Contiguous group start pos.

 grStart = [true; diff(floor(hrId/3)) ~= 0] ;             
 %  3. Group start pos.
 grEnd = [grStart(2:end); true] ;
 %  4. Contiguous group IDs.
 grId = cumsum( grStart ) ;
 % - Get statistics per group.
 grMax  = accumarray( grId, cell2mat(data(:,2)), [], @max ) ;
 % - Build output.
 output = [data(grEnd,1), num2cell(grMax)] ;

Looking at the output

 >> output
 output = 
    '1984-01-01 00:00'    [10.8024]
    '1984-01-01 03:00'    [11.8312]
    '1984-01-01 06:00'    [ 9.7736]
    '1984-01-01 09:00'    [ 9.7736]
    '1984-01-01 11:00'    [ 8.7448]
    '1984-01-01 18:00'    [ 9.2592]
    '1984-01-01 21:00'    [ 7.2016]
    '1984-01-02 00:00'    [ 5.1440]
    '1984-01-02 03:00'    [ 4.6296]
    '1984-01-02 06:00'    [ 3.6008]
    '1984-01-02 09:00'    [ 2.5720]
    '1984-01-02 12:00'    [ 7.2016]
    '1984-01-02 15:00'    [10.2880]
    '1984-01-02 18:00'    [11.8312]
    '1984-01-02 21:00'    [12.3456]
    '1984-01-03 00:00'    [10.2880]
    '1984-01-03 01:00'    [10.8024]

you see that it seems to be working when missing data hit boundary hours or not, and when they cover full period(s).

EDIT 2: the following solution removes data from groups/periods which don't include 3 entries/measures. Again, you have to fully understand it and test it. You may find an easier way to eliminate entries, I didin't really optimize.

load( 'masao_1.mat' ) ;
data = datatoupload ;
% - Remove rows so for day 1:
%  > Hours 4,5,6 have only data for hours 5,6.
%  > Hours 10,11,12 have only data for hours 10, 11. 
%  > Hours 13,14,15 have no data. 
data([16,15,14,13,5], :) = [] ;
% - Build array of time components. 
dv = datevec( data(:,1), 'yyyy-mm-dd hh:MM' ) ;
% - Build vector of (time-) group IDs.
%  1. Absolute hour ID.
hrId = dv(:,4:-1:1) * cumprod( [1; 24; 32; 12] ) - 1 ;
%  2. Contiguous group start pos.
grStart = [true; diff(floor(hrId/3)) ~= 0] ;             
%  3. Contiguous group IDs, 1st shot, any size taken into account.
grId = cumsum( grStart ) ;
%  4. Flag data which do not belong to a full group (= 3 entries).
grIsValid    = accumarray( grId, ones(size(grId)) ) == 3 ;
entryIsValid = ismember( grId, find(grIsValid) ) ;
%  5. Eliminate invalid entries, rebuild valid group starts and IDs.
data(~entryIsValid,:)    = [] ;
grStart(~entryIsValid,:) = [] ;
grId = cumsum( grStart ) ;
% - Get statistics per group.
grMax = accumarray( grId, cell2mat(data(:,2)), [], @max ) ;
% - Build output.
grEnd  = [grStart(2:end); true] ;
output = [data(grEnd,1), num2cell(grMax)] ;

With that, we obtain..

 >> output
 output = 
    '1984-01-01 03:00'    [11.8312]
    '1984-01-01 09:00'    [ 9.7736]
    '1984-01-01 18:00'    [ 9.2592]
    '1984-01-01 21:00'    [ 7.2016]
    '1984-01-02 00:00'    [ 5.1440]
    '1984-01-02 03:00'    [ 4.6296]
    '1984-01-02 06:00'    [ 3.6008]
    '1984-01-02 09:00'    [ 2.5720]
    '1984-01-02 12:00'    [ 7.2016]
    '1984-01-02 15:00'    [10.2880]
    '1984-01-02 18:00'    [11.8312]
    '1984-01-02 21:00'    [12.3456]
    '1984-01-03 00:00'    [10.2880]

which don't contain summaries when fewer than 3 entries are available.

Best Answer

Related Solutions

MATLAB: How to defining date format

MATLAB: Character to date and time

Related Question