MATLAB: Multiple selection of data (dates,mon​ths,days,h​ours)

#dataselection

Hi! I have a csv file and I want to extract a range of data. The form of the data is the following(see comments). As I am a newbie in Matlab, I don't know how to use correctly the "for"loop. My aim is to select data by the year,month,day,hour and create a new table. The selected data must have the following form;
Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0
2005-01-01 00:00:00.0/2005-01-01 01:00:00.0
2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 …..]
I tried to split the dates using the T=split(t,{'-','T','/'}); and I also selected only the start dates (only the part before the " / "sign in the example ["2004-01-01 00:00:00.0"]) as the end dates aren't necessary, but once again my problem is that I have a difficultly to extract the data.
Please Help !!!!

Best Answer

Follow these steps to prepare in your data more efficiently and in a user-friendly format.
Use readtable to read in the data.
filePath = 'C:\Users\name\Documents\MATLAB'; % Path to your csv file
fileName = 'κοζάνη.csv';
file = fullfile(filePath, fileName);
opts = detectImportOptions(file);
T = readtable(file,opts);
Let's look at T using head() which shows us the first few rows
>> head(T)
ans =
8×11 table
x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI ClearSkyDHI ClearSkyBNI GHI BHI DHI BNI Reliability
_______________________________________________ ______ ___________ ___________ ___________ ___________ ______ ______ ______ ______ ___________
{'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659 1.2983 16.531 1.789 0.4964 1.2926 14.457 1
{'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474 33.111 516.95 97.043 61.139 35.905 408.39 1
{'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68 56.764 799.67 268.38 203.61 64.77 710.04 1
Notice that the x_ObservationPeriod show what appears to be a START / STOP period of observation but they are in character format. Yuk!
Split the x_ObservationPeriod into start / stop times and convert to datetime format.
This will take a few seconds because you've got 140256 rows of data.
% Split the start/stop times in character format
startStop = cellfun(@(c)strsplit(c, '/'), T.x_ObservationPeriod,'UniformOutput',false);
% Convert to table (2 columns)
startStopTbl = array2table(vertcat(startStop{:}));
% Convert values from char to datetime
startStopTblDt = varfun(@(v)datetime(v, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.S'), startStopTbl);
% Name the columns
startStopTblDt.Properties.VariableNames = {'ObservationStart', 'ObservationStop'};
Let's look at startStopTblDt
>> head(startStopTblDt)
ans =
8×2 table
ObservationStart ObservationStop
____________________ ____________________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00
Much better format. You can update the output format using this example,
startStopTblDt.ObservationStart.Format = 'MMM dd, yyyy HH:mm:ss';
Combine the two tables
T = [startStopTblDt, T];
View the results (first 10 rows, 6 columns)
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
If you want to remove the x_ObservationPeriod column,
T.x_ObservationPeriod = [];
Add row numbers to the table that correspond to row number in csv file
rowNums = (0:size(T,1)-1) + opts.DataLines(1);
T.Properties.RowNames = compose('%d',rowNums);
View results
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
44 01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
45 01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
46 01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
47 01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
48 01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
49 01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
50 01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
51 01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
52 01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
53 01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
Now you can see that the first row of data comes from line 44 of the csv file.