MATLAB: Preprocessing using readtable()

data importMATLABpreprocessingreadtable

I have a CSV file that I read it as a table with 'readtable'. The original CSV file contains timestamps for a wide range of days. In most of the cases, I do not need that massive amount of timestamps available, because my analyze is centred in a shorter period. Let me put an example to explain better my problem:
The CSV file contains the timestamps in column 1. It starts at timestamp=1573377305, and it starts increasing the timestamps without a determined size, the next timestamp can be at 8 seconds, the third at 10 seconds and so on. What I know are the timestamps of my analysis, the beginning and the end, but I don't know the number of rows that correspond with that interval.
For instance: my timestamps
1573377305
1573377312
1573377326
1573377334
1573377349
1573377355
1573377365
1573377373
1573377386
1573377393
1573377404
1573377416
1573377427
1573377433
1573377445
1573377455
1573377465
1573377475
1573377485
Imagine that my timestamp for analysis is from 1573377326 to 1573377433. I don't want to read all the previous and end information with readtable(). In this case, I could do DataLines = [4 15], but it is an illustrative example. Imagine that you have much more than 20 timestamps, and what you know is the timestamp from beginning to end.
If I upload all the data with 'readtable()', it is inefficient as I am loading information that I am not going to use. How can I do to select the precise interval I am going to use before using readtable()? Or how can I do this process more efficient?

Best Answer

Assuming you don't know which rows of the text file correspond to your start and end timestamps, what you want cannot be achieved just ith readtable. There's no way to tell readtable to stop once a specific value has been encountered (even using detectImportOptions). The only way to tell readtable to stop at a specific line is with the DataLines property and you have to know in advance at which line to stop.
You could determine that line by parsing the file yourself line by line with textscan for example but then there'd be no point in using readtable after that.
Unless the file is significantly larger than your range of interest, it's simpler to just read the whole file and then use isbetween to keep the required range.
Otherwise, as said, you'll have to use textscan to parse the file line by line until you encounter your end timestamp. It's significantly more work than readtable, you lose automatic format detection, nice table formating, etc. and because you'll be reading the file line by line may actually be slower than reading the whole file in one go. It would go like this:
fmt = '%D%...'; %replace by actual format of your file. Possibly, you could use detectImportOptions for help here
data = {}
fid = fopen(yourfile, 'rt', 'n', yourfileencoding); %Possibly omit the encoding. readtable detect the file encoding for you.
while ~feof(fid)
line = textscan(fid, fmt, 1); %read one line
%assuming the timestamp is in first column
if line{1} > endtimestamp
break; %end reading
end
if line{1} >= starttimestamp
data = [data; line];
end
end
fclose(fid);