MATLAB: Import too large csv data file with strings


My file is about 72 MB, with almost 850000 rows and 7 columns on average (the number of columns sometimes changes). The data is mostly strings, so I used csvimport as follows:
name= 'etch.csv';
[C1, C2, C3, C4, C5, C6, C7] = csvimport(name, 'columns', [1:7], 'noHeader', true, 'delimiter', ';' );
(I am interested only in the first 7 columns, even though some rows have more data.) This works perfectly for small data sets. In my case it took almost 30 minutes, or even more. Any idea for something better? Thank you.
PS: My data looks like this:
1: Device Name,Category,Date,Time,Source,Message,Condition,Name,Act
2: string1,string2,mm/dd/yyyy,hh:mm:ss.sss,string,string,string,1 or 0
…..
850000: and so on, in the same format as line 2
The last column usually has no data, but it does not interest me.

Best Answer

No matter what, you're bound by the reading speed of MATLAB. Probably the fastest way to read the file is to read it all at once with fileread. You can then split it into lines with strsplit. After that, you have a choice of applying textscan, strsplit or regexp to each line. You would have to test which is faster.
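To see which per-line approach is faster on your machine, a rough timing comparison along these lines may help. This is only a sketch: the sample line is made up to resemble the layout in the question, and the variable names are placeholders.

```matlab
% Hypothetical sample line with the same layout as the file (8 fields, ';' delimited).
sampleLine = 'dev1;cat1;03/15/2016;08:05:12.250;src;msg;cond;1';
filelines = repmat({sampleLine}, 1, 10000);

% Pattern that captures the first seven ';'-terminated fields of a line.
pattern = ['^' repmat('([^;]*);', 1, 7)];

tic;
byRegexp = regexp(filelines, pattern, 'tokens', 'once');
tRegexp = toc;

tic;
% CollapseDelimiters is switched off so that empty fields are kept.
bySplit = cellfun(@(l) strsplit(l, ';', 'CollapseDelimiters', false), ...
                  filelines, 'UniformOutput', false);
tSplit = toc;

fprintf('regexp: %.3f s, strsplit: %.3f s\n', tRegexp, tSplit);
```

Note that strsplit collapses consecutive delimiters by default, which would silently drop empty fields, hence the 'CollapseDelimiters', false option. Timings on identical synthetic lines are only indicative; real data may behave differently.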
Here is how I would do it using regexp:
filecontent = fileread('etch.csv'); % read the whole file into one character vector
filelines = strsplit(filecontent, {'\r', '\n'}); % split at line endings; copes with Linux and Windows termination
fields = regexp(filelines, '^([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);', 'tokens', 'once'); % only keep the first seven fields
fields = vertcat(fields{:}); % stack the per-line tokens into an N-by-7 cell array
The above takes about 3 seconds on my machine to read 85000 rows (only 8 MB of text though).
One thing it hasn't done is parse the date. This is fairly trivial to do with datetime if needed and takes no time at all.
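As a minimal sketch of that date parsing step, assuming the date sits in column 3 as mm/dd/yyyy and the time in column 4 as hh:mm:ss.sss (as described in the question), something like the following should work. The sample values here are invented for illustration; with the real data you would pass the corresponding columns of fields, skipping the header row.

```matlab
% Hypothetical stand-in for the N-by-7 cell array of fields read from the file.
fields = {'dev1', 'cat1', '03/15/2016', '08:05:12.250', 'src', 'msg', 'cond'; ...
          'dev2', 'cat2', '03/15/2016', '08:05:13.500', 'src', 'msg', 'cond'};

% Join date and time into one string per row. The space is passed as a cell
% so that strcat does not strip it as trailing whitespace.
stamps = strcat(fields(:, 3), {' '}, fields(:, 4));

% Convert all rows in a single vectorised call.
timestamps = datetime(stamps, 'InputFormat', 'MM/dd/yyyy HH:mm:ss.SSS');

% Optional: show the milliseconds when displaying the values.
timestamps.Format = 'MM/dd/yyyy HH:mm:ss.SSS';
```

Passing an explicit 'InputFormat' avoids format guessing, which tends to matter for speed on hundreds of thousands of rows.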