MATLAB: Count number of rows in csv outside of matlab

data import

I have 10000+ csv files I would like to import into matlab. I only need the data from the first and last rows for inlet and exit conditions. Each csv file has a different number of data points, so I do not know the length of the file imported a priori. I am trying to automate the import process. I can automate importing all the data or specific lines, but I do not know how to import the last row. The only way I can think of is to determine the number of rows in the file without importing the data (importing all the data takes a few hours) and import that row specifically. Does anyone know how I can do this? I have tried messing with textscan, but I have not had any luck.

Best Answer

Turns out this is not as easy of a question as you might think, especially if your CSV contains data that might be double-quoted and that data contains a new line character.
e.g. "This data\nhas a new line","but this doesn't"
The number of lines might not really be that important, depending on what you're trying to do. But I don't know what that is.
If you're trying to avoid having all the data in memory at one time, I suggest reading up on tabularTextDatastore, as that helps automate working with large sets of data. There's a rich set of features you can use with datastores to make working with larger datasets eaiser, tall, transform, combine. None of these assume to know the size of each table. But again, without knowing what you plan do to with those files, that's hard to say if you can use it.
---- but to answer the original question ----
If you don't have any double-quoted data, it gets a lot easier to count lines of a CSV file. This code will scan the lines without importing any of the data. (It reads the file internally, but doesn't generate any output--thats what the %*... formats are about)
fid = fopen(filename);
numLines = 0;
while ~feof(fid)
[~,c]=textscan(fid,'%*[^\r\n]%*[\r\n]',1,'Delimiter','','Whitespace','','EndOfLine','');
if c > 0 % if c==0, then there wasn't a line there. this may happen at the end of the file.
numLines = numLines + 1;
end
end
fclose(fid);
If you want the data for the lines, this newer function should help:
---- if you have double-quoted strings with new lines (or don't know if you do) ----
If you want to count the actual lines and not count the ones in double quoted fields, then you really need to parse each CSV line and find the fields with double-quotes, and at that point you might as well be importing the data, but if you really just want to count the parsed lines, you can do something with importOptions.
This code will import only the first variable in the table as a string, but to do that, it will still parse the file and consider the quoted data that appears later in the line. It will be slower than the method above, but robust if you have quoted data that contains newlines. It will be faster than bringing in the whole table.
opts = delimitedTextImportOptions('Delimiter',',','ExtraColumnsRule','ignore','VariableTypes',"string");
T = readtable(filename,opts);
numLines = height(T);