MATLAB: How to read rows of data in .csv file and turn them into arrays

blockcsvperformancespeedsscanftextscan

I have some data in a csv file that consists of some "delays" and for each delay there is a row of "wavelengths" and a row of corresponsing "intensities". It looks like this in excel:
I want the code to take all the delays and make them into one vector. And take the wavelengths (all the wavelength rows are identical) and make them into one vector. And take the intensities and make an array of intensities.
I tried textscan but that seems column oriented. I need matlab to read line by line (row by row), extract the data and put it into relevant array and then repeat procedure for every delay.

Best Answer

Notes to consider when importing a large file:
  • Importing as character and then doing any kind of text manipulation before converting to numeric will be very inefficient for a large file.
  • Importing the file in blocks is only efficient for a relatively small number of very large data blocks. This is not the case with your data: your file has many small blocks, forcing you to access the file repeatedly and expand the output array many times.
  • Regardless of what method you use to import it, the entire imported file data must fit into MATLAB memory. If this is not the case then you will need to use tall arrays or some other approach to processing your data.
  • Regardless of what method you use to import it, importing data will take some time.
  • If you want to import a large file efficiently then you need to minimize any handling of the imported file data, the optimal solution is to get all of the importing and conversion to numeric performed in one call to one reasonably low-level MATLAB command (i.e. avoid readtable , readmatrix, et al, which are just complex wrappers around textscan).
Method one: textscan. With suitable options to read that file format (note how ':' is defined as the EOL character, so each data block become one line of the imported matrix), you only need to call textscan once and it will import the entire file:
[fid,msg] = fopen('20200107LaAlO3.csv','rt');
assert(fid>=3,msg)
%% Detect the number of columns:
fgetl(fid); % ignore first line.
nmc = 1+nnz(fgetl(fid)==','); % number of columns.
fmt = ['%f%f',repmat('%f',1,2*nmc),'%*s'];
%% Import the file data:
opt = {'HeaderLines',1, 'EndOfLine',':', 'MultipleDelimsAsOne',true,...
'Whitespace',', \n\r\t', 'CollectOutput',true}; % COLLECTOUTPUT is optional
frewind(fid)
C = textscan(fid,fmt,opt{:});
fclose(fid);
%% Optional: split into separate arrays:
D = C{1}(:,1:2); % Delays

W = C{1}(:,3:2+nmc); % Wavelengths
I = C{1}(:,3+nmc:end); % Intensities
And checking the output:
>> D % Delays
D =
-20 -19.999
-19.985 -19.982
-19.97 -19.967
-19.955 -19.952
-19.94 -19.937
-19.925 -19.922
-19.91 -19.907
-19.895 -19.892
-19.88 -19.877
-19.865 -19.862
>> W(:,[1:5,end-5:end]) % Wavelengths (first and last few columns only)
ans =
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
189.25 189.71 190.18 190.65 191.12 1098.3 1098.7 1099.1 1099.5 1099.9 1100.4
>> I(:,[1:5,end-5:end]) % Intensities (first and last few columns only)
ans =
3802 2566 2801 2749 2918 2938 2931 2756 2760 2758 2756
3805 2570 2806 2748 2922 2944 2917 2754 2755 2755 2754
3806 2569 2800 2751 2925 2960 2918 2757 2760 2758 2756
3805 2564 2802 2747 2924 2946 2915 2754 2757 2753 2757
3804 2568 2801 2744 2928 2937 2914 2754 2759 2758 2760
3805 2568 2807 2748 2935 2939 2928 2752 2762 2756 2756
3802 2569 2802 2752 2933 2936 2910 2755 2758 2755 2757
3803 2568 2804 2747 2927 2953 2920 2754 2761 2752 2758
3803 2571 2802 2746 2923 2936 2927 2752 2759 2757 2762
3810 2567 2807 2749 2931 2945 2918 2757 2755 2758 2760
>>
It might be more efficient to not use the 'CollectOutput' option, and concatenate the vectors afterwards, I leave that up to you to experiment with (hint: use comma-separated lists or cell2mat).
Method two: sscanf. This is by far the fastest method that I found to import that file data, helped by the fact that sscanf is as low-level as it gets. Note that the imported character data is not manipulated, split, or altered in any way before being converted by sscanf:
str = fileread('20200107LaAlO3.csv');
nmb = nnz(str==':'); % number of blocks
nmc = 1+(nnz(str==',')/nmb-1)/2; % number of columns
fmt = repmat(',%f',1,nmc);
fmt = ['%*[\n\r d]elay:%f,%f',fmt(2:end),fmt(2:end)];
mat = sscanf(str,fmt,[2+2*nmc,nmb]).';
The output matrix is exactly identical to the output of Method one:
>> isequal(mat,C{1})
ans =
1
But it is faster:
Elapsed time is 0.599085 seconds. % Method one
Elapsed time is 0.111845 seconds. % Method two
Caveat: As per isakson noted at the end of their answer, importing this file data will lose information as the file contains more digits precision than can be stored in a Double floating point number (but as this is only at the 18th/19th digit this is very unlikely to be a problem).
Related Question