MATLAB: How to speed up file reading for large files with floating point numbers in scientific notation

fread, fscanf, MATLAB, performance, speed, textscan

I used "textscan" to read the first two lines and "fscanf" to read the rest of the data, then reshaped the data into the matrix size I need, but the process takes too long. I suspect this is because "fscanf" reads the data one value at a time. Is there a way to read all of the data at once, or row by row? That could save a lot of time. For the largest file I have, with about 25 million rows and 70 columns (about 40 GB of memory), reading takes about 20 minutes. Given the size of the data, that speed is acceptable, but I would still like to speed it up because there are numerous files to process.
File Example:
-2.950180e-001 column format file
             x               y               z
3.8799936e-002  6.5000001e-003 -1.8333450e-002
3.5799935e-002  1.5000000e-003 -2.0333450e-002
3.9799935e-002  7.5000003e-003 -1.7333451e-002
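The approach described in the question can be sketched as follows. This is a minimal reconstruction, not the asker's actual code; the file name `data.txt` and the 3-column width are taken from the example above (the real files have 70 columns).

```matlab
% Sketch of the described approach: textscan for the two header
% lines, fscanf for the numeric body, then reshape.
fid = fopen('data.txt', 'r');
header = textscan(fid, '%s', 2, 'Delimiter', '\n');  % first two lines
data   = fscanf(fid, '%f');   % remaining numbers as one long column vector
fclose(fid);

ncols = 3;                    % 70 for the real files
% fscanf returns values in the order read (row by row in the file),
% so fill ncols-by-N column-major, then transpose to rows-by-columns.
M = reshape(data, ncols, []).';
```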

Best Answer

I duplicated the given example into a 500,000-row data set and recorded the execution times for "fscanf", "textscan", and "fread".
The results are as follows:
fscanf(fid, '%f'); 24s
cell2mat(textscan(fid, '%f')); 8.6s
sscanf(char(fread(fid)'), '%f'); 10.6s
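The two faster variants from the timings above can be written out in full as below. This is a sketch: the file name `data.txt` is hypothetical, and the two `fgetl` calls skip the header lines in the example format.

```matlab
% Variant 1 (fastest above): textscan returns a cell array of the
% parsed numbers; cell2mat flattens it to a numeric column vector.
fid = fopen('data.txt', 'r');
fgetl(fid); fgetl(fid);                       % skip the two header lines
vals = cell2mat(textscan(fid, '%f'));
fclose(fid);

% Variant 2: slurp the remaining bytes in one fread call, then parse
% the whole buffer with a single sscanf call.
fid = fopen('data.txt', 'r');
fgetl(fid); fgetl(fid);
vals2 = sscanf(char(fread(fid)'), '%f');
fclose(fid);

M = reshape(vals, 3, []).';                   % 70 columns for the real files
```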
Based on this experiment, it appears that using the "textscan" or "fread" function for this purpose may improve execution time over "fscanf".
If you are looking for additional performance options, you can consider splitting the data into smaller chunks to process. You can also take a look at "datastore" or other alternatives for handling large data sets via the link below:
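For the chunked/datastore route, one possible sketch uses `tabularTextDatastore`, which reads delimited text files lazily in blocks. The wildcard pattern, header-line count, and chunk size below are assumptions based on the example file format, not tested against the asker's data.

```matlab
% Hypothetical sketch: iterate over many large files in chunks.
ds = tabularTextDatastore('data*.txt', ...
    'NumHeaderLines', 2, ...          % skip the two header lines
    'ReadVariableNames', false);
ds.ReadSize = 1e6;                    % rows returned per read() call

while hasdata(ds)
    chunk = read(ds);                 % table of up to ReadSize rows
    % ... process chunk here instead of holding all 40 GB in memory ...
end
```

This trades peak memory for repeated I/O, which can be the better deal when the per-file result is much smaller than the raw data.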
Related Question