MATLAB: How to speed up file reading for large files with floating point numbers in scientific notation

fread, fscanf, MATLAB, performance, speed, textscan

I used "textscan" to read the first two lines and "fscanf" to read the rest of the data, then reshaped the data into the matrix size I need, but the process takes too long. I suspect this is because "fscanf" reads the data one value at a time. Is there a way to read all of the data at once, or row by row? That could save a lot of time. For the largest file I have, with about 25 million rows and 70 columns (about 40 GB of memory), reading takes about 20 minutes. Given the size of the data, that speed is acceptable, but I would still like to speed it up because there are numerous files to process.
File Example:
-2.950180e-001 column format file
             x               y               z
3.8799936e-002  6.5000001e-003 -1.8333450e-002
3.5799935e-002  1.5000000e-003 -2.0333450e-002
3.9799935e-002  7.5000003e-003 -1.7333451e-002
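The approach described in the question can be sketched as follows. This is a minimal reconstruction, not the asker's actual code; the file name `data.txt` and the 3-column width are taken from the example above (the real files have 70 columns).

```matlab
% Sketch of the described approach: textscan for the two header
% lines, fscanf for the numeric body, then reshape.
fid = fopen('data.txt', 'r');
header = textscan(fid, '%s', 2, 'Delimiter', '\n');  % first two lines
data   = fscanf(fid, '%f');   % remaining numbers as one long column vector
fclose(fid);

ncols = 3;                    % 70 for the real files
% fscanf returns values in the order read (row by row in the file),
% so fill ncols-by-N column-major, then transpose to rows-by-columns.
M = reshape(data, ncols, []).';
```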

Best Answer

I duplicated the given example into a 500,000-row data set and recorded the execution times for "fscanf", "textscan", and "fread".
The results are as follows:
fscanf(fid, '%f'); 24s
cell2mat(textscan(fid, '%f')); 8.6s
sscanf(char(fread(fid)'), '%f'); 10.6s
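The two faster variants from the timings above can be written out in full as below. This is a sketch: the file name `data.txt` is hypothetical, and the two `fgetl` calls skip the header lines in the example format.

```matlab
% Variant 1 (fastest above): textscan returns a cell array of the
% parsed numbers; cell2mat flattens it to a numeric column vector.
fid = fopen('data.txt', 'r');
fgetl(fid); fgetl(fid);                       % skip the two header lines
vals = cell2mat(textscan(fid, '%f'));
fclose(fid);

% Variant 2: slurp the remaining bytes in one fread call, then parse
% the whole buffer with a single sscanf call.
fid = fopen('data.txt', 'r');
fgetl(fid); fgetl(fid);
vals2 = sscanf(char(fread(fid)'), '%f');
fclose(fid);

M = reshape(vals, 3, []).';                   % 70 columns for the real files
```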
Based on this experiment, it appears that using the "textscan" or "fread" function for this purpose may improve execution time over "fscanf".
If you are looking for additional performance options, you can consider splitting the data into smaller chunks to process. You can also take a look at "datastore" or other alternatives for handling large data sets via the link below:
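For the chunked/datastore route, one possible sketch uses `tabularTextDatastore`, which reads delimited text files lazily in blocks. The wildcard pattern, header-line count, and chunk size below are assumptions based on the example file format, not tested against the asker's data.

```matlab
% Hypothetical sketch: iterate over many large files in chunks.
ds = tabularTextDatastore('data*.txt', ...
    'NumHeaderLines', 2, ...          % skip the two header lines
    'ReadVariableNames', false);
ds.ReadSize = 1e6;                    % rows returned per read() call

while hasdata(ds)
    chunk = read(ds);                 % table of up to ReadSize rows
    % ... process chunk here instead of holding all 40 GB in memory ...
end
```

This trades peak memory for repeated I/O, which can be the better deal when the per-file result is much smaller than the raw data.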
Related Question