MATLAB: How to read multiple huge text files, the fastest way

Tags: fast file read, fastest file reading, MATLAB, huge input data

Hi All,
I am quite new to MATLAB, so sorry if this is a naive question. I would appreciate your kind help with my problem, described below.
I have around 10,000 input text files to read and process in MATLAB. Each file contains only numerical data, but each is around 12–15 MB, so the total input size is around 125–150 GB.
First, I tried to use fgetl() to read each file line by line and iterate, but it took very long. So I changed the input file format to a set of numbers separated by whitespace and used fscanf() to read each file into a matrix of size [1 Inf]. Still, it takes a couple of hours to read all 10,000 files.
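For reference, a minimal sketch of the fscanf() approach described above (the file name data0001.txt is hypothetical):

```matlab
% Read one whitespace-separated numeric text file into a row vector.
fid = fopen('data0001.txt', 'r');
data = fscanf(fid, '%f', [1 Inf]);  % parses all numbers at once
fclose(fid);
```

This avoids the per-line overhead of fgetl(), but each number is still parsed from text, which dominates the cost at this scale.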
I also tried a parfor loop and ran the code in a matlabpool of cluster size 8 (the system is a Linux server with 4 processors, each dual core). Even then, it takes more than 2 hours to read all the files.
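A sketch of the parfor variant, assuming the Parallel Computing Toolbox and hypothetical file names dataNNNN.txt:

```matlab
% Read all files in parallel; each worker handles a subset of files.
nFiles = 10000;
results = cell(1, nFiles);
parfor k = 1:nFiles
    fname = sprintf('data%04d.txt', k);
    fid = fopen(fname, 'r');
    results{k} = fscanf(fid, '%f', [1 Inf]);
    fclose(fid);
end
```

Note that parallel workers can help with text parsing (which is CPU-bound), but once the disk itself becomes the bottleneck, adding workers gains little.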
Could anyone kindly tell me the fastest way to read such a large amount of data in MATLAB? My requirement is to read all of it (125–150 GB) in a couple of minutes.
Note: I can change the format of the input files to achieve the fastest possible read. But I would like to read the inputs as numbers only (not strings), since str2double() takes a lot of time during processing.
Thanks a million in advance. Looking forward to your expert advice.
Warm Regards
Anand Uthaman

Best Answer

If you have total control over the file format, storing the data in a binary file format would make reading it out of the file much faster, since MATLAB can load the raw bytes directly instead of parsing text into numbers.
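A minimal sketch of this idea: convert each text file to binary once up front, then read the binary file with fread() on every later run (file names are hypothetical):

```matlab
% One-time conversion from text to binary.
fid = fopen('data0001.txt', 'r');
data = fscanf(fid, '%f', [1 Inf]);
fclose(fid);

fid = fopen('data0001.bin', 'w');
fwrite(fid, data, 'double');  % raw 8-byte doubles, no text parsing needed later
fclose(fid);

% Fast read on subsequent runs.
fid = fopen('data0001.bin', 'r');
data = fread(fid, Inf, 'double').';  % fread returns a column vector; transpose to a row
fclose(fid);
```

Binary reads are limited mainly by disk bandwidth rather than parsing speed. If smaller files are acceptable, a narrower type such as 'single' halves the bytes on disk at the cost of precision.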