MATLAB: How to use memmapfile for a very large structured binary file

Hello:
I need to process a 62 GB structured binary file written from a 24 h simulation. The structure of the file is as follows:
ft(1).length = 1; ft(1).type = 'integer*4'; ft(1).name = 'VehID';
ft(2).length = 1; ft(2).type = 'real*4'; ft(2).name = 'Time';
ft(3).length = 1; ft(3).type = 'integer*4'; ft(3).name = 'Longitude';
ft(4).length = 1; ft(4).type = 'integer*4'; ft(4).name = 'Latitude';
ft(5).length = 1; ft(5).type = 'integer*2'; ft(5).name = 'Heading';
ft(6).length = 1; ft(6).type = 'integer*4'; ft(6).name = 'Segment';
ft(7).length = 1; ft(7).type = 'integer*2'; ft(7).name = 'Dir';
ft(8).length = 1; ft(8).type = 'integer*4'; ft(8).name = 'Lane';
ft(9).length = 1; ft(9).type = 'real*4'; ft(9).name = 'Offset';
ft(10).length = 1; ft(10).type = 'real*4'; ft(10).name = 'Distance';
ft(11).length = 1; ft(11).type = 'real*4'; ft(11).name = 'Speed';
ft(12).length = 1; ft(12).type = 'real*4'; ft(12).name = 'Acceleration';
I am able to read this file using readfields with the format above, but it takes forever to go through its 1,506,979,651 records. I would like to split the file into 96 files based on the value of 'Time', which covers 24 hours (15-minute increments -> 96 files), and keep only VehID, Time, Distance, Speed, and Acceleration. After extensive reading (I am still learning MATLAB), I understand that memmapfile would be a good way to go, but I cannot get the command to work. I need help writing the appropriate memmapfile statement (especially the 'Format' argument) so that I can process this file efficiently. Thank you for your help,
JDS

Best Answer

This gave me a chance to try a complicated format. Result:
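filespec = 'usgsdems.dat'; % A sample file I found in the Mapping Toolbox
n_repeat = 24*60/15; % 96; used here only as a record count ('Repeat' counts records, not 15-minute bins)
nday = 1;
N = (nday-1) * n_repeat * sum([ 4, 4, 4, 4, 2, 4, 2, 4, 4, 4, 4, 4 ]); % byte offset; one record is 44 bytes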
%
mmp = memmapfile( filespec ...
, 'Offset' , N ...
, 'Format', {
'int32' , [1,1], 'VehID'
'single', [1,1], 'Time'
'int32' , [1,1], 'Longitude'
'int32' , [1,1], 'Latitude'
'int16' , [1,1], 'Heading'
'int32' , [1,1], 'Segment'
'int16' , [1,1], 'Dir'
'int32' , [1,1], 'Lane'
'single', [1,1], 'Offset'
'single', [1,1], 'Distance'
'single', [1,1], 'Speed'
'single', [1,1], 'Acceleration'
} ...
, 'Repeat', n_repeat );
>> mmp.Data(1).VehID
ans =
1701994860 % garbage but indicates the syntax is correct
>> mmp.Data(2).VehID
ans =
538976313
>> mmp.Data(n_repeat).VehID
ans =
538976288
However,
>> mmp.Data(2:3).VehID
Error using memmapfile/subsref (line 782)
A subscripting operation on the Data field attempted to create a comma-
separated list. The memmapfile class does not support the use of comma-
separated lists when subscripting.
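As far as I can tell, the error only arises when the subscript expression itself would expand into a comma-separated list. Pulling the records out as an ordinary struct array first, and then collecting the field from that copy, appears to avoid it. A minimal sketch on a tiny throwaway file (the temporary file and its two-field layout are made up for the illustration, they are not taken from the question):

```matlab
% Build a small test file with three 8-byte records: int32 VehID, single Time.
fname = [tempname, '.dat'];              % hypothetical scratch file
fid = fopen(fname, 'w');
for k = 1:3
    fwrite(fid, int32(100 + k), 'int32');
    fwrite(fid, single(0.5 * k), 'single');
end
fclose(fid);

m = memmapfile(fname, ...
    'Format', {'int32', [1,1], 'VehID'; 'single', [1,1], 'Time'}, ...
    'Repeat', 3);

recs = m.Data(2:3);    % plain struct array holding records 2 and 3
ids  = [recs.VehID];   % comma-separated list on a normal struct array is fine
t    = [recs.Time];

clear m                % release the mapping before deleting the file
delete(fname);
```

The same pattern should carry over to the twelve-field map above, e.g. `recs = mmp.Data(2:3); ids = [recs.VehID];`. Copying large ranges out as struct arrays is likely to be slow, so this is more a convenience than a way to process the whole file.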
"and keep only VehID, Time, Distance, Speed, and Acceleration"
AFAIK, the new files must be written record by record, including only the fields that are to be kept.
I'm not convinced the process will be fast.
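One approach that might be quicker than walking through a struct array is to map each window of the file as a 44-by-n block of uint8, pick out the byte rows of the kept fields, and append whole groups of records to the output files. The sketch below is untested on the real data and rests on several assumptions that are not in the question: Time is in seconds from the start of the run, records are packed with no padding or record markers (1,506,979,651 x 44 bytes is about 61.8 GiB, which at least looks consistent with the quoted 62 GB), and the output file names, the scratch folder, and the window size are arbitrary. To keep it self-contained it first writes a four-record stand-in file; for the real file, set `infile` to its path and use a much larger `chunk` (5e6 records is roughly 220 MB per window).

```matlab
% --- synthetic stand-in for the real file (illustration only) ---
infile = [tempname, '.bin'];
times  = single([10, 1000, 899.5, 86399]);   % falls in bins 1, 2, 1, 96
fid = fopen(infile, 'w');
for k = 1:numel(times)
    fwrite(fid, int32(k), 'int32');          % VehID
    fwrite(fid, times(k), 'single');         % Time
    fwrite(fid, int32([0 0]), 'int32');      % Longitude, Latitude
    fwrite(fid, int16(0), 'int16');          % Heading
    fwrite(fid, int32(0), 'int32');          % Segment
    fwrite(fid, int16(0), 'int16');          % Dir
    fwrite(fid, int32(0), 'int32');          % Lane
    fwrite(fid, single([0, k, 2*k, 3*k]), 'single'); % Offset, Distance, Speed, Acceleration
end
fclose(fid);

% --- settings (all assumptions) ---
outdir   = tempname;      % hypothetical output folder
chunk    = 3;             % records per mapped window; tiny here for the demo
binSec   = 15 * 60;       % assumes Time is in seconds
recBytes = 44;            % 4+4+4+4+2+4+2+4+4+4+4+4, assumes no padding
keep     = [1:8, 33:44];  % bytes of VehID, Time, Distance, Speed, Acceleration

info = dir(infile);
nRec = info.bytes / recBytes;
assert(nRec == floor(nRec), 'File size is not a multiple of 44 bytes');

if ~exist(outdir, 'dir'), mkdir(outdir); end
fids = zeros(1, 96);
for b = 1:96
    fids(b) = fopen(fullfile(outdir, sprintf('part_%02d.bin', b)), 'w');
end

for first = 1:chunk:nRec
    n = min(chunk, nRec - first + 1);
    m = memmapfile(infile, 'Offset', (first - 1) * recBytes, ...
        'Format', {'uint8', [recBytes, n], 'raw'}, 'Repeat', 1);
    raw = m.Data.raw(keep, :);                 % 20-by-n, one column per record
    t   = typecast(reshape(raw(5:8, :), [], 1), 'single');   % Time of each record
    bin = min(max(floor(double(t) / binSec) + 1, 1), 96);    % clamp to 1..96
    for b = unique(bin)'
        fwrite(fids(b), raw(:, bin == b), 'uint8');  % column-major: whole records
    end
    clear m
end
for b = 1:96, fclose(fids(b)); end
```

Each output record is then 20 bytes (int32 VehID followed by four singles), so the parts can be mapped afterwards with a five-row 'Format' in the same style as above. Whether this is fast enough for 1.5 billion records I cannot say; disk throughput will probably dominate.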