MATLAB: Big problem/bug with the new matfile command for partial MAT-file reads/writes – creates massively bloated files.

Please look at this minimal example:
%create a 1 MB "incompressible" array
one_meg = uint8(rand(1,1000,1000)*256);
%choose a file, clear it and open it with write access
testfile = 'D:\Data\PGRtest\testfile.mat';
system(['del "' testfile '"']);
matObj = matfile(testfile,'Writable',true);
%keep a copy of what we write to the file in memory for verification
memcpy = zeros(50,1000,1000,'uint8');
%write the array 50 times to this file
for i = 1:50
    %start the timer for this iteration
    tic;
    %store in file and memory in same format - pages of 1000x1000
    matObj.RawDat(i,1:1000,1:1000) = one_meg;
    memcpy(i,1:1000,1:1000) = one_meg;
    tm = toc;
    %time increases from 45ms to 250ms at last iteration
    fprintf('Iteration %i, time taken: %ims\n',i,round(tm*1000));
end
%check file size - should be 50 MB or smaller from compression
%the file size is 1200 MB....?
s = dir(testfile);
fprintf('file size: %i mb\n', round(s.bytes/1024/1024));
%load the mat file
loaded = load(testfile);
%the data inside is 50 MB as expected, nowhere near 1200 MB
whos('-file', testfile);
%the read data is equal to the memory copy.. where did all that extra space go?
disp(isequal(loaded.RawDat, memcpy));
This is using Windows 7 64-bit and MATLAB R2011b 64-bit.
The problem is mostly described in the comments. Essentially, why does 50 MB of data produce a 1200 MB MAT-file when it is written through the matfile object?
I have tried storing the data with 2 dimensions instead of 3, and I have tried using doubles instead of uint8. I have also tried changing the default .mat file format from 7.3, although 7.3 is the only version that supports this.
I can't understand why each write takes longer and longer. It is as if each write to the file rewrites all the existing data a second time, so the first write is 1 MB, then 2 MB, then 3 MB, etc., instead of 1 MB each time.
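As a rough check of that guess, the arithmetic can be sketched as follows. This only illustrates the hypothesis (that write number i stores i MB); it says nothing about what MATLAB actually does internally, and the names nWrites, pageMB and totalMB are made up for the example.

```matlab
% Hypothesis only: write number i stores the whole array so far (i pages),
% instead of just the one new page.
nWrites = 50;                       % number of pages written in the test
pageMB  = 1;                        % approximate size of one page in MB

perWriteMB = (1:nWrites) * pageMB;  % 1, 2, 3, ..., 50 MB
totalMB    = sum(perWriteMB);       % 50*51/2 = 1275 MB

fprintf('total written under this hypothesis: %d MB\n', totalMB);
```

Under that assumption the total would be 1275 MB, which is in the same range as the roughly 1200 MB file observed.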
I expect 'testfile' to be a MAT-file of under 50 MB containing a 50x1000x1000 array. What I see is a 1.2 GB file containing that array, which is clearly incorrect.
If the array is saved directly from the workspace using 'save', the MAT-file is 2 MB and contains the same data.
Looks like this is a bug.
Any ideas? Do you get the same results? Thanks, Tom.

Best Answer

For the same reasons that growing an array in memory is a bad idea, growing an array in a MAT-file is not good programming practice. Your file has been horribly fragmented because of the matrix growth. The full 3-D matrix must occupy one linear segment of the file.
If you preallocate the file variable by adding the line:
matObj.RawDat=memcpy; %preallocate
after creating the memcpy variable, your file size will be reasonable.
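A minimal sketch of the preallocated version of the test is shown below. It assumes a scratch file created with tempname rather than the path from the question, and the variable nPages is introduced only for illustration. It preallocates with zeros instead of assigning memcpy, which should be equivalent here because memcpy is still all zeros at that point.

```matlab
% Sketch: give the file variable its final size once, then fill it in place.
% testfile (a tempname scratch file) and nPages are illustrative choices.
nPages   = 50;
one_meg  = uint8(rand(1,1000,1000)*256);   % one ~1 MB page of random bytes
testfile = [tempname '.mat'];

matObj = matfile(testfile,'Writable',true);

% Preallocate: RawDat is created at full size in a single assignment
matObj.RawDat = zeros(nPages,1000,1000,'uint8');

for i = 1:nPages
    % Partial write into the already-allocated variable (no growth)
    matObj.RawDat(i,1:1000,1:1000) = one_meg;
end

% Report the resulting file size for comparison with the growing version
s = dir(testfile);
fprintf('file size: %.1f MB\n', s.bytes/1024/1024);
```

Comparing the printed size with the one from the original loop is a quick way to see whether the bloat comes from growing the variable.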
If your code is a model of what you want to do, I suggest storing your chunks of data in cells of a RawData cell array inside your file.
You are also indexing into your array inefficiently, but that does not seem to be causing any performance issues. For MATLAB it would be best if RawData were (1000,1000,50) in size.
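The reason for preferring the page index last can be sketched with MATLAB's column-major ordering. The example below looks only at in-memory linear indices, using tiny sizes so the output is readable; that the on-disk layout benefits in the same way is an assumption, and all variable names are illustrative.

```matlab
% Sketch: linear (column-major) positions of the elements of "page 2"
% for the two layouts. Sizes are tiny on purpose: 2 pages of 4x3.
nPages = 2; nRows = 4; nCols = 3;
n = nPages*nRows*nCols;

% Page index first, as in the question: size (nPages, nRows, nCols)
idxFirst  = reshape(1:n, [nPages nRows nCols]);
pageFirst = idxFirst(2,:,:);
pageFirst = pageFirst(:).';     % 2 4 6 ... 24: every second element

% Page index last, as suggested: size (nRows, nCols, nPages)
idxLast  = reshape(1:n, [nRows nCols nPages]);
pageLast = idxLast(:,:,2);
pageLast = pageLast(:).';       % 13 14 ... 24: one contiguous block

disp(pageFirst);
disp(pageLast);
```

With the page index last, each page occupies one contiguous run of elements, whereas with the page index first its elements are interleaved with those of every other page.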