MATLAB: matfile runs incredibly slowly on large files. What might be the problem?


I have a MAT-file that contains one variable, a 64000×31250 array of singles. I use matfile to pull single columns out of that array. I've done similar operations on smaller (say 7000×31250) arrays and had it work fine. However, with this matrix, each column read takes 20 seconds. In the profiler, essentially all of the time is spent on line 460 of matfile.m:
[varargout{1:nargout}] = internal.matlab.language.partialLoad(obj.Properties.Source, varSubset, '-mat');
All this work (saving, reading through matfile, etc.) is done in R2012b and in the 7.3 file format.
For scale, reading the entire variable with a load command takes 127 seconds (i.e., less than the time matfile takes to read 7 of the 31250 columns).
Edit: a few details I should have included: 24 GB RAM, Windows 7 x64, CPU is an i7-950 (4 cores, 8 with hyperthreading). Disk activity is very, very low during this process, but a single core is running at max speed (i.e., one MATLAB process is using 13% CPU on the "8 core" CPU throughout).
Any ideas why matfile is choking so badly?

Summary: "column-major" does not apply to the matfile class when it comes to reading speed; in the test below, reading a row is much faster than reading a column.
Column-major or row-major?
The documentation for hdf5read says:
[...]HDF5 describes data set dimensions in row-major order; MATLAB stores
data in column-major order. However, permuting these dimensions may not
correctly reflect the intent of the data and may invalidate metadata. When
BOOL is false (the default), the data dimensions correctly reflect the data
ordering as it is written in the file: each dimension in the output variable
matches the same dimension in the file.
MATLAB stores arrays in column-major order, while HDF5 describes them in row-major order. The MAT-file 7.3 file format "is" HDF5 (it is built on HDF5).
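Because a v7.3 MAT-file can be opened with the HDF5 functions, one way to see how a variable is laid out on disk is to query its metadata with h5info. The following is only a minimal sketch: the file name layout_demo.mat and the small test matrix are illustrative assumptions, and how the reported chunk size affects partial reads in a given release is something to measure rather than assume.

```matlab
% Sketch: inspect how a v7.3 MAT-file stores a variable.
% The file name and the small test matrix are illustrative assumptions.
filespec = 'layout_demo.mat';
mat = rand( 200, 300, 'single' );
save( filespec, 'mat', '-v7.3' )

info = h5info( filespec, '/mat' );   % dataset-level metadata
disp( info.Dataspace.Size )          % dimensions as reported by h5info
disp( info.ChunkSize )               % empty if the dataset is not chunked
disp( info.Datatype.Class )          % HDF5 type class of the stored data
```

Running the same h5info call on the real file, with the real variable name, shows the layout that matfile has to work against.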
The following test (R2012a 64-bit, 8 GB RAM, Windows 7) shows that for a <5000x5000 single>:
  • reading one column takes approximately half the time of reading the full matrix
  • reading one row is approximately 20 times faster than reading one column.
In this case the matrix is so small that my 8 GB of RAM should not be a bottleneck.
% Create a test matrix and save it in the v7.3 (HDF5-based) format
N = 5e3;
filespec = 'matfile_test.mat';
mat = rand( N, 'single' );
save( filespec, 'mat', '-v7.3' )
obj = matfile( filespec );
% Full matrix: matfile first, then h5read
tic, mfm = obj.mat; toc
tic, h5m = h5read( filespec, '/mat' ); toc
dfm = mfm-mat;   % differences from the original, kept as a sanity check
d5m = h5m-mat;
% One column
tic, mfm = obj.mat( :, 1 ); toc
tic, h5m = h5read( filespec, '/mat', [1,1], [N,1] ); toc
dfm = mfm-mat( :, 1 );
d5m = h5m-mat( :, 1 );
% One row
tic, mfm = obj.mat( 1, : ); toc
tic, h5m = h5read( filespec, '/mat', [1,1], [1,N] ); toc
dfm = mfm-mat( 1, : );
d5m = h5m-mat( 1, : );
Output (in order: full matrix, one column, one row; matfile first, then h5read in each pair):
Elapsed time is 1.955082 seconds.
Elapsed time is 1.674106 seconds.
Elapsed time is 0.984833 seconds.
Elapsed time is 0.822843 seconds.
Elapsed time is 0.056097 seconds.
Elapsed time is 0.029657 seconds.
2013-07-24: Test with R2013a 64-bit, 8 GB RAM, Windows 7; same computer, same OS, and a new MATLAB release. The results below are from the third run of the script after restarting the computer and MATLAB. Most timings improve a little (the full-matrix matfile read is actually slower in this run). However, it is nothing comparable to the result of reading a row, which Matt J reports in the comments.
>> matfile_h5_script
Elapsed time is 2.626919 seconds.
Elapsed time is 1.219851 seconds.
Elapsed time is 0.809362 seconds.
Elapsed time is 0.765147 seconds.
Elapsed time is 0.049908 seconds.
Elapsed time is 0.020192 seconds.
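Given the timings above, where a row read is much faster than a column read, one possible workaround is to store the transpose of the matrix, so that each column of the original becomes a row of the stored variable. This is only a sketch and has not been tried on the 64000×31250 file from the question; the names matT and matfile_transposed.mat are illustrative assumptions, and the transpose needs the whole array in memory once.

```matlab
% Sketch of a possible workaround: store the transpose so that each
% original column becomes a row of the stored variable.
% File and variable names are illustrative assumptions.
N = 1000;
filespec = 'matfile_transposed.mat';
mat  = rand( N, 'single' );
matT = mat.';                    % column k of mat is row k of matT
save( filespec, 'matT', '-v7.3' )

obj = matfile( filespec );
k = 7;                           % column of the original matrix to fetch
tic
row = obj.matT( k, : );          % partial read of a single row
col = row.';                     % back to a column vector
toc
```

Whether this helps at the size in the question would have to be timed on the real data; the sketch only shows the bookkeeping.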
