MATLAB: Does MATLAB transpose HDF5 data?

bug, hdf5, MATLAB

There appears to be a bug in MATLAB's HDF5 read/write utilities that breaks interoperability with other code: simple array datasets are written with the transpose of their actual shape (and transposed back on read). I imagine this is because MATLAB uses column-major (Fortran-style) order, whereas the HDF5 standard uses row-major (C-style) order.
Minimal example that illustrates the problem:
h5create('test.h5', '/dataset', [2,3]);
h5write('test.h5', '/dataset', reshape(1:6,[2,3]))
Running the HDF5 utility h5ls on the output reveals the problem:
$ h5ls test.h5
dataset Dataset {3, 2}
This is not evident when using only the HDF5 tools within MATLAB, since reading the dataset back in transposes it again.
>> h5read('test.h5', '/dataset')
ans =
     1     3     5
     2     4     6
Matlab should either fix this in future versions or mention the convention in the documentation, since people mostly choose HDF5 for interoperability with other systems, and this can be a tricky bug to find.
In versions:
  • h5ls: Version 1.8.14
  • Matlab 8.6.0.267246 (R2015b) GLNXA64

Best Answer

The linked HDF5 documentation states the following under Data Layout:
"Contiguous: The array is stored in one contiguous area of the file. This layout requires that the size of the array be constant"
"The offset of an element from the beginning of the storage area is computed as in a C array."
"The first dimension stored in the list of dimensions is the slowest changing dimension and the last dimension stored is the fastest changing dimension."
So yes, it seems clear that data in the file is stored in the "C" array convention, and I can find no option that allows a "Fortran" array convention.
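As a minimal sketch of what "computed as in a C array" means, the following Python computes the element offset for a dataset whose dimension list is ordered slowest-first. The function name `c_offset` is hypothetical and used only for illustration; it is not part of any HDF5 API.

```python
def c_offset(index, dims):
    """Row-major (C-style) element offset: the last dimension varies fastest.

    Hypothetical helper for illustration only, not an HDF5 library call.
    """
    offset = 0
    for i, n in zip(index, dims):
        if not 0 <= i < n:
            raise IndexError("index out of range")
        offset = offset * n + i
    return offset

dims = (3, 2)  # dimension list as reported by h5ls in the question
offsets = [[c_offset((r, c), dims) for c in range(dims[1])]
           for r in range(dims[0])]
print(offsets)  # [[0, 1], [2, 3], [4, 5]]
```

Each step along the last dimension moves one element in storage, while each step along the first dimension skips a whole row of 2 elements.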
That said, the dimensions stored in the file appear to be correct: the slowest-changing dimension (3) was stored first, followed by the fastest-changing dimension (2). This assumes, of course, that the data was written to the file in the order 1, 2, 3, 4, 5, 6. So the file is internally consistent, in that the stored dimensions match the stored data order; the data just wasn't written in the order you expected. It looks like you would need to manually transpose (for 2D) or permute (for nD) on the MATLAB side, as you suggested, if you want the dataset in the file to have the same dimensions as the MATLAB variable.
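This reasoning can be checked without MATLAB or HDF5. The sketch below, in plain Python, models a column-major writer that passes its memory buffer through unchanged and reverses the dimension list. That behavior is an assumption chosen to match the h5ls output in the question, not a confirmed description of MATLAB's internals. All helper names are hypothetical.

```python
def column_major_buffer(matrix):
    """Flatten a list-of-rows matrix in column-major (MATLAB/Fortran) order."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

def read_row_major(buffer, dims):
    """Rebuild a 2-D array from a flat buffer using row-major (C) order."""
    rows, cols = dims
    return [buffer[r * cols:(r + 1) * cols] for r in range(rows)]

def transpose(matrix):
    """Swap rows and columns of a list-of-rows matrix."""
    return [list(col) for col in zip(*matrix)]

# The 2x3 matrix produced by MATLAB's reshape(1:6, [2,3])
A = [[1, 3, 5],
     [2, 4, 6]]

# Direct write (assumed): buffer unchanged, dimensions reversed to {3, 2}
buf = column_major_buffer(A)
as_seen_by_c = read_row_major(buf, (3, 2))
print(buf)           # [1, 2, 3, 4, 5, 6]
print(as_seen_by_c)  # [[1, 2], [3, 4], [5, 6]]

# Workaround: transpose before writing, so the file dimensions become {2, 3}
buf_t = column_major_buffer(transpose(A))
fixed = read_row_major(buf_t, (2, 3))
print(fixed)         # [[1, 3, 5], [2, 4, 6]]
```

Under these assumptions, a row-major reader sees the transpose of the MATLAB matrix after a direct write, and sees the original 2x3 layout once the matrix is transposed before writing. In MATLAB this would correspond to writing `A.'` (or a `permute` of the array for nD) into a dataset created with the reversed size.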
Maybe submit a bug report and see what TMW (The MathWorks) has to say about all this. I don't know that I would classify this as a "bug" per se, since the dimensions and the data order in the file match each other. What I would expect is for MATLAB to match whatever the official Fortran HDF5 interface routines do. If the official Fortran API routines do the same thing as MATLAB, then I would say MATLAB did it correctly (but should document this behavior). But if the official Fortran API routines permute the data into "C" array storage order, then MATLAB is out of step with them, and I might call it a bug even though the file is written correctly (it just wouldn't match the apparent expectation of the HDF Group). Maybe contact the HDF Group and ask them that question.