MATLAB: Possible bug in H5D.write, truncation of VLEN strings


Hello,
I have discovered a potential bug, or at least some flaky behavior, in the low-level HDF5 write function H5D.write. When I write a long string as a variable-length string, it seems to get truncated at 512 bytes (511 characters plus the terminating null). Writing the same string as a fixed-length string works fine.
The minimal script below reproduces the problem. I see this in R2012a on both Linux and Mac. Am I missing a parameter or function call that controls the VLEN buffer size, or is something improperly hard-coded in the underlying MEX function?
Cheers, Souheil
----------
% Create a long string
str = repmat('Hello from matlab. ',[1 1000]);
fprintf('Size of string = %d\n',length(str));
% Create an HDF5 file
filename = 'vlen_string_bug.h5';
fid = H5F.create(filename,'H5F_ACC_TRUNC','H5P_DEFAULT','H5P_DEFAULT');
% Write to a dataset as a variable length string
VLstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(VLstr_type,'H5T_VARIABLE');
space = H5S.create_simple(1, 1, []);
dset = H5D.create(fid, 'VLstr', VLstr_type, space, 'H5P_DEFAULT');
fprintf('Size of VLEN_BUF before = %d\n',H5D.vlen_get_buf_size(dset, VLstr_type, space));
H5D.write(dset, VLstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', {str});
fprintf('Size of VLEN_BUF after = %d\n',H5D.vlen_get_buf_size(dset, VLstr_type, space));
H5T.close(VLstr_type);
H5S.close(space);
H5D.close(dset);
% Write to a dataset as a fixed length string
Fstr_type = H5T.copy('H5T_C_S1');
H5T.set_size(Fstr_type, length(str));
space = H5S.create_simple(1, 1, []);
dset = H5D.create(fid, 'Fstr', Fstr_type, space, 'H5P_DEFAULT');
H5D.write(dset, Fstr_type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', str);
H5T.close(Fstr_type);
H5S.close(space);
H5D.close(dset);
% Close the file
H5F.close(fid);
% Read the strings back in using the high level read function
t = h5read(filename,'/VLstr');
vlstr = t{1};
fprintf('Size of VLEN string on disk = %d\n',length(vlstr));
t = h5read(filename,'/Fstr');
fstr = t{1};
fprintf('Size of fixed string on disk = %d\n',length(fstr));

Best Answer

This looks like a bug in the R2012a MEX files on Mac and Linux. R2012b appears to resolve it. Thanks for everyone's input.
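
For anyone stuck on R2012a or earlier, here is a rough sketch of a possible workaround, based only on the observation in the question that fixed-length strings are written correctly. The helper name write_string_dataset is hypothetical (it is not part of MATLAB or of the original script), and it assumes a non-empty string and that the truncation is absent from R2012b (MATLAB 8.0) onward, as reported above. It would need to be saved as write_string_dataset.m on older releases.

```matlab
function write_string_dataset(fid, name, str)
% WRITE_STRING_DATASET  Hypothetical helper: write one string to an HDF5 dataset.
% On releases older than R2012b (MATLAB 8.0) it falls back to a fixed-length
% string type, since variable-length writes were reported to truncate there.
% Assumes str is a non-empty character row vector.
    space = H5S.create_simple(1, 1, []);
    type  = H5T.copy('H5T_C_S1');
    if verLessThan('matlab', '8.0')
        % Fixed-length string sized to hold the whole input
        H5T.set_size(type, length(str));
        data = str;
    else
        % Variable-length string, passed as a cell array as in the question
        H5T.set_size(type, 'H5T_VARIABLE');
        data = {str};
    end
    dset = H5D.create(fid, name, type, space, 'H5P_DEFAULT');
    H5D.write(dset, type, 'H5S_ALL', 'H5S_ALL', 'H5P_DEFAULT', data);
    % Release the identifiers opened above
    H5D.close(dset);
    H5T.close(type);
    H5S.close(space);
end
```

It can be called in place of the two write sections in the script, for example write_string_dataset(fid, 'str', str) between H5F.create and H5F.close. Note that the on-disk datatype then differs between releases, so any reader that depends on the dataset being variable-length would need to allow for that.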