MATLAB: Reading complicated mixed text/numbers file

fscanf file strcmp

I would like to read a vtu file containing the solution of a problem in Matlab.

In particular, I'd like to get the size of the data I want to read, which is given at the beginning of my file by the attribute "NumberOfPoints" in this part of the file

<VTKFile type="UnstructuredGrid"  version="0.1"  >
<UnstructuredGrid>
<Piece  NumberOfPoints="5101" NumberOfCells="10000">
<Points>

Also, the data that I'd like to import in Matlab are preceded by

<DataArray  type="Float64"  Name="u"  format="ascii">1.0000000000000000e+00  2.0000000000000000e+00

At the moment I can read them only if I put my data on a new line in the file, i.e.

<Piece  NumberOfPoints>
5101
NumberOfCells="10000">

and

<DataArray  type="Float64"  Name="u"  format="ascii">
1.0000000000000000e+00  2.0000000000000000e+00

using this code

file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    str = strtrim(str);
    switch (str)
        case '<Piece  NumberOfPoints>'
            n = fscanf( file, '%d', 1 )
        case '<DataArray  type="Float64"  Name="u"  format="ascii">'
            val = fscanf( file, '%f', [1, n] )';
    end
end
fclose( file );

How can I get the values without modifying my files by hand? I have many very large files, and editing them by hand takes a long time.

Thank you,

Elisa

Best Answer

Hi Elisa,

I understand that you are trying to read the data points from a VTU file with a specific format without having to modify the file by hand. I am assuming that all of the data points are contained on the line starting with the "DataArray" tag. There are many different ways of parsing the file, so I'll give you a couple of approaches.

The first approach is very similar to your code, but avoids using "switch" to check for the line of interest. Switch-case constructs only match whole strings exactly, whereas here the string of interest is only part of the line. The "strfind" function, among others, will look for the specified substring within the given string. You could also use the "strncmp" function if you would prefer that.

Also, since the data you are interested in is on the same line as the substring that specifies it as the line of interest, you cannot use "fscanf" to parse that line. If you always know that "NumberOfPoints" will be the first attribute in the "Piece" tag, you can use the "strsplit" function to extract the number of data points you want. You can use similar methods to extract the data points from the "DataArray" line.

file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    str = strtrim( str );
    if ~isempty( strfind( str, '<Piece  NumberOfPoints=' ) )
        strPieces = strsplit( str, '"' );             % Split at the double-quote marks
        n = str2double( strPieces{2} );               % Convert to number
    elseif ~isempty( strfind( str, '<DataArray  type=' ) )
        strPieces = strsplit( str, '>' );             % Split at the end of the tag
        val = sscanf( strPieces{end}, '%f', [1 n] );  % Read in data
    end
end
fclose( file );
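
To see what the two splits return, the same calls can be run directly on the sample lines from the question. This is only an illustration: the variable names pieceLine, dataLine, pieceParts and dataParts are made up here, and [1 Inf] is used in place of [1 n] so the second half does not depend on the first.

```matlab
% Sample lines copied from the question (illustration only)
pieceLine = '<Piece  NumberOfPoints="5101" NumberOfCells="10000">';
dataLine  = ['<DataArray  type="Float64"  Name="u"  format="ascii">' ...
             '1.0000000000000000e+00  2.0000000000000000e+00'];

% Splitting at the double quotes puts the first attribute value in cell 2
pieceParts = strsplit( pieceLine, '"' );
n = str2double( pieceParts{2} );

% Splitting at '>' leaves the numbers in the last cell
dataParts = strsplit( dataLine, '>' );
val = sscanf( dataParts{end}, '%f', [1 Inf] );
```

Note that this only works because "NumberOfPoints" is the first quoted attribute on the line and because no attribute value contains a '>' character.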

One of the issues with this approach, however, is that it is not very robust for files of slightly different formats. For example, if there were only a single space between "Piece" and "NumberOfPoints", the search string would no longer match and this code would never find the value for "n". A much more robust approach would be to use regular expressions. These can be tricky to work with, but they allow for more flexibility in the file format.

file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    % The token of interest must have one or more digits, and only digits
    strTokens = regexp( str, 'NumberOfPoints="(\d+)"', 'tokens' );
    if ~isempty( strTokens )
        n = str2double( strTokens{1}{1} );
    else
        % The token may have space, tab, any digit, decimal point, the 'e'
        % character, plus, or minus since all can be used to write numbers
        % in exponential notation
        strTokens = ...
            regexp( str, '<DataArray.*?>([ \t\d\.e\+\-]+)', 'tokens' );
        if ~isempty( strTokens )
            val = sscanf( strTokens{1}{1}, '%f', [1 n] );
        end
    end
end
fclose( file );

You may wish to add some error checking to ensure that the code found the value of n, before trying to use it to extract the data points.
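
As a rough sketch of what such error checking could look like, the snippet below first writes a small two-value sample file (made up here so that the snippet runs on its own) and then parses it with the same regular expressions. The error messages and the choice to start with an empty n are assumptions for illustration, not the only way to do it.

```matlab
% Write a tiny sample file so the sketch is self-contained (illustration only)
fileName = [tempname '.vtu'];
file = fopen( fileName, 'wt' );
fprintf( file, '<Piece NumberOfPoints="2" NumberOfCells="1">\n' );
fprintf( file, '<DataArray type="Float64" Name="u" format="ascii">1.0e+00 2.0e+00\n' );
fclose( file );

n = [];      % stays empty until NumberOfPoints has been found
val = [];
file = fopen( fileName, 'rt' );
if file < 0
    error( 'Could not open %s', fileName );
end
while ~feof( file )
    str = fgets( file );
    if ~ischar( str )        % fgets returns -1 when nothing is left to read
        break
    end
    tok = regexp( str, 'NumberOfPoints="(\d+)"', 'tokens' );
    if ~isempty( tok )
        n = str2double( tok{1}{1} );
        continue
    end
    tok = regexp( str, '<DataArray.*?>([ \t\d\.e\+\-]+)', 'tokens' );
    if ~isempty( tok )
        if isempty( n )
            fclose( file );
            error( 'DataArray found before NumberOfPoints in %s', fileName );
        end
        val = sscanf( tok{1}{1}, '%f', [1 n] );
    end
end
fclose( file );
delete( fileName );

% Final check: n was found and the expected number of values was read
if isempty( n ) || numel( val ) ~= n
    error( 'Could not read the expected number of values from %s', fileName );
end
```

Because the pattern no longer depends on the exact spacing inside the tag, the sample file above uses single spaces and is still parsed.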

I hope that this helps with the file parsing.

-Cam

Related Solutions

MATLAB: How to read xml file with binary data into Matlab? (VTK/VTU File)

The answer to determining the position was to calculate the number of bytes per character from the first line read in, like so:

    % open the file
    fid = fopen(filename, 'r');
    % close file when we're done
    CC = onCleanup (@() fclose(fid));
    xmlstrs = {fgetl(fid)};
    firstlinebytes = ftell (fid) - 1;
    bytesperchar = round (firstlinebytes / numel (xmlstrs{1}));

then the position of the first byte in the data section is

    datapos = ftell (fid) + bytesperchar;

Note that this isn't the whole answer to reading 'raw' type data in the AppendedData section, which is poorly documented. You will find more info on the format of 'raw' (rather than 'base64') data here, but the short answer is that it's encoded as follows:

    _NNNN<data>NNNN<data>NNNN<data>
     ^         ^         ^
     1         2         3
    where each "NNNN" is an unsigned 32-bit integer, and <data> consists of
    a number of bytes equal to the preceding NNNN value.  The corresponding
    DataArray elements must have format="appended" and offset attributes
    equal to the following:
    1.) offset="0"
    2.) offset="(4+NNNN1)"
    3.) offset="(4+NNNN1+4+NNNN2)"
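
A minimal sketch of reading one block from this layout is given below. It assumes uncompressed little-endian data, a 32-bit length prefix as described above, and Float64 values. The stand-in file is written by the snippet itself so that it runs on its own; in a real file, datapos would come from the calculation above and offset from the DataArray element.

```matlab
% Build a tiny stand-in for the appended section: '_' then two blocks
fileName = [tempname '.bin'];
fid = fopen( fileName, 'w', 'ieee-le' );
fwrite( fid, uint8('_'), 'uint8' );
fwrite( fid, 16, 'uint32' );  fwrite( fid, [1.5 2.5], 'float64' );
fwrite( fid, 8, 'uint32' );   fwrite( fid, 3.5, 'float64' );
fclose( fid );

datapos = 1;        % first byte after the underscore in this stand-in file
offset  = 4 + 16;   % offset attribute of the second block: 4 + NNNN1

fid = fopen( fileName, 'r', 'ieee-le' );
fseek( fid, datapos + offset, 'bof' );
nbytes = fread( fid, 1, 'uint32' );              % the NNNN length prefix
block  = fread( fid, nbytes / 8, 'float64' )';   % 8 bytes per Float64 value
fclose( fid );
delete( fileName );
```

Here block ends up holding the single value stored in the second block. Compressed data or a different header type would need different handling.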

MATLAB: Concatenate data using fgets

You are closing the file inside the loop. Close the file after the loop instead. I assume the code below should work.

function read_datapattern(filename)
fid = fopen(filename,'rt');
if fid < 0
  error('error opening file %s\n\n',filename);
end
pattern = fgets(fid);
for n = 1:32767
  nextline = fgets(fid);
  if ~ischar(nextline)   % fgets returns -1 at end of file
    break;
  end
  pattern = strcat(pattern, nextline);
  fprintf('%s\n', pattern);   % print as data, not as a format string
end
fclose(fid);   % close once, after the loop
