MATLAB: Reading complicated mixed text/numbers file

fscanf file strcmp

I would like to read a VTU file containing the solution of a problem into MATLAB.
In particular, I'd like to get the size of the data I want to read, which is given near the beginning of the file by the attribute "NumberOfPoints" in this part of the file:
<VTKFile type="UnstructuredGrid" version="0.1" >
<UnstructuredGrid>
<Piece NumberOfPoints="5101" NumberOfCells="10000">
<Points>
Also, the data that I'd like to import in Matlab are preceded by
<DataArray type="Float64" Name="u" format="ascii">1.0000000000000000e+00 2.0000000000000000e+00
At the moment I can read them only if I put my data on a new line in the file, i.e.
<Piece NumberOfPoints>
5101
NumberOfCells="10000">
and
<DataArray type="Float64" Name="u" format="ascii">
1.0000000000000000e+00 2.0000000000000000e+00
using this code
file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    str = strtrim(str);
    switch (str)
        case '<Piece NumberOfPoints>'
            n = fscanf( file, '%d', 1 )
        case '<DataArray type="Float64" Name="u" format="ascii">'
            val = fscanf( file, '%f', [1, n] )';
    end
end
fclose( file );
How can I get the values without modifying my files by hand? I have a lot of very large files, and editing them this way takes a long time.
Thank you,
Elisa

Best Answer

Hi Elisa,
I understand that you are trying to read the data points from a VTU file with a specific format without having to modify the file by hand. I am assuming that all of the data points are contained on the line starting with the "DataArray" tag. There are many different ways of parsing the file, so I'll give you a couple of approaches.
The first approach is very similar to your code, but avoids using "switch" to check for the line of interest. Switch-case constructs only work for exact matches, whereas you want to know whether a particular string appears anywhere within the line. The "strfind" function, among others, will look for the specified substring within the given string. You could also use the "strncmp" function if you would prefer that.
Also, since the data you are interested in is on the same line as the substring that identifies it as the line of interest, you cannot use "fscanf" on the file to parse that line: "fgets" has already read it, so "fscanf" would continue from the following line. Instead, work on the string that "fgets" returned. If you always know that "NumberOfPoints" will be the first attribute in the "Piece" tag, you can use the "strsplit" function to extract the number of data points you want. You can use similar methods to extract the data points from the "DataArray" line.
file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    str = strtrim( str );
    % strfind returns [] when there is no match, so test with isempty
    if ~isempty( strfind( str, '<Piece NumberOfPoints=' ) )
        strPieces = strsplit( str, '"' ); % Split at the double-quote marks
        n = str2double( strPieces{2} ); % Convert to number
    elseif ~isempty( strfind( str, '<DataArray type=' ) )
        strPieces = strsplit( str, '>' ); % Split at the end of the tag
        val = sscanf( strPieces{end}, '%f', [1 n] ); % Read in data
    end
end
fclose( file );
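To make the splitting step concrete, here is a small sketch that applies the same two "strsplit" calls to the sample lines from your question. The variable names "pieceLine" and "dataLine" are only for illustration, and I use [1 Inf] in "sscanf" so the example does not depend on "n".

```matlab
% Sample lines copied from the question
pieceLine = '<Piece NumberOfPoints="5101" NumberOfCells="10000">';
dataLine  = ['<DataArray type="Float64" Name="u" format="ascii">' ...
             '1.0000000000000000e+00 2.0000000000000000e+00'];

% Splitting at the double quotes puts the first attribute value in cell 2
strPieces = strsplit( pieceLine, '"' );
n = str2double( strPieces{2} );

% Splitting at '>' leaves the numbers that follow the tag in the last cell
strPieces = strsplit( dataLine, '>' );
val = sscanf( strPieces{end}, '%f', [1 Inf] );
```

Note that this relies on the line ending with the data. If a closing tag such as </DataArray> sits on the same line, the last cell from "strsplit" would be empty and you would need to pick a different cell.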
One of the issues with this approach, however, is that it is not very robust to files with slightly different formats. For example, if the number of spaces between "Piece" and "NumberOfPoints" differed from what the search string expects, this code would never find the value for "n". A much more robust approach would be to use regular expressions. These can be tricky to work with, but they allow for more flexibility in the file format.
file = fopen( fileName, 'rt' );
while (~feof( file ))
    str = fgets( file );
    % The token of interest must have one or more digits, and only digits
    strTokens = regexp( str, 'NumberOfPoints="(\d+)"', 'tokens' );
    if ~isempty( strTokens )
        n = str2double( strTokens{1}{1} );
    else
        % The token may have space, tab, any digit, decimal point, the 'e'
        % character, plus, or minus since all can be used to write numbers
        % in exponential notation
        strTokens = ...
            regexp( str, '<DataArray.*?>([ \t\d\.e\+\-]+)', 'tokens' );
        if ~isempty( strTokens )
            val = sscanf( strTokens{1}{1}, '%f', [1 n] );
        end
    end
end
fclose( file );
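As a quick illustration of why the regular expression is more forgiving, the sketch below runs the same "NumberOfPoints" pattern over a few variations of the "Piece" line. The variations are hypothetical ones I made up, not lines taken from your files.

```matlab
% Hypothetical variations of the same tag: original spacing, extra
% spaces, and the attributes in a different order
tagLines = { '<Piece NumberOfPoints="5101" NumberOfCells="10000">', ...
             '<Piece   NumberOfPoints="5101"   NumberOfCells="10000">', ...
             '<Piece NumberOfCells="10000" NumberOfPoints="5101">' };

n = zeros( 1, numel( tagLines ) );
for k = 1:numel( tagLines )
    % The pattern only anchors on the attribute name, so the text
    % before and after it does not matter
    strTokens = regexp( tagLines{k}, 'NumberOfPoints="(\d+)"', 'tokens' );
    n(k) = str2double( strTokens{1}{1} );
end
```

All three lines give the same value for "n", whereas the "strsplit" version would only handle the first one.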
You may wish to add some error checking to ensure that the code found the value of n, before trying to use it to extract the data points.
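A minimal sketch of what that checking might look like is below. It is only one way to do it: the temporary two-point sample file and the error messages are my own assumptions, and I use "fgetl" instead of "fgets" so that the end-of-file return value can be tested directly.

```matlab
% Write a tiny sample file so that the sketch is self-contained
fileName = [tempname '.vtu'];
file = fopen( fileName, 'wt' );
fprintf( file, '<Piece NumberOfPoints="2" NumberOfCells="1">\n' );
fprintf( file, '<DataArray type="Float64" Name="u" format="ascii">1.0e+00 2.0e+00\n' );
fclose( file );

n = [];   % Stays empty until NumberOfPoints has been found
val = []; % Stays empty until the data have been read

file = fopen( fileName, 'rt' );
if file == -1
    error( 'Could not open %s', fileName );
end
while ~feof( file )
    str = fgetl( file );
    if ~ischar( str ), break; end % fgetl returns -1 at end of file
    strTokens = regexp( str, 'NumberOfPoints="(\d+)"', 'tokens' );
    if ~isempty( strTokens )
        n = str2double( strTokens{1}{1} );
        continue
    end
    strTokens = regexp( str, '<DataArray.*?>([ \t\d\.e\+\-]+)', 'tokens' );
    if ~isempty( strTokens )
        if isempty( n )
            fclose( file );
            error( 'Found a DataArray line before NumberOfPoints.' );
        end
        val = sscanf( strTokens{1}{1}, '%f', [1 n] );
    end
end
fclose( file );
delete( fileName );

% Confirm that both pieces of information were actually found
if isempty( n )
    error( 'NumberOfPoints was not found in the file.' );
end
if numel( val ) ~= n
    error( 'Expected %d values but read %d.', n, numel( val ) );
end
```

In your own code you would drop the part that writes the sample file and set "fileName" to the VTU file you want to read.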
I hope that this helps with the file parsing.
-Cam