MATLAB: How to solve issue with strncmp returning incorrect logical values for text comparison…

I need to scan through a textfile line by line and pull out numerical variables corresponding to given line beginnings (e.g. save subjectnumber = 2 for the line 'Subject number: 2').

I am currently attempting to do this by opening the file with fopen, then using fgetl to read it one line at a time. For each line I compare the first n characters with saved strings that act as 'keys', using the strncmp function. If the first n characters of the line match a key, the script saves the numerical value as a variable in the workspace, to be incorporated later into the final data structure.
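The approach described above can be sketched as follows. This is a minimal sketch that assumes the lines are already available as char vectors; the sample lines, the keys and the variable names are hypothetical. Each key is compared against the start of the line with strncmp, and the remainder of the line is converted with str2double:

```matlab
fileLines = {'Subject number: 2', 'Session: 5', 'Some other text'};  % hypothetical file content
keys      = {'Subject number:', 'Session:'};   % hypothetical line beginnings to look for
values    = nan(1, numel(keys));               % one numeric value per key (NaN = not found)

for i = 1:numel(fileLines)
    tline = fileLines{i};
    for k = 1:numel(keys)
        key = keys{k};
        if strncmp(tline, key, numel(key))     % do the first numel(key) characters match?
            values(k) = str2double(tline(numel(key)+1:end));  % parse the rest as a number
        end
    end
end
subjectnumber = values(1);
```

Using numel(key) as the comparison length keeps the count tied to the key itself; for string keys, strlength is the equivalent, since length of a string scalar is 1.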

However, strncmp does not seem to be working correctly, and I cannot figure out why. Whether I compare char array to char array or convert both to strings first, the function returns logical 0 even when the key matches the line. If I copy the retrieved line from the document, paste it into the Command Window and test strncmp against the key variable saved in the workspace, I get logical 1 (true). In the script itself, however, the function always returns logical 0.

Has anybody encountered a similar issue before?

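fileID = fopen('textfile.txt');
subkey = " S u b j e c t :";
tline = fgetl(fileID);                 % read the first line (char array, or -1 at end of file)
while ischar(tline)
     tline = string(tline)             % convert to string (no semicolon, so the line is printed)
     submatch = strncmp(tline, subkey, strlength(subkey))  % check match for subject key  PROBLEM LINE
     % (strlength, not length: length of a string scalar is 1)
     if submatch
          % code to save numerical variable
     end
     tline = fgetl(fileID);            % get next line
end
fclose(fileID);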

The printed output for fgetl for the line containing the desired information, and the subsequent strncmp check is:

tline =

    " S u b j e c t :   1 "    % i.e. identical to the specified key 'subkey' over the first 16 characters

submatch =

logical
   0

>> +tline
ans =
  Columns 1 through 21
   729   355    42     0    42     0    42     0    32     0    72     0   101     0    97     0   100     0   101     0   114
  Columns 22 through 42
     0    32     0    83     0   116     0    97     0   114     0   116     0    32     0    42     0    42     0    42     0

Best Answer

Explanation: the problem is caused by the file encoding, which is little-endian UCS-2, a two-byte character encoding. When the file is read as if it used one byte per character, each byte becomes a separate character. So what you see as two separate characters (a letter followed by a space) is actually one two-byte character in the file: the low byte holds the letter's code and the high byte is 0. Bringing string into the mix only adds confusion; it does not change this fundamental problem with how the file is read.
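To make the byte layout concrete, here is a minimal sketch that builds the UCS-2 little-endian bytes for a short hypothetical line and then treats each byte as a separate character, which is effectively what happens when the file is read with a one-byte-per-character encoding. It assumes every character is below code 256, so each high byte is 0:

```matlab
txt   = 'Subject: 1';                      % hypothetical line content
codes = double(txt);                       % UCS-2 code units (all below 256 here)
bytes = zeros(1, 2*numel(codes));
bytes(1:2:end) = mod(codes, 256);          % low byte stored first (little-endian)
bytes(2:2:end) = floor(codes / 256);       % high byte stored second (0 for ASCII)

asRead = char(bytes);                      % one character per byte
disp(double(asRead(1:6)))                  % letter codes interleaved with zeros
sameAsText = strncmp(asRead, txt, numel(txt))   % false: the zeros break the match
```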

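The reason your string with space characters does not match is that what you see as spaces (i.e. ASCII 32), and typed into subkey, are not spaces at all in the imported data: they are NUL characters (ASCII 0). Of course they are not really characters, just the trailing (high) byte of each two-byte character. For example, the character codes of the first line of the file (the +tline output in the question) have a 0 after every visible character: 42 0 42 0 42 0 32 0 72 0 101 0 ... is "*** He..." with a NUL after each character. This also explains the apparent leading space in " S u b j e c t :": the line break is stored as a 10 byte followed by a 0 byte, so fgetl ends the previous line at the 10 and the 0 becomes the first "character" of the next line.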
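The sketch below reproduces the failed comparison with a hypothetical line built to match this layout (a 0 before each visible character), finds the first differing position by comparing character codes, and then applies a quick workaround of deleting the 0 bytes. The workaround is a shortcut under a strong assumption, not a proper decode: it only holds when every character in the file is below code 256.

```matlab
% Hypothetical line as fgetl might return it: a 0 before every visible character
visible = 'Subject: 1';
tline = char(reshape([zeros(1, numel(visible)); double(visible)], 1, []));

userKey = ' S u b j e c t :';                   % key typed with real spaces (code 32)
matched = strncmp(tline, userKey, numel(userKey))   % false: code 0 is not code 32

% Locate the first position where the character codes differ
idx = find(double(tline(1:numel(userKey))) ~= double(userKey), 1);
fprintf('First difference at %d: line has code %d, key has code %d\n', ...
    idx, double(tline(idx)), double(userKey(idx)));

% Quick workaround (assumes every character is below code 256):
% drop the 0 bytes, then compare against a normal key
clean = tline(tline ~= 0);
subjectnumber = NaN;
if strncmp(clean, 'Subject:', 8)
    subjectnumber = str2double(clean(9:end));
end
```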

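Also note that the first two values (729 and 355) are much larger than the rest: they are the byte-order mark, the bytes FF FE at the start of the file, shown through an 8-bit code page (in Windows-1250, FF is ˙, U+02D9 = 729, and FE is ţ, U+0163 = 355). A leading FF FE indicates little-endian byte order and strongly suggests a UTF-16/UCS-2 encoded file.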
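A more robust route is to decode the file yourself. The following is a minimal sketch, assuming the file is UCS-2 little-endian with an optional FF FE byte-order mark and an even number of bytes. It reads the raw bytes with fread, checks for the mark, combines each byte pair into one 16-bit code, and only then splits lines and matches keys. The sample file it writes first, its name and its contents are hypothetical, there only so the example runs on its own:

```matlab
% --- Write a small hypothetical UCS-2 LE sample file ---
sample = sprintf('*** Header Start ***\r\nSubject: 1\r\n');
codes  = [65279, double(sample)];                 % U+FEFF is the byte-order mark
raw    = zeros(1, 2*numel(codes));
raw(1:2:end) = mod(codes, 256);                   % little-endian: low byte first
raw(2:2:end) = floor(codes / 256);
fname = [tempname '.txt'];
fid = fopen(fname, 'w');  fwrite(fid, raw, 'uint8');  fclose(fid);

% --- Read raw bytes, then rebuild the 16-bit code units by hand ---
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8').';              % row vector of raw bytes
fclose(fid);
hasBOM = numel(bytes) >= 2 && bytes(1) == 255 && bytes(2) == 254;  % FF FE => little-endian
if hasBOM
    bytes = bytes(3:end);                         % drop the byte-order mark
end
% Assumes an even byte count; without a BOM, little-endian is still assumed
units = double(bytes(1:2:end)) + 256*double(bytes(2:2:end));      % low + 256*high
textLines = splitlines(string(char(units)));      % handles \r\n as well as \n

% --- Match keys on the decoded text ---
key = "Subject:";
subjectnumber = NaN;
for i = 1:numel(textLines)
    if startsWith(textLines(i), key)
        subjectnumber = str2double(extractAfter(textLines(i), key));
    end
end
delete(fname);
```

Combining the bytes explicitly (low byte + 256 × high byte) avoids depending on the machine's byte order. With a real file, set fname to the path of textfile.txt and drop the writing step.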


Related Solutions

MATLAB: When I use fwrite to save a string with various ASCII characters, the resulting texfile changes some characters

MATLAB: How to read an UCS-2 encoded file in MATLAB
