MATLAB: Check that *.txt file is really a TXT formatted file

file format check

Hello!
How can I detect whether the content of a *.txt file is really text before passing the file on to my data import parser? I search folders for all files with the extension TXT in order to work with the data stored in each of them. In principle, this works without problems. But sometimes a file has been saved with a *.TXT name although its content is not text but some binary format (e.g. it should have been named *.XLS).

Best Answer

It all depends on what you call a text file.
If it's an ASCII file, then the character codes are limited to 0-127, so you could test whether any byte has a value > 127. The presence of codes in the range 0-31, with the exception of 9 (tab), 10 (line feed) and 13 (carriage return), would also be a strong indication that the content is not meant to be read as text. It's not a guarantee, though.
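The ASCII test described above could be sketched roughly as follows. The function name looksLikeAsciiText is made up for illustration, and the check is only a heuristic: it reads the whole file as raw bytes and rejects it if any byte is above 127 or is a control code other than tab, line feed or carriage return.

```matlab
function tf = looksLikeAsciiText(filename)
% looksLikeAsciiText  Heuristic check (hypothetical helper, not a built-in).
% Returns true if every byte is 7-bit ASCII and the only control codes
% present are tab (9), line feed (10) and carriage return (13).
fid = fopen(filename, 'r');
if fid < 0
    error('Could not open file: %s', filename);
end
bytes = fread(fid, Inf, 'uint8=>uint8');   % raw bytes, no text decoding
fclose(fid);

allowedCtrl = [9 10 13];
isNonAscii  = bytes > 127;
isBadCtrl   = bytes < 32 & ~ismember(bytes, allowedCtrl);

tf = ~any(isNonAscii | isBadCtrl);
end
```

You would then skip any file for which looksLikeAsciiText returns false. An empty file passes the check, so you may want to handle that case separately.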
If it's an extended ASCII file, then the whole range 0-255 is used. Other than semantics, there's nothing distinguishing a text file from a binary file. Again, characters in the range [0-8, 11-12, 14-31] would be an indication that the content is binary.
If it's a UTF-8 file, there are some byte sequences that are not allowed and you could try to detect them. Again, control characters in the range 0-31 (other than tab and the newline characters) are an indication that it's not meant to be text.
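For the UTF-8 case, one way to detect disallowed byte sequences is to walk through the bytes and check each lead byte and its continuation bytes by hand. The following is a minimal sketch under that assumption; isValidUtf8 is a hypothetical helper name, and it expects the raw bytes of the file as a uint8 vector (for example from fread with the 'uint8=>uint8' precision).

```matlab
function tf = isValidUtf8(bytes)
% isValidUtf8  Sketch of a UTF-8 well-formedness check (hypothetical helper).
% bytes: vector of byte values (0-255), e.g. as returned by fread.
bytes = double(bytes(:)).';
n = numel(bytes);
i = 1;
tf = true;
while i <= n
    b = bytes(i);
    if b <= 127
        len = 1;                      % plain ASCII
    elseif b >= 194 && b <= 223
        len = 2;
    elseif b >= 224 && b <= 239
        len = 3;
    elseif b >= 240 && b <= 244
        len = 4;
    else
        tf = false; return;           % 128-193 and 245-255 cannot start a sequence
    end
    if i + len - 1 > n
        tf = false; return;           % sequence is cut off at end of data
    end
    cont = bytes(i+1 : i+len-1);
    if any(cont < 128 | cont > 191)
        tf = false; return;           % continuation bytes must be 128-191
    end
    % Reject overlong encodings, surrogates and code points above U+10FFFF
    if (b == 224 && cont(1) < 160) || (b == 237 && cont(1) > 159) || ...
       (b == 240 && cont(1) < 144) || (b == 244 && cont(1) > 143)
        tf = false; return;
    end
    i = i + len;
end
end
```

Note that a pure ASCII file also passes this check, and that passing it only means the bytes are well-formed UTF-8, not that the content is meaningful text.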
Perhaps, instead of trying to discriminate text files from binary ones, what you should be discriminating between is files that conform to the format your code expects and those that don't?
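Since the expected format isn't stated in the question, the following is only a generic sketch of that idea: wrap the import in try/catch and treat any failure as "does not conform". tryImport and parserFcn are hypothetical names, and the sketch assumes that your parser throws an error when it meets content it cannot handle.

```matlab
function [ok, data] = tryImport(filename, parserFcn)
% tryImport  Hypothetical wrapper: run the parser and report whether it succeeded.
% parserFcn is a function handle to your own import routine (an assumption);
% it is expected to raise an error for files it cannot parse.
try
    data = parserFcn(filename);
    ok = true;
catch
    data = [];
    ok = false;    % parser rejected the file, so treat it as non-conforming
end
end
```

If your parser silently returns garbage instead of erroring on bad input, you would need to add explicit checks on the returned data (expected number of columns, value ranges and so on) inside the try block.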