MATLAB: Matlab text file opts varying for similar files

data handling, detectors, opts

Hi, I have 2 text files with the same number of columns/headers. When a measurement is not completed, the field is filled in with an "UND" marker, which can be "UND. -60001" or "UND. -62011". I have a script that usually has no problems, but when it does fail it has been very difficult to pin down the cause. By reading the opts, I have noticed that it is treating the two files differently (m-file and 2 data files attached). I don't see why the files should be treated any differently. Any ideas?
The file that reads in OK has this in its opts:
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t'}
Whitespace: '\b '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'ISO-8859-1'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Nozzle_number', 'Frequency_khz', 'Velocity_ms' ... and 4 more}
VariableTypes: {'char', 'double', 'double' ... and 4 more}
SelectedVariableNames: {'Nozzle_number', 'Frequency_khz', 'Velocity_ms' ... and 4 more}
VariableOptions: Show all 7 VariableOptions
The file that does not load properly has this in its opts:
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'\t' ' '}
Whitespace: '\b'
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'join'
LeadingDelimitersRule: 'ignore'
EmptyLineRule: 'skip'
Encoding: 'ISO-8859-1'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableTypes: {'char', 'double', 'char' ... and 6 more}
SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more}
VariableOptions: Show all 9 VariableOptions

Best Answer

The difference is that the second file has the UND indicator in the first data line, whereas the first file starts with a completed record. detectImportOptions uses that first record to work out how to parse the file. For the second file it ends up seeing what appear to be nine variables in the data record (note that the detected Delimiter now includes ' ' as well as '\t'), but there are only seven column names. That mismatch creates the confusion, and is why the header is dropped in favor of Var1, Var2, and so on.
In this case I would suggest not calling detectImportOptions(files(jj).name), but instead either using a specific hand-built options object for these files, or dispensing with it entirely and passing everything needed as name-value pairs in the readtable call.
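As a rough illustration of the hand-built route, the sketch below writes a small tab-delimited sample file and reads it with an options object constructed by delimitedTextImportOptions. The three-column layout, the nozzle labels and the numeric values are made up for the example and are not taken from the attached files; the real files would need all seven names and types.

```matlab
% Write a tiny tab-delimited sample file (values are illustrative only)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Nozzle_number\tFrequency_khz\tVelocity_ms\n');
fprintf(fid, 'A\t4\tUND. -60001\n');
fprintf(fid, 'B\t4\t7.25\n');
fclose(fid);

% Hand-built options: nothing is guessed from the first data record
opts = delimitedTextImportOptions('NumVariables', 3);
opts.Delimiter     = '\t';        % tab only, so the space in "UND. -60001" is kept
opts.DataLines     = [2 Inf];     % skip the header line
opts.VariableNames = {'Nozzle_number', 'Frequency_khz', 'Velocity_ms'};
opts.VariableTypes = {'char', 'double', 'char'};  % UND-capable column read as text

t = readtable(fname, opts);
delete(fname);
```

Because the delimiter and the variable types are fixed up front, every file is parsed the same way regardless of whether its first record happens to contain an UND entry.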
ADDENDUM
After looking at your files, I think I'd go at this somewhat differently: I'd just let readtable bring in the measurement columns as text (cellstr variables), do the substitution on the bad data, and then convert. Is it likely there's ever a file that doesn't have at least one UND in the numeric data fields?
I don't know just what your other code does after reading a file, but I'd do that portion more nearly as:
d=dir('/Users/imagexpertinc/Desktop/odds/freq_sweeps/*.txt');
for i=1:numel(d)
  t=readtable(fullfile(d(i).folder,d(i).name),opts); % full path, so it works from any current folder; table with cellstr variables
  % convert the UND entries to 'NaN' in the cell array of measurement columns, then convert to doubles
  % (str2double handles the whole cell array at once and avoids the eval inside str2num)
  v=str2double(regexprep(table2cell(t(:,3:end)),'UND.*','NaN'));
  for j=1:size(v,2) % put the numeric columns back into the existing table
    t.(j+2)=v(:,j);
  end
  % ...
  % Now do what needs done with this table here before going on to next...
end
The opts object used above was created from an artificial RECORD.txt file that contains the header line and a single record:
Nozzle_number Frequency_khz Velocity_ms Volume_pl Trajectory_deg X_coordinate_mm Y_coordinate_mm
- 4 UND. -60001 UND. -62011 UND. -60001 UND. -2011 UND. -2011
With an UND entry in every measurement field, those variables (columns 3 onward) are all recognized and imported as text. That makes the conversion identical for every column in every file. Otherwise, if a given file happened to have a specific variable that was valid for every observation, that variable would be imported as numeric by default, and extra logic would have to be written to handle it.
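The uniform conversion described here can be tried on its own with a small cell array. This is only a sketch with made-up values standing in for the text columns of the table:

```matlab
% Stand-in for table2cell(t(:,3:end)): every entry is text
raw = {'UND. -60001', '12.5'; ...
       '3.1',         'UND. -62011'};

% Replace any entry starting at "UND" with the text 'NaN', then convert
clean = regexprep(raw, 'UND.*', 'NaN');
v = str2double(clean);   % numeric matrix, same size as raw
```

Here v comes back as a 2-by-2 double matrix with NaN wherever the file had an UND entry.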
Unless, of course, the substituted missing-value code itself has significance for some reason; then you would need to convert and keep it rather than replace it with NaN. Your current solution doesn't seem to discern that difference, either.
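If the codes did turn out to matter, one possible approach (an assumption on my part, not something the original script does) is to keep them in a separate array alongside the converted values. The sample entries below are made up:

```matlab
% Stand-in for one text column read from a file
raw = {'UND. -60001'; '7.25'; 'UND. -62011'};

isUnd = startsWith(raw, 'UND');   % which entries are incomplete measurements
value = str2double(raw);          % valid numbers convert; UND entries become NaN

% Strip the leading "UND." and convert what is left to get the code
code = nan(size(raw));
code(isUnd) = str2double(regexprep(raw(isUnd), '^UND\.\s*', ''));
```

value then holds the measurements with NaN for the incomplete ones, and code holds -60001 or -62011 in the matching rows and NaN elsewhere.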