MATLAB: How to use textscan to split a string containing numbers, NaN and strings with quotes (or not)

textscan string nan quote

Edit: the final goal is to use textscan on a large file (~1 GB), so preprocessing the string before applying textscan is not possible.
This is the string I want to split with "textscan":
s = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';
I have tried several different syntaxes:
– test 1
textscan(s, '%f%f%f%f%s', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
This is the best result; the only problem is the leftover quote at the end of the string. I don't understand why: aren't the characters listed in "Whitespace" supposed to be removed?
– test 2
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {'22/09/17 22:59"'}
Same result as test 1.
– test 3
textscan(s, '%f%f%f%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
– test 4
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [NaN] [NaN] [0.6000] {0x1 cell}
Fails to read the string.
– test 5
textscan(s, '%f%f%f%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
– test 6
textscan(s, '%f"%f""%f"%f"%s"', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n')
Result: [-0.2700] [NaN] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the 2nd NAN.
– test 7
textscan(s, '%f"%f""%f"%f%q', 'delimiter', ',', 'CollectOutput', false, 'MultipleDelimsAsOne', 0, 'HeaderLines', 0, 'endOfLine', '\r\n', 'Whitespace', ' "')
Result: [-0.2700] [0x1 double] [0x1 double] [0x1 double] {0x1 cell}
Fails to read the NANs.
Any suggestions? Thanks.

Best Answer

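textscan(s, '%f%f%f%f%q', 'Delimiter', ',', 'TreatAsEmpty', '"NAN"')
'TreatAsEmpty' tells textscan to treat the quoted text "NAN" in a numeric field as empty, and empty numeric fields are returned as NaN by default. The %q specifier reads the last field and strips its enclosing double quotes, so no 'Whitespace' setting is needed.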
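Since the end goal is a large file, here is a minimal sketch of how the same format might be applied block by block, so the whole file never has to be in memory at once. The option name is spelled out in full as 'TreatAsEmpty'. The temporary sample file, its second row, blockSize and the variable names are assumptions made for illustration only; they are not from the original question.

```matlab
% Check the call on the original string first.
s   = '-0.27,"NAN","NAN",0.6,"22/09/17 22:59"';
fmt = '%f%f%f%f%q';
c0  = textscan(s, fmt, 'Delimiter', ',', 'TreatAsEmpty', '"NAN"');

% Write a small sample file (illustrative stand-in for the real ~1 GB file).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', s);
fprintf(fid, '%s\n', '1.5,2.5,"NAN",0.7,"22/09/17 23:00"');
fclose(fid);

% Read the file in blocks of at most blockSize rows.
blockSize = 100000;                    % assumed value; tune to available memory
fid = fopen(fname, 'r');
closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs
nRows = 0;
while ~feof(fid)
    c = textscan(fid, fmt, blockSize, ...
        'Delimiter', ',', 'TreatAsEmpty', '"NAN"');
    if isempty(c{1})
        break;                         % nothing more could be parsed
    end
    % c{1}..c{4} are numeric columns; c{5} is a cell array of char vectors.
    nRows = nRows + numel(c{1});
    % ... process or store this block here ...
end
```

The third argument to textscan limits how many times the format is applied per call, and each call resumes from where the previous one stopped in the file. If a malformed line stops parsing partway through, the loop above would need extra handling (for example, skipping that line with fgetl).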