MATLAB: Is regexp including extra data

MATLAB regexp split

I’m trying to use REGEXP to match the following flags (State, Post, RC, State, Junk) and then create a cell array of strings.
My inputs are:
1 25.187466 156.162447 21578.188 97.134234 State AAAAA 1 C00B
2 25.287466 156.162447 21578.288 97.234234 Post BBBBB 2 C11B
9 25.387466 156.362447 21578.388 97.334234 RC CCCCC 3 C22B
99 25.387466 156.362447 21578.388 97.334234 State DDDDD 4 C33B
999 25.387466 156.362447 21578.388 97.334234 Junk EEEEE 5 C44B
I’m using the following MATLAB commands:
data = regexp(LineTxt,'-?\d+(\.\d+)?','split');
Flag=cellstr(data{1,6});
For unknown reasons I keep getting the following output:
' State AAAAA' ' Post BBBBB' ' RC CCCCC' ' State DDDDD' ' Junk EEEEE'
Intended output is:
' State ' ' Post ' ' RC ' ' State ' ' Junk'
Why are the extra fields being included?

Best Answer

Neither do I understand why. Regular expressions are tricky. Another approach:
for k = 1:numel(str)   % assumes str is a cell array with one row of text per cell
    data{k} = regexp( str{k}, '(State|Post|RC|Junk)', 'match', 'once' );
end
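A likely explanation for the original output, shown on the first input line: with 'split', regexp returns the text between matches of the numeric pattern. AAAAA contains no digits, so nothing splits it from State, and both end up in the same piece. The sketch below is only an illustration; the variable names line1, pieces and flag are made up for this example.

```matlab
% First input line from the question, held in a char vector (assumed layout).
line1 = '1 25.187466 156.162447 21578.188 97.134234 State AAAAA 1 C00B';

% 'split' returns the text BETWEEN the numeric matches.
pieces = regexp(line1, '-?\d+(\.\d+)?', 'split');

% No number sits between "State" and "AAAAA", so they stay together.
disp(['[' pieces{6} ']'])

% Matching the flag directly avoids the problem.
flag = regexp(line1, 'State|Post|RC|Junk', 'match', 'once');
disp(flag)
```

Note that the digits inside C00B also count as a match, so the trailing field is split into ' C' and 'B'.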
[Edit] IMO, given that the file is written with a consistent format string, textscan is the best way to read the file.
fid = fopen( 'cssm.txt' );
cac = textscan( fid, '%*u%*f%*f%*f%*f%s%*s%*d%*s' );
fclose( fid );
cac{:}
returns
ans =
'State'
'Post'
'RC'
'State'
'Junk'
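In the format string, each %*... specifier reads a field and discards it, so only the %s for the flag is returned. As a rough sketch of how the same idea extends to other columns, dropping the asterisk from the third specifier also keeps the third column. This example parses a char vector instead of a file so that it is self-contained; txt, col3 and flags are illustrative names.

```matlab
% Two of the sample rows, joined with newlines (stand-in for the file contents).
txt = sprintf('%s\n', ...
    '1 25.187466 156.162447 21578.188 97.134234 State AAAAA 1 C00B', ...
    '2 25.287466 156.162447 21578.288 97.234234 Post BBBBB 2 C11B');

% Same format as above, except the third field is %f (kept) instead of %*f (skipped).
cac = textscan(txt, '%*u%*f%f%*f%*f%s%*s%*d%*s');

col3  = cac{1};   % numeric column vector with the third column
flags = cac{2};   % cell array of flag strings
```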
Answer to comment:
str = fileread( 'cssm.txt' );
data = regexp( str, '(State|Post|RC|Junk)', 'match' )
returns
data =
'State' 'Post' 'RC' 'State' 'Junk'
where cssm.txt contains your five lines of data.
However, the smarter the solution, the harder it is to make the code robust (and flexible). How should the code behave if the file contains rows that do not adhere to the "format" that I infer from your example? And in a few days, you might find that you need the third column.
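One possible way to handle both concerns is to validate each row against a full-line pattern and collect the rows that do not fit. This is a minimal sketch, not a definitive implementation: it assumes the nine-field layout inferred from the five sample rows, the malformed row is invented for the illustration, and the names rows, num, pat, flags, col3 and bad are hypothetical.

```matlab
% Sample rows from the question plus one malformed row (made up for this example).
rows = { ...
    '1 25.187466 156.162447 21578.188 97.134234 State AAAAA 1 C00B', ...
    '2 25.287466 156.162447 21578.288 97.234234 Post BBBBB 2 C11B', ...
    'this row does not follow the format', ...
    '999 25.387466 156.362447 21578.388 97.334234 Junk EEEEE 5 C44B'};

% Pattern for one number; (?:...) groups without creating a token.
num = '-?\d+(?:\.\d+)?';

% Whole-row pattern: tokens capture the third column and the flag.
pat = ['^\s*\d+\s+' num '\s+(' num ')\s+' num '\s+' num ...
       '\s+(State|Post|RC|Junk)\s+\S+\s+\d+\s+\S+\s*$'];

flags = {};   % flags of the rows that match
col3  = [];   % third column of the rows that match
bad   = [];   % indices of rows that do not match

for k = 1:numel(rows)
    tok = regexp(rows{k}, pat, 'tokens', 'once');
    if isempty(tok)
        bad(end+1) = k;            % remember the offending row and move on
        continue
    end
    col3(end+1)  = str2double(tok{1});
    flags{end+1} = tok{2};
end
```

Whether malformed rows should be skipped, reported, or treated as an error depends on the data; here they are only recorded in bad.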