There is no real need for the intermediate regexp, you can get it all with just one regular expression:
tokens = regexp(TXTmod, '(R\d\d\w)/\w*(\d\d\d\d)\D\>', 'tokens');
Or more efficient (but a bit longer):
tokens = regexp(TXTmod, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens')
Note the inefficiency in your original expression: The \w*\d\d\d in your first regular expression is going to cause a lot of backtracking by the regular expression engine because the \w* is always going to match the next three \d. Because * is greedy, at first the engine is going to match the three digits with \w* and find then that it can't match 3 digits after. So it's going to backtrack one digit, match the first two digits with \w*, the 3rd digit with \d and find that it still can't find a match for the next two \d. it will have to backtrack two more times until \w* only match the letters and the three \d match a digit.
The new regular expression matches a optional group of 4 digits followed by 1 or more letter and then capture the final groups of 4 digits before the last letter. I've also added a start of word match: \<.
Other note: To rearrange the tokens of each string into a two column cell array:
cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false)
Best Answer