MATLAB: Remove “( )” from string words

regexprep, remove

Best Answer

regexp is very compact and powerful - however it can be a bit slow.

If speed is an issue, a simple for loop with strrep can sometimes be faster:

 db = { 'NEW' 'May' '(AFP)' 'US' 'prosecutors' 'on' };
 pattern = '\(|\)';  % match a literal "(" or a literal ")"
 tic
 newdb = regexprep(db, pattern, '');
 toc
 tic
 loopDB = db;
 for ii=1:length(db)
   loopDB{ii} = strrep ( db{ii}, '(', '' );
   loopDB{ii} = strrep ( loopDB{ii}, ')', '' );
 end
 toc
 isequal ( newdb, loopDB )
 Elapsed time is 0.002665 seconds.
 Elapsed time is 0.000087 seconds.
 ans =
     1
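
If the loop itself is unwanted, note that strrep accepts a cell array directly, and erase (available from R2016b, so an assumption about your release) removes several substrings in one call. A minimal sketch using the same db as above; the variable names noLoop, erased and viaRegexp are only illustrative, and no timings are claimed for these variants:

```matlab
db = { 'NEW' 'May' '(AFP)' 'US' 'prosecutors' 'on' };

% strrep works element-wise on a cell array of character vectors
noLoop = strrep(strrep(db, '(', ''), ')', '');

% erase removes every occurrence of each listed substring (R2016b or later)
erased = erase(db, {'(', ')'});

% equivalent regular expression using a character class
viaRegexp = regexprep(db, '[()]', '');

isequal(noLoop, erased, viaRegexp)
```

Wrapping each variant in tic/toc, or in timeit, on your own data is the only reliable way to tell which is fastest for your case.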

Related Solutions

MATLAB: How to sort the data from regexp

There is no real need for the intermediate regexp, you can get it all with just one regular expression:

tokens = regexp(TXTmod, '(R\d\d\w)/\w*(\d\d\d\d)\D\>', 'tokens'); %You were missing a \d in your regexp (which was captured by the \w* so it didn't matter)

Or more efficient (but a bit longer):

tokens = regexp(TXTmod, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens')

Note the inefficiency in your original expression: the \w*\d\d\d in your first regular expression causes a lot of backtracking in the regular expression engine, because \w* also matches the digits that the following three \d are meant to match. Because * is greedy, the engine first matches the three digits with \w* and then finds that it cannot match three digits after that. So it backtracks one digit, matches the first two digits with \w* and the third digit with \d, and finds that it still cannot match the next two \d. It has to backtrack two more times until \w* matches only the letters and each of the three \d matches a digit.

The new regular expression matches an optional group of 4 digits followed by one or more letters, and then captures the final group of 4 digits before the last letter. I've also added a start-of-word match: \<.
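
To check that both expressions extract the same tokens, they can be run side by side. This is only a sketch: the sample text below is hypothetical, since the real contents of TXTmod are not shown, and the names sample, tokOld and tokNew are illustrative.

```matlab
% Hypothetical sample in the assumed "Rddx/letters + 4 digits + letter" layout
sample = 'R12A/ABC1234X';

% First (backtracking-heavy) expression
tokOld = regexp(sample, '(R\d\d\w)/\w*(\d\d\d\d)\D\>', 'tokens');

% Stricter expression with explicit character classes
tokNew = regexp(sample, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens');

% Both return one match holding the two captured tokens
isequal(tokOld, tokNew)
tokNew{1}
```

For this sample, both calls capture 'R12A' and '1234'.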

Other note: To rearrange the tokens of each string into a two column cell array:

cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false)
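
A short sketch of what that cellfun call produces, assuming the input is a cell array of character vectors. TXTsample is hypothetical stand-in data and pairs is an illustrative name:

```matlab
% Hypothetical input: two strings, the first containing two matches
TXTsample = {'R12A/ABC1234X R34B/5678Y', 'R56C/9012Z'};

tokens = regexp(TXTsample, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens');

% Stack the 1x2 token cells of each string into an N-by-2 cell array
pairs = cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false);

pairs{1}   % 2x2 cell: one row per match in the first string
pairs{2}   % 1x2 cell: the single match in the second string
```

Each element of pairs then has one row per match, with the first token in column 1 and the 4-digit token in column 2.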

MATLAB: Is there a more efficient way to omit certain characters from a string without using loops

One way would be

 newsf = regexprep(sf, '([CREP1-4]|SQ)', '') ;

but what does "etc" encompass?
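
As an illustration of what that pattern removes, here is a sketch on a made-up input. The value of sf below is hypothetical, since the original string is not shown:

```matlab
% Hypothetical input string
sf = 'SQ12ABCD5';

% Remove the single characters C, R, E, P, 1-4 and the two-character sequence SQ
newsf = regexprep(sf, '([CREP1-4]|SQ)', '');

disp(newsf)   % displays ABD5
```

Note that a lone S or Q is left in place, because only the sequence SQ is listed. The surrounding parentheses form a capture group that is not needed for a plain deletion, so '[CREP1-4]|SQ' should behave the same way.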
