I have the following string, read into MATLAB:
*aaa$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb111111111111111111111122222222222233333333333333333333333334444444444555556666666777777788899999*ddd$11111111111111111111111111111111222222222222222abcdf99999999999*abcde99999$eeeeeeeeeeeeeeeeeeeeee
I would like to perform a search that only extracts the text between *aaa and *ddd, using the following regexp pattern:
pattern = '(?<=\*aaa\s)(.*|\n)*?(?=\*)';
I expected the middle (.*|\n)*? to match the minimum number of "either any character other than linebreak, or a linebreak" that sits between *aaa and the closest * symbol, at *ddd. Instead, MATLAB returns the following:
$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb111111111111111111111122222222222233333333333333333333333334444444444555556666666777777788899999$11111111111111111111111*ddd$11111111111111111111111111111111222222222222222abcdf99999999999
Instead of stopping just before *ddd, regexp continued until just before *abcde99999, despite the lazy "?" quantifier at the end of the middle section of the pattern.
Just to make sure this isn't a lookaround issue, I also tried running:
pattern = '\*(.*|\n)*?\*';
Sure enough, I get the following, where the match runs straight past the *ddd in the middle instead of ending there:
*aaa$bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb111111111111111111111122222222222233333333333333333333333334444444444555556666666777777788899999$11111111111111111111111*ddd$11111111111111111111111111111111222222222222222abcdf99999999999*
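A minimal sketch that reproduces the behavior on a shortened string. The short string and its line breaks are assumptions made for illustration (the \s and \n in the pattern suggest the real text spans several lines), and the variable names nl, str, match, hoped and observed are hypothetical:

```matlab
% Shortened stand-in for the real data (assumed layout: one record per line).
nl  = char(10);  % newline character
str = ['*aaa' nl '$bbb' nl '111' nl '*ddd' nl '$222' nl '*abcde' nl '$eee'];

% Pattern from the question, unchanged.
pattern = '(?<=\*aaa\s)(.*|\n)*?(?=\*)';

% Return only the first match, as a char vector.
match = regexp(str, pattern, 'match', 'once');

% Text that was hoped for (ends before *ddd) versus text seen (ends before *abcde).
hoped    = ['$bbb' nl '111' nl];
observed = ['$bbb' nl '111' nl '*ddd' nl '$222' nl];

fprintf('stops before *ddd:   %d\n', strcmp(match, hoped));
fprintf('stops before *abcde: %d\n', strcmp(match, observed));
```

One detail that may be relevant: the regexp documentation lists 'dotall' as the default matching mode, in which . also matches newline characters, with 'dotexceptnewline' as the alternative option. Under the default, the inner .* is greedy and can run across line breaks on its own, regardless of the lazy ? on the outer group.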
Best Answer