MATLAB: Regexprep $ and look-behind — bug or expected

lookaround, MATLAB, regexp

This question deals with regexprep and look-around operators.
Suppose you have
YourCell = {'2016-11-22 00:00:00.8'; '2016-11-22 00:00:00.9'; '2016-11-22 00:00:01'};
and you want to automatically append something like '.0' to any entry that does not end in a period followed by a digit. One way to do that is
NewCell = regexprep(YourCell, '(:\d\d)$', '$1.0', 'lineanchors')
This matches a colon followed by two digits as a group, anchored at the end of the line, and replaces the match with the group followed by .0. In the regexprep replacement string, $1 means "the first captured group". So we know that the task can be done.
But while investigating, I took a different tack involving look-around operators. I decided to look for an end-of-line that is not preceded by (period followed by a digit), and to substitute '.0' at that end of line.
The look-behind-for-match operator in regexp / regexprep is (?<=EXPRESSION) and the look-behind-for-non-match operator is (?<!EXPRESSION). These are documented at https://www.mathworks.com/help/matlab/ref/regexp.html#input_argument_expression in the "Lookaround Assertions" section. Accordingly, it seems to me that I should be able to use either
regexprep(YourCell, '(?<!\.\d)$', '.0', 'lineanchors')
or
regexprep(YourCell, '$(?<!\.\d)', '.0', 'lineanchors')
However, no replacement is made.
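For reference, here is a minimal sketch of how the two look-behind operators behave when the match consumes a real character. The string 'a1 b2' and the variable names are made up for illustration and are not part of the data above.

```matlab
% Illustrative string (hypothetical, not from the original data)
str = 'a1 b2';

% Look-behind for a match: a digit directly preceded by 'a'
afterA = regexp(str, '(?<=a)\d', 'match');      % {'1'}

% Look-behind for a non-match: a digit NOT directly preceded by 'a'
notAfterA = regexp(str, '(?<!a)\d', 'match');   % {'2'}
```

In both patterns the look-behind itself consumes nothing; only the \d contributes characters to the match.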
Is the look-behind incorrect? Well, we can test by changing the $ to :
regexprep(YourCell, '(?<!\.\d):', '.0', 'lineanchors')
ans =
3×1 cell array
'2016-11-22 00.000.000.8'
'2016-11-22 00.000.000.9'
'2016-11-22 00.000.001'
and observing that we do get replacement of the colons (none of which happens to be preceded by a period and a digit) with the target string. We can check whether the look-around is being ignored with
regexprep(YourCell, '(?<!:\d\d):', '.0', 'lineanchors')
ans =
3×1 cell array
'2016-11-22 00.000:00.8'
'2016-11-22 00.000:00.9'
'2016-11-22 00.000:01'
and seeing that the pattern is in fact actively used: the colon is only matched when it is not preceded by colon-digit-digit. So the look-around is working.
Is the end-of-line anchor the problem?
regexprep(YourCell, '(\d)$', '$1.0', 'lineanchors')
ans =
3×1 cell array
'2016-11-22 00:00:00.8.0'
'2016-11-22 00:00:00.9.0'
'2016-11-22 00:00:01.0'
No, the only matched digit was the one at the end of the line, so the line anchor is matching properly.
The difficulty only occurs when you have a look-around in conjunction with a line anchor. The problem happens for the ^ anchor as well, as can be explored with
regexprep(YourCell, '(^)(?=\d)', '$1.0', 'lineanchors') %nothing happens!
regexprep(YourCell, '(-)(?=2)', '$1.0', 'lineanchors') %works

regexprep(YourCell, '^(2)', '$1.0', 'lineanchors') %works
The question is then whether look-arounds are expected not to work in conjunction with line anchors, or whether this is a MATLAB bug.
Though I do see the line anchor working if at least one real character is matched:
regexprep(YourCell, '^(?=2).*', 'BLOB','lineanchors') %works, substitutes
regexprep(YourCell, '^(?=3).*', 'BLOB','lineanchors') %no substitutions, which is correct
You can see that my lookbehind works by testing with
regexp(YourCell, '.(?<!\.\d)$', 'match','lineanchors')
ans =
3×1 cell array
{}
{}
{1×1 cell}
>> ans{3}
ans =
cell
'1'
So it looks like a successful match of a zero-width expression is not triggering a replacement when I think it should.
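Building on the regexp test above, one possible workaround is to make the match consume a real character: capture the last character of the line and re-emit it in the replacement. This is only a sketch derived from the '.(?<!\.\d)$' test; the capture group and the '$1.0' replacement are my own variation.

```matlab
YourCell = {'2016-11-22 00:00:00.8'; '2016-11-22 00:00:00.9'; '2016-11-22 00:00:01'};

% Capture the final character of the line, then require that the two
% characters just before the end of the line are not "period, digit".
% The replacement re-emits the captured character followed by '.0'.
NewCell = regexprep(YourCell, '(.)(?<!\.\d)$', '$1.0', 'lineanchors');
```

Because the match is now one character long rather than zero-width, the replacement is applied, and only the third entry changes (to '2016-11-22 00:00:01.0').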

Best Answer

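Expected. The pattern does match, but the match is zero-length: "$" is an assertion and does not consume "one or more" characters in the string. By default regexp and regexprep ignore zero-length matches (the 'noemptymatch' option); the 'emptymatch' option includes them. This works: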
regexprep( YourCell, '(?<!\.\d)$', '.0', 'emptymatch' )
ans =
'2016-11-22 00:00:00.8'
'2016-11-22 00:00:00.9'
'2016-11-22 00:00:01.0'
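To make the difference explicit, the following sketch runs the same pattern with and without 'emptymatch' on the data from the question. The variable names (pattern, unchanged, patched) are illustrative only.

```matlab
YourCell = {'2016-11-22 00:00:00.8'; '2016-11-22 00:00:00.9'; '2016-11-22 00:00:01'};
pattern  = '(?<!\.\d)$';   % zero-width: a look-behind plus an end anchor

% Default behaviour: zero-length matches are ignored, so nothing is replaced
unchanged = regexprep(YourCell, pattern, '.0');

% With 'emptymatch', the zero-length match at the end of the third entry
% is kept and '.0' is inserted there
patched = regexprep(YourCell, pattern, '.0', 'emptymatch');

% Report whether each call altered the input
fprintf('default changed input:    %d\n', ~isequal(unchanged, YourCell));
fprintf('emptymatch changed input: %d\n', ~isequal(patched, YourCell));
```

Each cell here holds a single line, so "$" already means the end of the text and 'lineanchors' is not needed for this data.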