Tcl Source Code

Ticket UUID: 3d40c2da8175daba83ef373c8b8d9e147744fe4c
Title: regexp (?:a*b)+c indices wrong
Type: Bug Version:
Submitter: pooryorick Created on: 2019-11-23 21:08:29
Subsystem: 43. Regexp Assigned To: nobody
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2019-11-24 15:25:35
Resolution: Invalid Closed By: sebres
    Closed on: 2019-11-24 15:25:35
Description: (text/x-fossil-wiki)
In the following example, the resulting indices are wrong:

<code><verbatim>
% regexp -indices -inline {(?:a*b)+c} aaaabbbbcc
{0 8}
</verbatim></code>


The correct result should be:

<code><verbatim>
% regexp -indices -inline {(?:a*b)+c} aaaabbbbcc
{7 8}
</verbatim></code>
User Comments: sebres added on 2019-11-24 15:25:35: (text/x-fossil-wiki)
No.

Although the sub-expression <code>(?:a*b)</code> does not capture a group, it still consumes every <code>a</code> and then each remaining <code>b</code> (with no preceding <code>a</code>), so the whole expression matches every character up to and including the first <code>c</code>.

So the result <code>{0 8}</code> is correct: <code>{0..4}</code> (<code>aaaab</code>) is matched by the first iteration of <code>a*b</code>, <code>{5..7}</code> (<code>bbb</code>) by the second through fourth iterations of <code>(?:a*b)+</code>, and finally <code>{8}</code> (<code>c</code>) by the trailing <code>c</code>.

Note that <code>(?:a*b)+</code> is a non-capturing group, not a lookahead or lookbehind assertion (lookbehind is not implemented in Tcl at all), so the text it matches is included in the match of the whole expression.
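For contrast, here is a minimal sketch of how a non-capturing group differs from a positive lookahead <code>(?=...)</code>, which Tcl's advanced regular expressions do support. The patterns <code>a(?:b)</code> and <code>a(?=b)</code> and the variable names are illustrative only and are not taken from the ticket.

```tcl
set s aaaabbbbcc

# Non-capturing group: the b it matches is part of the overall match.
set grouped [regexp -indices -inline {a(?:b)} $s]

# Positive lookahead: a following b is required but not consumed.
set lookahead [regexp -indices -inline {a(?=b)} $s]

puts "non-capturing: $grouped"
puts "lookahead:     $lookahead"
```

The first range should cover two characters (the <code>a</code> and the <code>b</code>), while the second should cover only the <code>a</code>.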

The iterations of the first sub-expression in this example can be illustrated as follows:
<pre>
    % regexp -all -indices -inline {a*b} aaaabbbbcc
    {0 4} {5 5} {6 6} {7 7}
</pre>
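The same point can be checked directly. The following sketch assumes Tcl 8.5 or later (for <code>lassign</code>). It prints the text covered by the reported range, then repeats the search with <code>-start 7</code> to show that a range beginning at index 7 only results when the search itself begins there. The variable names are illustrative.

```tcl
set s  aaaabbbbcc
set re {(?:a*b)+c}

# The overall match begins at the leftmost position where a match is possible.
lassign [regexp -indices -inline $re $s] whole
lassign $whole first last
puts "whole match: $first..$last -> [string range $s $first $last]"

# -start 7 begins the search at index 7; with -indices the reported
# positions are still relative to the beginning of the input string.
set fromSeven [regexp -indices -inline -start 7 $re $s]
puts "with -start 7: $fromSeven"
```

With the input from the ticket, the first line should report <code>0..8</code> covering <code>aaaabbbbc</code>, and the second should report <code>{7 8}</code>.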