Tcl Source Code

Ticket UUID: 3d40c2da8175daba83ef373c8b8d9e147744fe4c
Title: regexp (?:a*b)+c indices wrong
Type: Bug
Version:
Submitter: pooryorick
Created on: 2019-11-23 21:08:29
Subsystem: 43. Regexp
Assigned To: nobody
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2019-11-24 15:25:35
Resolution: Invalid
Closed By: sebres
Closed on: 2019-11-24 15:25:35
Description:

In the following example, the resulting indices are wrong:

% regexp -indices -inline {(?:a*b)+c} aaaabbbbcc
{0 8}

The correct result should be:

% regexp -indices -inline {(?:a*b)+c} aaaabbbbcc
{7 8}

User Comments: sebres added on 2019-11-24 15:25:35:

No.

Although the sub-expression traverses all the a characters (and then the remaining b characters that have no preceding a) without capturing them as a group, the expression as a whole matches every character up to and including the first c.

So the result {0 8} is correct: {0..4} aaaab is matched by a*b in the first iteration, {5..7} bbb by the second through fourth iterations of (?:a*b)+, and finally {8} by c.

Note that (?:a*b)+ is a non-capturing group, not a lookahead or lookbehind assertion (and lookbehind is not implemented in Tcl at all), so the text it matches is consumed and counts toward the match of the whole expression.
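A minimal sketch of that distinction, using the same input string. The variable names (s, consumed, peeked, narrow) are only illustrative, and the patterns with (?=c) and bc are not from the original report; they are added here for contrast. A non-capturing group consumes what it matches, a lookahead constraint does not, and a pattern that matches only the final b and c is one way to get the {7 8} the report expected.

```tcl
set s aaaabbbbcc

# Non-capturing group: everything it matches is part of the overall match.
set consumed [regexp -indices -inline {(?:a*b)+c} $s]
puts $consumed   ;# {0 8}

# Positive lookahead: c must follow, but it is not consumed.
set peeked [regexp -indices -inline {(?:a*b)+(?=c)} $s]
puts $peeked     ;# {0 7}

# Matching only the last b and the first c.
set narrow [regexp -indices -inline {bc} $s]
puts $narrow     ;# {7 8}
```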

The following illustrates what the iterations of the first subexpression look like for this example:

    % regexp -all -indices -inline {a*b} aaaabbbbcc
    {0 4} {5 5} {6 6} {7 7}
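To see which text each of those index pairs covers, the pairs can be fed back into string range. This is only a sketch; the variable names (s, pieces, pair, first, last) are hypothetical and not part of the ticket.

```tcl
set s aaaabbbbcc
set pieces {}

# Each element of the result is a {first last} pair of character indices.
foreach pair [regexp -all -indices -inline {a*b} $s] {
    lassign $pair first last
    lappend pieces [string range $s $first $last]
    puts "$first..$last [string range $s $first $last]"
}
```

The four pieces are aaaab, b, b and b; together they cover indices 0 through 7, and the c at index 8 completes the {0 8} match.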