Tcl Source Code

2015-09-21
19:07 Closed ticket [0e0e150e49]: Fix for quantified regexp back-references plus 7 other changes artifact: 4a2d0afe79 user: dgp
19:04
[1115587][0e0e150e49] Major fix for regexp handling of quantified backrefs. Contributed by Tom Lane ... check-in: c8dfe06653 user: dgp tags: trunk
2015-09-19
15:55 Add attachment quantified-backrefs.patch to ticket [0e0e150e49] artifact: 5d27502ea2 user: tgl
15:55 New ticket [0e0e150e49] Fix for quantified regexp back-references. artifact: 3e9cf926a1 user: tgl

Ticket UUID: 0e0e150e49479e3f3f7b20efa1817813216fe2ad
Title: Fix for quantified regexp back-references
Type: Patch
Version: 8.6.4
Submitter: tgl
Created on: 2015-09-19 15:55:29
Subsystem: 43. Regexp
Assigned To: dgp
Priority: 5 Medium
Severity: Important
Status: Closed
Last Modified: 2015-09-21 19:07:42
Resolution: Accepted
Closed By: dgp
Closed on: 2015-09-21 19:07:42
Description:
The attached patch solves the problems with quantified back-references
that were previously discussed in
http://core.tcl.tk/tcl/tktview?name=1115587
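
For readers unfamiliar with the term, here is a minimal sketch of what a quantified back-reference is expected to do. It uses Python's `re` module purely as a stand-in: that is a different engine from the regexp library this ticket patches, and the patterns, names, and inputs below are illustrative assumptions, not cases taken from the ticket or from the Tcl test suite.

```python
import re

# A back-reference quantified directly with "*": every repetition of \1
# must equal the text that group 1 captured.
DIRECT = re.compile(r"([bc])\1*")

# A back-reference inside a larger subexpression that is itself quantified.
NESTED = re.compile(r"([ab])(?:\1x)+")


def full(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for pattern, text in [(DIRECT, "bbb"), (DIRECT, "bcb"),
                          (NESTED, "aaxax"), (NESTED, "aaxbx")]:
        print(pattern.pattern, repr(text), full(pattern, text))
```

The point is that "bcb" and "aaxbx" must be rejected: a later repetition may not match a different string than the one originally captured.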

This patch represents a port of work that's been done on Postgres'
copy of the regexp library over the past several years, specifically
these commits:
5223f96d92fd6fb6fcf260da9f9cb111831f0b37
173e29aa5deefd9e71c183583ba37805c8102a72
3cbfe485e44d055b9e6a27e47069729375059f8c
4dd78bf37aa29d04b3f358b08c4a2fa43cf828e7
2a4c46e0baf2d51117cd4468b28705d01ffcbff9
3694b4d7e1aa02f917f9d18c550fbb49b96efa83
which you can look at in the Postgres repo at
http://git.postgresql.org/gitweb/?p=postgresql.git
if you want a sense of the development history.  This submission would
probably be easier to follow if I'd submitted individual patches
equivalent to each of those steps ... but transposing code between
Postgres and Tcl layout conventions is enough of a pain in the rear that
I couldn't muster the energy to do it repeatedly.

In fact, this still doesn't follow Tcl layout conventions very well,
partly because I'm not totally certain what they are.  I'm hoping you
have a suitable reformatting tool.

Anyway, the core of the fix is to introduce an explicit "iteration"
subre type, rather than relying completely on a compile-time
transformation, as I'd speculated about in my comments in 1115587.
Subsequent cleanup includes getting rid of the useless "retry memory"
stuff and folding the parallel dissect() and cdissect() code paths
into a single implementation, which is why the regexec.c changes are
so bulky-looking.
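
As a rough illustration of the idea only, a run-time iteration step can be sketched as below. This is a toy in Python under stated assumptions, not the patch's C code and not the library's data structures: `iterate_backref` and `match_char_then_backrefs` are hypothetical names, and the toy handles just the single pattern shape ^([bc])\1{min,max}$.

```python
def iterate_backref(text, start, captured, min_reps, max_reps):
    """Return every offset at which min_reps..max_reps copies of `captured` can end.

    Hypothetical helper: the captured text is compared again on every
    iteration at match time, rather than being expanded at compile time.
    """
    if captured == "":
        # An empty capture matches any number of times without consuming input.
        return [start]
    ends = [start] if min_reps == 0 else []
    pos, reps = start, 0
    while reps < max_reps and text.startswith(captured, pos):
        pos += len(captured)
        reps += 1
        if reps >= min_reps:
            ends.append(pos)
    return ends


def match_char_then_backrefs(text, min_reps=0, max_reps=10**9):
    """Toy matcher for ^([bc])\\1{min,max}$ built on iterate_backref."""
    if not text or text[0] not in "bc":
        return False
    # The match succeeds only if some iteration count ends exactly at end of input.
    return len(text) in iterate_backref(text, 1, text[0], min_reps, max_reps)
```

Because each pass through the loop re-checks the captured text, no iteration before the last one can slip through unverified.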

We've been using this successfully in Postgres for several years,
with only a couple of minor bugs discovered (see the last two commits
mentioned above).  So I now feel confident enough in it to recommend
that you adopt it.

This supersedes my previous submission at ticket 3487443, which I've
now closed.
User Comments:
dgp added on 2015-09-21 19:07:42:
Fix accepted for release in Tcl 8.6.5.

Unfortunately, 8.5 and 8.6 have diverged too much for me to
adapt this patch to 8.5.19 with the effort I'm
willing to spare.

If fixing this in continuing 8.5.* releases is important,
the best way may be simply to copy all the 8.6 source
files r*.c as-is back to the 8.5 branch.  I don't know of
any differences worth preserving.
