Ticket UUID: | 1860727 | |||
Title: | PCRE optional regexp | |||
Type: | Patch | Version: | TIP Implementation | |
Submitter: | hobbs | Created on: | 2007-12-30 01:12:44 | |
Subsystem: | 43. Regexp | Assigned To: | aku | |
Priority: | 6 | Severity: | Minor | |
Status: | Open | Last Modified: | 2017-11-17 14:52:47 | |
Resolution: | None | Closed By: | nobody | |
Closed on: | ||||
Description: |
Attached is a diff that adds a configure --with-pcre option, as well as -type classic|pcre and -binary options to [regexp] (available in either build, only functional with --with-pcre). Use --with-pcre=/path/to/pcre (or have it installed in a "default" location). Initial testing shows that PCRE is significantly faster in all cases than the classic Spencer engine. | |||
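A minimal sketch of how a script might use the -type option described above while still running on an unpatched interpreter. The helper name re_match is hypothetical and not part of the patch; the sketch assumes that a stock [regexp] rejects the unknown -type switch with an error, and it makes no claim about how a patched build without --with-pcre responds.

```tcl
# Hypothetical helper (not part of the patch): try the PCRE engine first,
# then fall back to the interpreter's default engine.
proc re_match {pattern string} {
    # On a stock build -type is an unknown switch, so [regexp] raises an
    # error; catch it and retry without the switch.
    if {[catch {regexp -type pcre -- $pattern $string} matched]} {
        set matched [regexp -- $pattern $string]
    }
    return $matched
}

puts [re_match {^\d{2}-\d{2}-\d{4}$} "10-10-2017"]
```

Because the fallback re-runs the same pattern, a genuinely invalid pattern still raises its usual error from the second call.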
User Comments: |
sebres added on 2017-11-17 14:52:47:
I've rebased it onto a newer 8.5 (many conflicts resolved), relative to my branch sebres-8-5-timerate (in order to test performance as well). It is currently available on my GitHub (see artificial PR sebres/tcl#5). Compared to the original variant provided by Jeffrey with patch pcre-20080121, it is complete:
Additionally:
Todo's:
As regards performance, both PCRE and DFA are much faster than Tcl's classic NFA (up to 10 times, and still faster on large regexps).
Here is an excerpt as a foretaste:
% foreach t {c p d} {
proc test_$t {} \
[string map [list _REENG_ $t] \
{puts _REENG_:[timerate {regsub -type _REENG_ -all -line {^((\d{2})-(\d{2})-(\d{2,4})|NULL)$} "10-10-2017\nNULL\n20-10-2017" {**\1**\2**}}]}
]; puts "% [info body test_$t]"; test_$t }
% puts c:[timerate {regsub -type c -all -line {^((\d{2})-(\d{2})-(\d{2,4})|NULL)$} "10-10-2017\nNULL\n20-10-2017" {**\1**\2**}}]
c:38.3610 µs/# 26049 # 26068.1 #/sec 999.266 nett-ms
% puts p:[timerate {regsub -type p -all -line {^((\d{2})-(\d{2})-(\d{2,4})|NULL)$} "10-10-2017\nNULL\n20-10-2017" {**\1**\2**}}]
p:1.413072 µs/# 693365 # 707677 #/sec 979.775 nett-ms
% puts d:[timerate {regsub -type d -all -line {^((\d{2})-(\d{2})-(\d{2,4})|NULL)$} "10-10-2017\nNULL\n20-10-2017" {**\1**\2**}}]
d:1.205143 µs/# 810168 # 829777 #/sec 976.368 nett-ms
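The relative speedups implied by these timings can be worked out directly. The sketch below only divides the per-iteration times printed above; reading c, p and d as the classic, PCRE and DFA engines is an assumption based on the wording of this comment, and the proc name speedup is hypothetical.

```tcl
# Per-iteration times in microseconds, copied from the timerate output above.
set usec {c 38.3610 p 1.413072 d 1.205143}

# Hypothetical helper: how many times faster $t is than $base.
proc speedup {base t} {
    expr {double($base) / $t}
}

# Report each engine relative to the classic engine (c).
set base [dict get $usec c]
dict for {engine t} $usec {
    puts [format "%s: %.1fx" $engine [speedup $base $t]]
}
```

For this one pattern and input, that puts p at roughly 27 times and d at roughly 32 times the speed of c.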
Tested with PCRE 8.40. If the TCT is interested, I'll rebase it to fossil as soon as possible and provide my 8.6 and 8.7 branches for this. I'll just spare myself this work (the rebase) if nobody needs it. Ah, yes, not to forget: thanks to Jeffrey for the original work!
hobbs added on 2008-01-22 11:47:36:
File Added - 263244: pcre-20080121.diff.gz
Updated to have --enable-pcre=yes|no|default. If default is used, then PCRE will be the default engine. --with-pcre still exists to point to a non-standard location. Fixed a -indices issue, and updated the test suite. The remaining test issues mostly represent differences in line anchor styles.
hobbs added on 2008-01-03 05:16:04:
File Deleted - 260247:
File Added - 260524: pcre.diff4.gz
Updated version that doesn't leak the study'd pcre info, corrects more tests and is generally better, so just use it.
hobbs added on 2008-01-01 03:16:18:
File Added - 260318: pcre.diff3.gz
New patch that calms some tests and fixes a [lsearch -regexp] crash condition. Note that any calls that use Tcl_GetRegExpFromObj with a NULL interp can't check the [interp regexp {} pcre] state (as lsearch -regexp does). In this version, you can set the environment variable TCL_REGEXP_PCRE to have PCRE enabled by default in Tcl interps.
hobbs added on 2007-12-31 09:15:23:
File Added - 260247: pcre.diff2.gz
Updated version that has cleaner integration. The conversion of RE compile flags is done at the caching of the object. This version includes fully correct handling in [regsub] (you'll find that it is mostly transparent to Tcl_RegsubObjCmd), with support for the whole Tcl_GetRegExpFromObj/Tcl_RegExpExecObj/Tcl_RegExpGetInfo path of execution being handled 100% transparently for classic or PCRE REs.
The translation of flags needs to be better reconciled between Spencer's flag meanings and PCRE's (like TCL_REG_NLSTOP TCL_REG_NLMATCH == ??? in PCRE).
hobbs added on 2007-12-30 09:04:26:
File Deleted - 260122:
hobbs added on 2007-12-30 09:04:25:
File Added - 260151: pcre.diff.gz
Updated version that adds:
    interp regexp {} ?classic|pcre?
So set the default engine with [interp regexp {} pcre]. I've also added support in Tcl_RegExpExecObj to recognize compiled PCREs so that the compile case works. It currently assumes -binary operation by default. In the lmbench grep.tcl code, you need to add:
    if {![catch {interp regexp {}}]} {
        puts stderr "PCRE regexp"
        interp regexp {} pcre
    } else {
        puts stderr "TCL regexp"
    }
and then it will work as before, just faster.
hobbs added on 2007-12-30 08:12:44:
File Added - 260122: pcre.diff.gz |
