Ticket UUID: | 906616 | |||
Title: | lsearch -regexp error introduced in 8.4.2 | |||
Type: | Bug | Version: | obsolete: 8.4.5 | |
Submitter: | rickmacd | Created on: | 2004-02-28 21:34:47 | |
Subsystem: | 43. Regexp | Assigned To: | dkf | |
Priority: | 5 Medium | Severity: | ||
Status: | Closed | Last Modified: | 2004-03-03 05:55:49 | |
Resolution: | Rejected | Closed By: | dkf | |
Closed on: | 2004-03-01 21:46:19 | |||
Description: |
When a list containing an embedded list element starting with a number is used as a regexp in lsearch, a parsing error occurs:

couldn't compile regular expression pattern: invalid repetition count(s)

If the first element is a text string (eg {zxcv 1234}) the error does not occur.

I hit this problem when moving from 8.3.3 to 8.4.5. I tried every 8.4 release (downloaded directly from SF and built fresh) and determined it first broke in 8.4.2. I hope this helps.

/usr/local/src/tcltk/tcl8.4.1/unix$ gcc --version
gcc (GCC) 3.3.3 20040125 (prerelease) (Debian)

The details below apply to tests run on Solaris 5.8 and Linux (Debian sid). Platform/OS does not seem to be an issue.

It is the {1234 zxcv} element in the following list that triggers the problem:

% list "" 2134 qwer "1234 zxcv" 2345 asdf
{} 2134 qwer {1234 zxcv} 2345 asdf

In the examples below I'm searching an empty list but this is just to simplify the examples and isn't an issue.

/usr/local/src/tcltk/tcl8.4.2/unix$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./tclsh
% info patchlevel
8.4.2
% lsearch -regexp {} ^[list "" 2134 qwer "1234 zxcv" 2345 asdf]
couldn't compile regular expression pattern: invalid repetition count(s)
% lsearch -regexp {} ^[list "" 2134 qwer "zxcv 1234" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "asdf zxcv" 2345 asdf]
-1
% exit

/usr/local/src/tcltk/tcl8.4.1/unix$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./tclsh
% info patchlevel
8.4.1
% lsearch -regexp {} ^[list "" 2134 qwer "1234 zxcv" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "zxcv 1234" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "asdf zxcv" 2345 asdf]
-1
% exit | |||
User Comments: |
rickmacd added on 2004-03-03 05:55:49:
Logged In: YES user_id=493198
Sorry, I totally missed that regexp had undergone such expansion. All I need to do is force the Basic RE behaviour by adding (?b):

tclsh
% info patchlevel
8.4.2
% set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
{{3 5} 1 3} {{4 6} 4 6}
% set myelement [list [list 4 6]]
{4 6}
% lsearch -regexp $mylist ^$myelement
couldn't compile regular expression pattern: quantifier operand invalid
% lsearch -regexp $mylist (?b)^$myelement
1
%

dgp added on 2004-03-03 04:50:00:
Logged In: YES user_id=80530
that makes more sense; thanks for following up.

From Tcl 8.0 -> 8.2, Tcl was extended to support Unicode. This included a new [regexp] engine capable of scanning Unicode strings, and also extended to recognize so-called Advanced Regular Expressions (ARE).

Looks like your examples include regexps that mean something different (and invalid) when parsed as ARE's than they did in 8.0, when only Basic RE's were known.

See http://tmml.sourceforge.net/doc/tcl/re_syntax.html or the corresponding part of your local Tcl documentation for details on the new regexp's available in Tcl 8.1 and later.

rickmacd added on 2004-03-03 03:11:29:
Logged In: YES user_id=493198
I hadn't realized that searching an empty list in my example made a difference. I looked again, and with a closer example to what my app is doing I see that it worked in tcl8.0 but fails as of tcl8.2. I don't have 8.1 handy.

I've restated an example here to make it more clear that what I am doing is searching a list of lists to match with a list element. I'm using -regexp to anchor the match at the beginning. In my mind, "^myelement" is a simple regexp of the anchor "^" followed by some data that is "unfortunately" being construed as part of the regexp itself.

I can certainly accept it if your answer is that tcl is behaving as desired and expected, but if you don't mind please confirm this in light of the revised example here.
I do understand that after parsing, all my examples are probably seen the same by tcl. I wouldn't have thought this a bug if it had originally failed years ago when the code was first written.

If there is no bug here I have some refactoring to do. In some cases I can use -exact, others -glob, but in some I'll have to loop through and compare each element manually. Or, can anybody see any clever quoting that could be done to the lsearch below so that $myelement is not seen as part of the regexp?

tclsh8.0
% set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
{{3 5} 1 3} {{4 6} 4 6}
% set myelement [list [list 4 6]]
{4 6}
% lsearch -regexp $mylist ^$myelement
1
%

tclsh8.4
% set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
{{3 5} 1 3} {{4 6} 4 6}
% set myelement [list [list 4 6]]
{4 6}
% lsearch -regexp $mylist ^$myelement
couldn't compile regular expression pattern: quantifier operand invalid

"-glob" works for this particular case:

% lsearch -glob $mylist $myelement*
1

dkf added on 2004-03-02 04:46:19:
Logged In: YES user_id=79902
Well, *I* don't think we need to keep that (lack of) failure mode. Especially as if the code was ever so unfortunate as to search against a non-empty list, it'd fail for sure. A failure that is masked under some non-obvious circumstances is trouble waiting to happen IMHO.

Closing this, though if anyone has a good argument why that's wrong and this bug should be fixed instead, I'd love to hear it.

dgp added on 2004-03-02 03:10:56:
Logged In: YES user_id=80530
It looks like when [lsearch] was given an empty list, it did not bother to compile the regular expression, because there was no comparing to be done, so invalid regexp's were not reported.

It's that slight change in behavior that's causing you trouble? Probably you should correct your code so it does not construct an invalid regexp for passing in.

Passing to dkf for another opinion on whether this error behavior is something we needed to preserve.
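
Since [lsearch] on an empty list used to skip compiling the pattern, one way to surface an invalid pattern regardless of the list contents is to compile it on its own first. The following is only a minimal sketch: the helper name regexpIsValid is hypothetical (it is not part of Tcl), and it assumes a Tcl version with the ARE engine (8.1 or later).

```tcl
# Hypothetical helper (not a Tcl built-in): report whether a pattern
# compiles, by matching it against the empty string and catching the
# "couldn't compile regular expression pattern" error.
proc regexpIsValid {pattern} {
    # catch returns 0 when the script succeeds, non-zero on error.
    return [expr {![catch {regexp -- $pattern {}}]}]
}

# The unquoted pattern from the thread is rejected by the ARE parser,
# while the (?b) form and a backslash-escaped form both compile.
puts [regexpIsValid {^{4 6}}]
puts [regexpIsValid {(?b)^{4 6}}]
puts [regexpIsValid {^\{4 6\}}]
```

Calling such a check before [lsearch -regexp] makes the failure independent of whether the list being searched happens to be empty.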

dgp added on 2004-03-02 03:06:28:
Logged In: YES user_id=80530
hmmm... no.

% lsearch -regexp {} {^{} 2134 qwer {1234 zxcv} 2345 asdf}
couldn't compile regular expression pattern: invalid repetition count(s)
% info patch
8.4.6

% lsearch -regexp {} {^{} 2134 qwer {1234 zxcv} 2345 asdf}
-1
% info patch
8.3.5
% regexp {^{} 2134 qwer {1234 zxcv} 2345 asdf} {}
couldn't compile regular expression pattern: invalid repetition count(s)

dgp added on 2004-03-02 02:59:00:
Logged In: YES user_id=80530
so [lsearch] has nothing to do with this right. It's just a question of what strings form a legal regexp ?

% regexp "^{} 2134 qwer {1234 zxcv} 2345 asdf" {}
couldn't compile regular expression pattern: invalid repetition count(s) |
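
On the question of quoting $myelement so that it is not read as part of the regexp: besides the (?b) prefix, one possible approach is to backslash-escape every non-word character before building the pattern, which also neutralises characters such as * . [ that remain special in a Basic RE. This is a sketch under stated assumptions, not an official recipe: the helper name regexpQuote is hypothetical, and it relies on the ARE rule that a backslash followed by a non-alphanumeric character matches that character literally (Tcl 8.1 or later).

```tcl
# Hypothetical helper (not a Tcl built-in): backslash-escape every
# non-word character so the text is matched literally inside an ARE.
proc regexpQuote {text} {
    # In the substitution spec, \\ is a literal backslash and & is the match.
    regsub -all {\W} $text {\\&} quoted
    return $quoted
}

# Data taken from the example in the thread.
set mylist    [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
set myelement [list [list 4 6]]

# Anchor at the start of each element, then match $myelement literally.
# The pattern becomes ^\{4\ 6\}, so the search finds index 1.
puts [lsearch -regexp $mylist "^[regexpQuote $myelement]"]
```

The anchor stays outside the quoted part, so only the data is escaped and "^" keeps its regexp meaning.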
