Tcl Source Code

Ticket UUID: 906616
Title: lsearch -regexp error introduced in 8.4.2
Type: Bug
Version: obsolete: 8.4.5
Submitter: rickmacd
Created on: 2004-02-28 21:34:47
Subsystem: 43. Regexp
Assigned To: dkf
Priority: 5 Medium
Severity:
Status: Closed
Last Modified: 2004-03-03 05:55:49
Resolution: Rejected
Closed By: dkf
Closed on: 2004-03-01 21:46:19
Description:
When a list containing an embedded list element
starting with a number is used as a regexp in lsearch,
a parsing error occurs:

couldn't compile regular expression pattern: invalid repetition count(s)

If the embedded element instead starts with a text string
(e.g. {zxcv 1234}), the error does not occur.

I hit this problem when moving from 8.3.3 to 8.4.5. I
tried every 8.4 release (downloaded directly from SF
and built fresh) and determined it first broke in
8.4.2. I hope this helps.

/usr/local/src/tcltk/tcl8.4.1/unix$ gcc --version
gcc (GCC) 3.3.3 20040125 (prerelease) (Debian)

The details below apply to tests run on Solaris 5.8 and
Linux (Debian sid). Platform/OS does not seem to be an
issue.

It is the {1234 zxcv} element in the following list
that triggers the problem:

% list "" 2134 qwer "1234 zxcv" 2345 asdf
{} 2134 qwer {1234 zxcv} 2345 asdf

In the examples below I'm searching an empty list but
this is just to simplify the examples and isn't an issue.

/usr/local/src/tcltk/tcl8.4.2/unix$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./tclsh
% info patchlevel
8.4.2
% lsearch -regexp {} ^[list "" 2134 qwer "1234 zxcv" 2345 asdf]
couldn't compile regular expression pattern: invalid repetition count(s)
% lsearch -regexp {} ^[list "" 2134 qwer "zxcv 1234" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "asdf zxcv" 2345 asdf]
-1
% exit


/usr/local/src/tcltk/tcl8.4.1/unix$ LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH ./tclsh
% info patchlevel
8.4.1
% lsearch -regexp {} ^[list "" 2134 qwer "1234 zxcv" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "zxcv 1234" 2345 asdf]
-1
% lsearch -regexp {} ^[list "" 2134 qwer "asdf zxcv" 2345 asdf]
-1
% exit
User Comments:

rickmacd added on 2004-03-03 05:55:49:
Logged In: YES 
user_id=493198


Sorry, I totally missed that regexp had undergone such an
expansion. All I need to do is force Basic RE behaviour
by adding (?b):

tclsh
% info patchlevel  
8.4.2
% set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
{{3 5} 1 3} {{4 6} 4 6}
% set myelement [list [list 4 6]]
{4 6}
% lsearch -regexp $mylist ^$myelement
couldn't compile regular expression pattern: quantifier operand invalid
% lsearch -regexp $mylist (?b)^$myelement
1
%
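The (?b) prefix works for this data because { and } are ordinary characters in a Basic RE, but BREs still treat ., *, [ and \ as special, so list data containing those characters would again be misread. A more general approach is to backslash-escape every ARE metacharacter in the data and keep only the leading ^ as a real anchor. The following is a minimal sketch; quoteRE is a hypothetical helper name, not a Tcl built-in.

```tcl
# Hypothetical helper (not part of Tcl): backslash-escape every character
# that is special in a Tcl ARE so arbitrary data can be embedded literally.
proc quoteRE {str} {
    # The bracket expression lists ] [ { } ( ) * + ? . \ ^ $ |
    # and each match is replaced by a backslash plus the matched text (&).
    regsub -all {[][{}()*+?.\\^$|]} $str {\\&} quoted
    return $quoted
}

set mylist    [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
set myelement [list [list 4 6]]

# A real ^ anchor followed by the escaped data, i.e. the pattern ^\{4 6\}
puts [lsearch -regexp $mylist ^[quoteRE $myelement]]   ;# 1
```

The escaped string is meant for the default ARE syntax only; combining it with (?b) would go wrong, because in a BRE \{ starts a bound.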

dgp added on 2004-03-03 04:50:00:
Logged In: YES 
user_id=80530


That makes more sense; thanks
for following up.

From Tcl 8.0 -> 8.2, Tcl was extended
to support Unicode.  This included a
new [regexp] engine capable of scanning
Unicode strings and extended to
recognize so-called Advanced Regular
Expressions (AREs).

It looks like your examples include regexps
that mean something different (and invalid)
when parsed as AREs than they did in
8.0, when only Basic REs were known.

See 

http://tmml.sourceforge.net/doc/tcl/re_syntax.html

or the corresponding part of your local
Tcl documentation for details on the new
regexps available in Tcl 8.1 and later.
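To see the ARE rule in isolation, each pattern can be compiled on its own with regexp and catch, without involving lsearch at all. In an ARE, a { followed by a digit starts a bound such as {1,3}, while a { followed by anything else is an ordinary character; that is why {1234 zxcv} is rejected and {zxcv 1234} is not. A minimal sketch using patterns copied from the sessions in this ticket:

```tcl
# Patterns copied from the sessions in this ticket.
set patterns {
    {^{} 2134 qwer {1234 zxcv} 2345 asdf}
    {^{} 2134 qwer {zxcv 1234} 2345 asdf}
    {^{4 6}}
    {(?b)^{4 6}}
}
foreach pattern $patterns {
    # Matching against an empty string forces the pattern to be compiled;
    # catch turns a compile failure into a return code of 1.
    if {[catch {regexp -- $pattern {}} result]} {
        puts "rejected: $pattern ($result)"
    } else {
        puts "compiles: $pattern"
    }
}
```

Going by the sessions in this ticket, the first and third patterns should be rejected and the other two should compile.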

rickmacd added on 2004-03-03 03:11:29:
Logged In: YES 
user_id=493198

I hadn't realized that searching an empty list in my example
made a difference.

I looked again, and with an example closer to what my app is
doing, I see that it worked in Tcl 8.0 but fails as of Tcl 8.2.
I don't have 8.1 handy.

I've restated the example here to make it clearer that
what I am doing is searching a list of lists for a matching
list element. I'm using -regexp to anchor the match at the
beginning. In my mind, "^myelement" is a simple regexp: the
anchor "^" followed by some data that is "unfortunately"
being construed as part of the regexp itself.

I can certainly accept it if your answer is that Tcl is
behaving as desired and expected, but if you don't mind,
please confirm this in light of the revised example here. I
do understand that after parsing, all my examples are
probably seen the same way by Tcl. I wouldn't have thought
this a bug if it had originally failed years ago when the
code was first written.

If there is no bug here, I have some refactoring to do. In
some cases I can use -exact, in others -glob, but in some I'll
have to loop through and compare each element manually. Or
can anybody see any clever quoting that could be done to the
lsearch below so that $myelement is not seen as part of the
regexp?

tclsh8.0
% set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
{{3 5} 1 3} {{4 6} 4 6}
% set myelement [list [list 4 6]]
{4 6}
% lsearch -regexp $mylist ^$myelement
1
% 

tclsh8.4
% set mylist [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
{{3 5} 1 3} {{4 6} 4 6}
% set myelement [list [list 4 6]]
{4 6}
% lsearch -regexp $mylist ^$myelement
couldn't compile regular expression pattern: quantifier operand invalid

"-glob" works for this particular case:
% lsearch -glob $mylist $myelement*
1
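-glob sidesteps the regexp problem, but [string match] patterns have their own special characters (*, ? , [ ] and \), so data containing them needs escaping too. A minimal sketch, assuming a hypothetical helper named quoteGlob that backslash-escapes those characters with string map:

```tcl
# Hypothetical helper (not part of Tcl): escape the characters that are
# special to [string match], and therefore to lsearch -glob.
proc quoteGlob {str} {
    # string map makes a single pass, so inserted backslashes are not re-escaped.
    string map {\\ \\\\ * \\* ? \\? [ \\[ ] \\]} $str
}

set mylist    [list [list [list 3 5] 1 3] [list [list 4 6] 4 6]]
set myelement [list [list 4 6]]
# Escaped data followed by an unescaped * gives a literal prefix match.
puts [lsearch -glob $mylist "[quoteGlob $myelement]*"]   ;# 1

# Where escaping matters: unescaped, [1] is a character set matching "1".
set items [list {a1 y} {a[1] x}]
puts [lsearch -glob $items {a[1]*}]                      ;# 0
puts [lsearch -glob $items "[quoteGlob {a[1]}]*"]        ;# 1
```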

dkf added on 2004-03-02 04:46:19:
Logged In: YES 
user_id=79902

Well, *I* don't think we need to keep that (lack of) failure
mode.  Especially as if the code was ever so unfortunate as
to search against a non-empty list, it'd fail for sure.  A
failure that is masked under some non-obvious circumstances
is trouble waiting to happen IMHO.

Closing this, though if anyone has a good argument why
that's wrong and this bug should be fixed instead, I'd love
to hear it.

dgp added on 2004-03-02 03:10:56:
Logged In: YES 
user_id=80530


It looks like when [lsearch] was given an
empty list, it did not bother to compile the
regular expression, because there was no
comparing to be done, so invalid regexp's
were not reported.  It's that slight change
in behavior that's causing you trouble?

Probably you should correct your code so
it does not construct an invalid regexp for
passing in.

Passing to dkf for another opinion on
whether this error behavior is something
we needed to preserve.
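On versions where lsearch skips compiling the pattern for an empty list (the pre-8.4.2 behavior described above), a wrapper can force the compile step so an invalid pattern is reported consistently, whatever the list contains. A minimal sketch; lsearchRegexpChecked is a hypothetical name:

```tcl
# Hypothetical wrapper (not part of Tcl): compile the pattern up front by
# matching it against an empty string, so an invalid regexp raises an error
# even when $items is empty and lsearch would never need to compile it.
proc lsearchRegexpChecked {items pattern} {
    regexp -- $pattern {}
    return [lsearch -regexp $items $pattern]
}

# The invalid pattern is rejected regardless of the list contents.
if {[catch {lsearchRegexpChecked {} {^{4 6}}} msg]} {
    puts "rejected: $msg"
}
# A valid pattern behaves like plain lsearch -regexp.
puts [lsearchRegexpChecked {} {(?b)^{4 6}}]   ;# -1
```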

dgp added on 2004-03-02 03:06:28:
Logged In: YES 
user_id=80530

hmmm... no.

% lsearch -regexp {} {^{} 2134 qwer {1234 zxcv} 2345 asdf}
couldn't compile regular expression pattern: invalid repetition count(s)
% info patch
8.4.6

% lsearch -regexp {} {^{} 2134 qwer {1234 zxcv} 2345 asdf}
-1
% info patch
8.3.5
% regexp {^{} 2134 qwer {1234 zxcv} 2345 asdf} {}
couldn't compile regular expression pattern: invalid repetition count(s)

dgp added on 2004-03-02 02:59:00:
Logged In: YES 
user_id=80530


So [lsearch] has nothing to do with this,
right?  It's just a question of which
strings form a legal regexp?

% regexp "^{} 2134 qwer {1234 zxcv} 2345 asdf" {}
couldn't compile regular expression pattern: invalid repetition count(s)
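Which strings are legal also depends on how the pattern begins: an ARE that starts with the ***= director treats the rest of the pattern as a literal string, so any data is legal after it. The trade-off is that ^ becomes literal as well, so this gives an unanchored substring match rather than the prefix match wanted in this ticket. A short sketch:

```tcl
set s "^{} 2134 qwer {1234 zxcv} 2345 asdf"

# Parsed as an ARE, the string is rejected at compile time.
puts [catch {regexp -- $s {}} msg]        ;# 1
puts $msg

# After ***= every character is ordinary, including ^ and the braces.
puts [regexp -- "***=$s" $s]              ;# 1
puts [regexp -- "***=$s" "xx${s}yy"]      ;# 1: not anchored, matches anywhere
```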