Ticket UUID: | 703709 | |||
Title: | 'TCL_REG_CANMATCH' flag returns incorrect matches | |||
Type: | Bug | Version: | obsolete: 8.4.2 | |
Submitter: | vincentdarley | Created on: | 2003-03-14 16:23:11 | |
Subsystem: | 43. Regexp | Assigned To: | aku | |
Priority: | 8 | Severity: | Minor | |
Status: | Open | Last Modified: | 2015-11-06 21:04:46 | |
Resolution: | None | Closed By: | nobody | |
Closed on: | ||||
Description: |
The attached patch to tclTest.c and reg.test shows how Tcl_RegExpExecObj returns incorrect matches for many, many cases using the 'TCL_CAN_MATCH' flag. This flag is supposed to make regexp return (if it can't find a complete match) the first possible position at which a match could occur. However, it returns positions at which it is quite obvious no match could ever occur! For example: set pat {.x} set line "asd asd" # can match the last char, if followed by x set res [testregexp -xflags -- c $pat $line resvar] lappend res $resvar Should return '0 6' to say there is no match (0) but a match could possible occur at position 6 (with '.' matching the final 'd', if an 'x' was to be appended to the string). Unfortunately it returns '0 0' which is quite wrong. | |||
User Comments: |
tgl added on 2015-11-06 21:04:46:
I am not sure that there is any workable solution to this. The problem is that you're relying on the "cold start" position returned by shortest(), which is an implementation detail that should probably never have been exposed outside the regexp engine. It doesn't have any meaningful value for cases like this, because it reports on the last place where there was no active NFA state representing partial traversal of the regexp. Given a regexp starting with '.', every character can match that, and so every string position beyond start-of-string will have an active NFA state indicating that we successfully traversed the '.'. Now, unless the next character is 'x' we will fail to find any transition out of that NFA state so it gives rise to no active state after consuming the next character --- but meanwhile we matched '.' again, so there's a new instance of the same active NFA state instead. There's no way to get to a cold-start condition as long as incomplete matches overlap, or even just are adjacent. I don't presently see any way around that except to try to track the match-start position(s) associated with each active NFA state, which seems completely impractical, and certainly would be disastrous for string matching speed even if it could be done in a sane amount of memory. So AFAICS the cold-start logic is just a heuristic meant to reduce the amount of brute-force searching done in simpleFind() or complicatedFindLoop(), and it was a mistake to imagine that it had semantics useful to outside callers. vincentdarley added on 2003-10-16 20:08:51: Logged In: YES user_id=32170 One more subtlety here. A full implementation of this flag ought to provide two pieces of information: (i) if there was no match, where a match could occur if there was more text (ii) even if there was a match, whether the match could be longer (and possibly start earlier in the next) if there was more text The currently implementation handles (i) in a buggy way, and doesn't handle (ii) at all. vincentdarley added on 2003-04-11 23:34:55: Logged In: YES user_id=32170 Tcl now contains some 'knownBug' tests for this problem (reg.test). Since tip113 has now passed, and can easily exhibit this bug without the need to write C code, I'm upping the bug priority. vincentdarley added on 2003-03-14 23:52:29: Logged In: YES user_id=32170 No: the whole point of the TCL_REG_CAN_MATCH flag is to tell you where a match *might* be possible, if the string given was a bit longer. dkf added on 2003-03-14 23:46:49: Logged In: YES user_id=79902 Surely if you know where the end of the string is, you know that the match area can't go past it! Yes? vincentdarley added on 2003-03-14 23:24:51: File Added - 45078: tclTest.diff Logged In: YES user_id=32170 Attaching patch to tclTest.c vincentdarley added on 2003-03-14 23:23:36: File Added - 45077: reg.test.diff Logged In: YES user_id=32170 Attaching patch to reg.test |