Tcl Source Code: View Ticket

Ticket UUID:	536831
Title:	Tcl_SubstObj: add precompilation
Type:	RFE	Version:	None
Submitter:	dgp	Created on:	2002-03-29 18:25:20
Subsystem:	18. Commands M-Z	Assigned To:	dgp
Priority:	4	Severity:
Status:	Closed	Last Modified:	2003-03-13 09:41:28
Resolution:	Duplicate	Closed By:	dgp
		Closed on:	2003-03-13 02:41:28
Description:	Lines 2442-2448 of tclCmdMZ.c (Tcl_SubstObj): * This function performs the substitutions specified on the * given string as described in the user documentation for the * "subst" Tcl command. This code is heavily based on an * implementation by Andrew Payne. Note that if a command * substitution returns TCL_CONTINUE or TCL_RETURN from its * evaluation and is not completely well-formed, the results are * not defined. I don't know what that last comment means, or if it is even still valid. Some examples would help, and it this is an important special case, it should be noted in subst.n.
User Comments:	dgp added on 2003-03-13 09:41:28: Logged In: YES user_id=80530 The patches for FR 684982 now fully contain these. Closing... dgp added on 2003-03-13 09:08:13: File Added - 44902: alternative.patch Logged In: YES user_id=80530 I was wrong about being wrong. :) Here's an alternative patch that preserves more compatibility with prior Tcl releases. In particular, only 4 tests in subst.test fail after applying the alternative patch, and those demonstrate behavior that clearly has no value. dkf added on 2003-03-11 18:57:06: Logged In: YES user_id=79902 Ah! So you agree with me, and want to go the route that leads to precompilation. The expensive bit's the parsing of commands, and it was something I never really understood in detail enough. But if you can get it so that that is handled correctly, I can then take that and produce the actual compilation (which'll speed up repeated substitution of the same string...) dgp added on 2003-03-11 04:33:07: Logged In: YES user_id=80530 I was wrong. There's really no straightforward way to subst up to where the parsing error occurs. not without getting into a full specification of how each kind of parsing error would be treated. If we have to invent a spec. anyway, best to invent the simplest one: any parsing error throws an error before any substitution gets done. dkf added on 2003-03-10 18:47:14: Logged In: YES user_id=79902 I assume you're proposing that that test give a result of {1 0 {missing bracket}} or something like that? If so, I think that's an inevitable result if doing the job properly and so the test should be modified. You might want to ask the opinion of other TCT members on this though, and maybe also some other people like Chang Li who (IIRC) expressed interest in the original creation of Tcl_SubstObj()... (Part of the problem is that I don't really use [subst] in my own code.) dgp added on 2003-03-09 05:33:29: Logged In: YES user_id=80530 Please comment on the incompatibility represented by test subst-5.7. The patch can be modified so that subst-5.7 continues to pass, by seeing that any substitution side effects up to the parsing error are still carried out. Should it be so modified? dgp added on 2003-03-08 03:42:53: File Added - 44395: subst.patch Logged In: YES user_id=80530 Here's a patch that completely replaces Tcl_SubstObj() with a new implementation that makes use of the same (slightly modified) internal parsing and substitution engines as normal script evaluation. The script-level difference that will be seen is reflected in the 7 tests in subst.test that fail after applying this patch. They fail because this patch changes the way [subst] deals with parsing errors. As requested by this Feature Request (and in agreement with the way "-quoted words get substituted in normal Tcl scripts), after applying this patch parsing errors will be detected and reported before anything else. This patch also paves the way for a larger cleanup in the inner evaluation routines, making them smaller and simpler (see FR 684982). This patch also fixes Bug 685106. By making better code re-use, this patch also set the stage for any parsing improvements for normal script evaluation (for example, a Tcl_ObjType that caches pased token sequences) to equally benefit the [subst] command. dkf added on 2002-11-16 19:05:48: Logged In: YES user_id=79902 Need (optional?) capability to precompile subst-ed objects; feature probably could do with flags to control whether it happens or not as you'll take a performance hit for one-off things even though it'll be better when repeated application happens. At the same time, adding scanning for a matching close bracket would speed things up mightily and would simplify other parts of the core (as you'd always be evaluating complete scripts and not scripts with possible trailing junk.) dkf added on 2002-04-18 21:18:59: Logged In: YES user_id=79902 Partial fix; the comment should be clearer now. But the underlying problem hasn't gone away. dkf added on 2002-04-18 20:22:18: Logged In: YES user_id=79902 The problem is that the code to recover from an exceptional situation (i.e. continue) can't figure out where the script portion of the string ends when that script is not well-formed. In a sense, the terminator for the script actually lies beyond the end of the string it is embedded in! This is not good. At this point, it either picks a random square bracket that would otherwise have been part of the script itself, or just uses the end of the command that generated the exceptional condition, and while it would be possible (I suppose) to characterise this behaviour, it seems to me that this would be specifying a bug. As I said below, the fix is to work out where the boundaries of the command are before evaluating it. That's best combined with some kind of cached compilation step (you can only cache if the subst flags are the same, of course.) Nothing else in Tcl uses this tricky kind of simultaneous parsing and evaluation. (Not nowadays anyway.) Furthermore, it is only now that there is a way to tell it occurred because before all exceptions stopped the processing of the string. dgp added on 2002-04-06 02:50:35: Logged In: YES user_id=80530 I still don't think I get it. Here's what I see in Tcl 8.3.4: % subst {abc[continue;[foobarspong} invalid command name "foobarspong" % subst {abc[continue;[foobarspong]} abc % subst {abc[continue;[foobarspong]]} abc And here's what I see in the HEAD: % subst {abc[continue;[foobarspong} missing close-bracket % subst {abc[continue;[foobarspong]} abc % subst {abc[continue;[foobarspong]]} abc In both cases, the first is an error, and the latter two show the effect of [continue] in a [subst]. Where's the "not defined" here? Is that so we can change the behavior later? What do we want to change it to? dkf added on 2002-04-05 15:45:07: Logged In: YES user_id=79902 Example that exhibits the problem: subst {abc[continue;[foobarspong} The problem is that there is currently no pre-parsing stage involved in Tcl_SubstObj (I thought about adding one, but didn't) so scanning forward on a command substitution to find the end of the command substitution (which would more accurately be described as a script substitution, BTW) is only done when needed. Now, for each of the evaluation return codes, the following is the behaviour: TCL_OK: insert string from interp TCL_ERROR: error! :^) TCL_BREAK: return everything done up to the start of the command substitution TCL_CONTINUE: insert nothing and go from end of command substitution TCL_RETURN: as with TCL_OK The problem with TCL_CONTINUE and TCL_RETURN stems from the fact that they can be issued from the middle of a script, yet their behaviour really requires scanning to the end of the script (TCL_OK is not a problem, since that only ever comes from the end of the script.) If the script being evaluated is not well-formed, the code that does the scanning forward (located somewhere inside the implementation of Tcl_EvalEx) does not operate properly, and I'm not sure how to change this (or even what the recovery ought to be.) As it is; the problem is a deep and nasty parsing problem. The comment just documents the fact, and the problem isn't exposed elsewhere because nowhere else in the core needs to access parts of a string following an ill-formed command. Suggested fix: compile [subst] strings so you can apply the well-formedness checks first (and gain a performance boost on objects that get fed into [susbst] several times too.)

Attachments:

alternative.patch [download] added by dgp on 2003-03-13 09:08:13. [details]
subst.patch [download] added by dgp on 2003-03-08 03:42:53. [details]