Tcl Source Code

View Ticket
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Ticket UUID: 536831
Title: Tcl_SubstObj: add precompilation
Type: RFE Version: None
Submitter: dgp Created on: 2002-03-29 18:25:20
Subsystem: 18. Commands M-Z Assigned To: dgp
Priority: 4 Severity:
Status: Closed Last Modified: 2003-03-13 09:41:28
Resolution: Duplicate Closed By: dgp
    Closed on: 2003-03-13 02:41:28
Description:
Lines 2442-2448 of tclCmdMZ.c (Tcl_SubstObj):

 *      This function performs the substitutions
specified on the
 *      given string as described in the user
documentation for the
 *      "subst" Tcl command.  This code is heavily
based on an
 *      implementation by Andrew Payne.  Note that if a
command
 *      substitution returns TCL_CONTINUE or TCL_RETURN
from its
 *      evaluation and is not completely well-formed,
the results are
 *      not defined.


I don't know what that last comment means, or if it
is even still valid.  Some examples would help, and
it this is an important special case, it should be
noted in subst.n.
User Comments: dgp added on 2003-03-13 09:41:28:
Logged In: YES 
user_id=80530

The patches for FR 684982 now
fully contain these.  Closing...

dgp added on 2003-03-13 09:08:13:
File Added - 44902: alternative.patch

Logged In: YES 
user_id=80530

I was wrong about being wrong. :)

Here's an alternative patch that
preserves more compatibility with
prior Tcl releases.  In particular,
only 4 tests in subst.test fail
after applying the alternative patch,
and those demonstrate behavior
that clearly has no value.

dkf added on 2003-03-11 18:57:06:
Logged In: YES 
user_id=79902

Ah!  So you agree with me, and want to go the route that
leads to precompilation.

The expensive bit's the parsing of commands, and it was
something I never really understood in detail enough.  But
if you can get it so that that is handled correctly, I can
then take that and produce the actual compilation (which'll
speed up repeated substitution of the same string...)

dgp added on 2003-03-11 04:33:07:
Logged In: YES 
user_id=80530


I was wrong.  There's really no straightforward
way to subst up to where the parsing error
occurs.  not without getting into a full
specification of how each kind of parsing
error would be treated.  If we have to 
invent a spec. anyway, best to invent the
simplest one:  any parsing error throws an
error before any substitution gets done.

dkf added on 2003-03-10 18:47:14:
Logged In: YES 
user_id=79902

I assume you're proposing that that test give a result of
   {1 0 {missing bracket}}
or something like that?  If so, I think that's an inevitable
result if doing the job properly and so the test should be
modified.  You might want to ask the opinion of other TCT
members on this though, and maybe also some other people
like Chang Li who (IIRC) expressed interest in the original
creation of Tcl_SubstObj()...

(Part of the problem is that I don't really use [subst] in
my own code.)

dgp added on 2003-03-09 05:33:29:
Logged In: YES 
user_id=80530

Please comment on the
incompatibility represented
by test subst-5.7.

The patch can be modified
so that subst-5.7 continues
to pass, by seeing that any
substitution side effects up to
the parsing error are still carried out.

Should it be so modified?

dgp added on 2003-03-08 03:42:53:
File Added - 44395: subst.patch

Logged In: YES 
user_id=80530


Here's a patch that completely
replaces Tcl_SubstObj() with
a new implementation that makes
use of the same (slightly modified)
internal parsing and substitution
engines as normal script evaluation.

The script-level difference that will
be seen is reflected in the 7 tests
in subst.test that fail after applying this
patch.  They fail because this patch
changes the way [subst] deals with
parsing errors.  As requested by
this Feature Request (and in agreement
with the way "-quoted words get substituted
in normal Tcl scripts), after applying this
patch parsing errors will be detected and
reported before anything else.

This patch also paves the way for a
larger cleanup in the inner evaluation
routines, making them smaller and simpler
(see FR 684982).

This patch also fixes Bug 685106.

By making better code re-use, this patch
also set the stage for any parsing
improvements for normal script evaluation
(for example, a Tcl_ObjType that caches
pased token sequences) to equally benefit
the [subst] command.

dkf added on 2002-11-16 19:05:48:
Logged In: YES 
user_id=79902

Need (optional?) capability to precompile subst-ed objects;
feature probably could do with flags to control whether it
happens or not as you'll take a performance hit for one-off
things even though it'll be better when repeated application
happens.

At the same time, adding scanning for a matching close
bracket would speed things up mightily and would simplify
other parts of the core (as you'd always be evaluating
complete scripts and not scripts with possible trailing junk.)

dkf added on 2002-04-18 21:18:59:
Logged In: YES 
user_id=79902

Partial fix; the comment should be clearer now. But the
underlying problem hasn't gone away.

dkf added on 2002-04-18 20:22:18:
Logged In: YES 
user_id=79902

The problem is that the code to recover from an exceptional
situation (i.e. continue) can't figure out where the script
portion of the string ends when that script is not
well-formed.  In a sense, the terminator for the script
actually lies beyond the end of the string it is embedded
in!  This is not good.  At this point, it either picks a
random square bracket that would otherwise have been part of
the script itself, or just uses the end of the command that
generated the exceptional condition, and while it would be
possible (I suppose) to characterise this behaviour, it
seems to me that this would be specifying a bug.

As I said below, the fix is to work out where the boundaries
of the command are before evaluating it.  That's best
combined with some kind of cached compilation step (you can
only cache if the subst flags are the same, of course.)

Nothing else in Tcl uses this tricky kind of simultaneous
parsing and evaluation.  (Not nowadays anyway.) 
Furthermore, it is only now that there is a way to tell it
occurred because before all exceptions stopped the
processing of the string.

dgp added on 2002-04-06 02:50:35:
Logged In: YES 
user_id=80530

I still don't think I get it.  Here's what I see in
Tcl 8.3.4:

% subst {abc[continue;[foobarspong}
invalid command name "foobarspong"
% subst {abc[continue;[foobarspong]}
abc
% subst {abc[continue;[foobarspong]]}
abc

And here's what I see in the HEAD:

% subst {abc[continue;[foobarspong}
missing close-bracket
% subst {abc[continue;[foobarspong]}
abc
% subst {abc[continue;[foobarspong]]}
abc

In both cases, the first is an error, and
the latter two show the effect of [continue]
in a [subst].

Where's the "not defined" here?  Is that so we
can change the behavior later?  What do we want
to change it to?

dkf added on 2002-04-05 15:45:07:
Logged In: YES 
user_id=79902

Example that exhibits the problem:
   subst {abc[continue;[foobarspong}

The problem is that there is currently no pre-parsing stage
involved in Tcl_SubstObj (I thought about adding one, but
didn't) so scanning forward on a command substitution to
find the end of the command substitution (which would more
accurately be described as a script substitution, BTW) is
only done when needed.  Now, for each of the evaluation
return codes, the following is the behaviour:
  TCL_OK: insert string from interp
  TCL_ERROR: error!  :^)
  TCL_BREAK: return everything done up to the start of the
             command substitution
  TCL_CONTINUE: insert nothing and go from end of command
                substitution
  TCL_RETURN: as with TCL_OK
The problem with TCL_CONTINUE and TCL_RETURN stems from the
fact that they can be issued from the middle of a script,
yet their behaviour really requires scanning to the end of
the script (TCL_OK is not a problem, since that only ever
comes from the end of the script.)  If the script being
evaluated is not well-formed, the code that does the
scanning forward (located somewhere inside the
implementation of Tcl_EvalEx) does not operate properly, and
I'm not sure how to change this (or even what the recovery
ought to be.)

As it is; the problem is a deep and nasty parsing problem.
The comment just documents the fact, and the problem isn't
exposed elsewhere because nowhere else in the core needs to
access parts of a string following an ill-formed command.

Suggested fix: compile [subst] strings so you can apply the
well-formedness checks first (and gain a performance boost
on objects that get fed into [susbst] several times too.)

Attachments: