Artifact 88105220ad60c7ea2e68e33b3889841d213633bf37b6fdc1d4b62969a3b524da:

File tip/90.tip — part of check-in [669e4cfb41] at 2003-09-04 14:43:33 on branch trunk — Correction to the [myReturn] proc example. (user: dgp size: 19072)
TIP:            90
Title:          Enable [return -code] in Control Structure Procs
Version:        $Revision: 1.39 $
Author:         Don Porter <[email protected]>
Author:         Donal K. Fellows <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        15-Mar-2002
Post-History:   
Tcl-Version:    8.5

~ Abstract

This TIP analyzes existing limitations on the coding of control
structure commands as ''proc''s, and presents expanded forms of
''catch'' and ''return'' to remove those limitations.

~ Background

It is a distinguishing feature of Tcl that everything is a command,
including control structures such as ''if'', ''for'', and ''switch'',
which in many other languages are part of the language itself.  The
command interface of Tcl, including both a return code and a result,
allows extensions to create their own control structure commands.

Control structure commands have the feature that one or more of their
arguments is a script, often called a ''body'', meant to be evaluated
in the caller's context.  The control structure command exists to
control whether, when, in what context, or how many times that script
is evaluated.  When the body is evaluated, however, it is intended to
behave as if it were interpreted directly in the place of the control
structure command.

The built-in commands of Tcl provide the ability for scripts
themselves to define new commands.  Notably, the ''proc'' command
makes this possible.  In addition, other commands such as ''catch'',
''return'', ''uplevel'', and ''upvar'' offer enough control and access
to the caller's context that it is possible to create new control
structure commands for Tcl, entirely at the script level.

Almost.

There is one limitation that separates control structure commands
created by ''proc'' from those created in C by a direct call to
''Tcl_Create(Obj)Command''.  It is most easily seen in the following
example that compares the built-in command ''while'' to the command
''control::do'' created by ''proc'' in the control package of tcllib.

|  % package require control
|  % proc a {} {while 1 {return -code error}}
|  % proc b {} {control::do {return -code error} while 1}
|  % catch a
|  1
|  % catch b
|  0

The control structure command ''control::do'' fails to evaluate
''return -code error'' in such a way that it acts the same as if
''return -code error'' were evaluated directly within proc ''b''.

~ Analysis

There are two deficiencies in Tcl's built-in commands that lead to
this incapacity in control structure commands defined by ''proc''.

First, ''catch'' is not able to capture all of the information
passed to ''return''.  Consider:

|   %  set code [catch {
|          return -code error -errorinfo foo -errorcode bar baz
|      } message]

After evaluation, ''code'' contains "2" (''TCL_RETURN''), and
''message'' contains "baz", but the other values are locked away in
internal fields of the ''Tcl_Interp'' structure as
''interp->returnCode'', ''interp->errorCode'', and 
''interp->errorInfo''.  The "-errorcode" and "-errorinfo" values
will be copied to the global variables "::errorCode" and 
"::errorInfo", respectively, but there will be no way at the
script level to get at the ''interp->returnCode'' value which
was the value of the original "-code" option.

Second, even if the information were available, there is no built-in
command in Tcl that can be evaluated within the body of a proc to make
the proc itself act as if it were the command ''return -code''.
Stated another way, it is not possible to create a command with
''proc'' that behaves exactly the same as ''return -code''.  Because
of that, it is also not possible to create a command with ''proc''
that behaves exactly the same as ''while'', ''if'', or any other
command that evaluates one or more of its arguments as a script in
the caller's context.

This is a curious, and likely unintentional, limitation.  Tcl goes to
great lengths to be sure I can create my own ''break'' replacement
with ''proc''.

| proc myBreak {} {return -code break}

It would be a welcome completion of Tcl's set of built-in commands to
be able to create a replacement for every one of them using ''proc''.
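As a small sanity check, the ''myBreak'' proc above can be dropped into a loop wherever ''break'' would appear.  The helper proc ''collectUntilNegative'' below is hypothetical, made up only for this sketch.

```tcl
# myBreak as given above: its [return -code break] makes the
# caller of myBreak see a TCL_BREAK return code.
proc myBreak {} {return -code break}

# Hypothetical helper for this sketch: keep list items until the
# first negative one, using myBreak where [break] would normally go.
proc collectUntilNegative {values} {
    set kept {}
    foreach v $values {
        if {$v < 0} {
            myBreak
        }
        lappend kept $v
    }
    return $kept
}

puts [collectUntilNegative {1 2 -1 3}]  ;# prints: 1 2
```

The ''foreach'' loop stops at -1, exactly as it would with a bare ''break''.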

~ Specification

The ''return'' command shall have syntax:

| return ?option value ...? ?result?

There can be any number of ''option value'' pairs, and
any value at all is acceptable for an ''option'' argument.
The legal values of a ''value'' argument are limited for
some ''option''s, as follows:

 > the ''value'' after a "-code" must be either
   an integer (32-bit only), or one of the strings, "ok",
   "error", "return", "break", or "continue",
   just as in the 8.4 spec for ''return''.  The default ''value''
   for the "-code" option is "0".

 > the ''value'' after a "-level" must be a non-negative integer.
   The default ''value'' for the "-level" option is "1".

 > the ''value'' after a "-options" must be a dictionary ([111]).
   The default ''value'' for the "-options" option is an empty
   dictionary.

The keys and values in the dictionary ''value'' of the "-options"
option are pulled out and treated as additional ''option value''
arguments to the ''return'' command.  Note that this "-options" option
for option expansion is offered only because Tcl itself has no
syntax for argument expansion, as observed many,
many times before (for example, [103]).
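A minimal sketch of the "-options" expansion, assuming a Tcl 8.5 or later interpreter: the two hypothetical procs below should raise the same error, one spelling out the options directly and one supplying them through a dictionary.

```tcl
# Options given directly as option/value arguments.
proc viaFlags {} {
    return -code error -errorcode {DEMO FLAGS} "failed"
}

# The same options packed into a dictionary and expanded by -options.
proc viaOptions {} {
    set opts [dict create -code error -errorcode {DEMO FLAGS}]
    return -options $opts "failed"
}

foreach p {viaFlags viaOptions} {
    set code [catch $p message options]
    puts "$p: code=$code message=$message errorcode=[dict get $options -errorcode]"
}
```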

The ''result'' argument, if any, is stored in the interp as the
result of the ''return'' command.  In default operation, this
becomes the result of the procedure in which the ''return'' command
is evaluated.

The return code of the ''return'' command is determined by the
''value''s of the "-code" and "-level" options.  If the ''value''
of the "-level" option is non-zero, then the return code of
''return'' is TCL_RETURN.  If the ''value'' of the "-level" option
is "0", then the return code of ''return'' is the ''value'' of the
"-code" option, translated from string, as needed.  In this way,

| return -level 0 -code break

is a synonym for

| break

while

| return -code break

spelled out with defaults filled in as:

| return -level 1 -code break

continues to function as before, causing the procedure in which
the ''return'' is evaluated to return the TCL_BREAK return code.
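The two spellings can be compared in a short script (a sketch for Tcl 8.5 or later; the proc name ''stopCallerLoop'' is hypothetical).  With "-level 0" the break acts on the loop that contains the ''return'' itself; with the default "-level 1" it is the caller of the proc that sees TCL_BREAK.

```tcl
# -level 0: the [return] command itself produces TCL_BREAK,
# so it ends the enclosing loop just as [break] would.
set n 0
while 1 {
    incr n
    if {$n == 3} {
        return -level 0 -code break
    }
}
puts "first loop stopped at n=$n"   ;# n=3

# -level 1 (the default): the proc returns TCL_BREAK to its caller,
# which ends the loop in which the proc was called.
proc stopCallerLoop {} {return -code break}
set m 0
while 1 {
    incr m
    if {$m == 5} {
        stopCallerLoop
    }
}
puts "second loop stopped at m=$m"  ;# m=5
```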

All ''option value'' arguments to ''return'' are stored in a
return options dictionary kept in the interp, just as the
''result'' argument gets stored in the result of the interp.

The TclUpdateReturnInfo() function is modified, so that each
level of procedure returning decrements the value of the "-level"
key in the return options dictionary.  When the value of the
"-level" key reaches "0", the return code from the current procedure
will be the value of the "-code" key in the return options dictionary.
Otherwise, the return code of the current procedure will be TCL_RETURN.

In this way,

| return -level 2 -code ok

is equivalent to

| return -code return

and should (absent some intervening ''catch'') cause a normal return
to the caller's caller.  Likewise,

| return -level 3 -code ok

would cause a normal return to the caller's caller's caller
(again absent an intervening ''catch''), something
that can't currently be accomplished.
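The level counting can be observed with a hypothetical three-proc call chain (''outer'' calls ''middle'', which calls ''inner'').  This is only an illustrative sketch for Tcl 8.5 or later; each extra level lets the ''return'' in ''inner'' skip the remaining commands of one more caller.

```tcl
# inner returns through $level frames; no catch intervenes.
proc inner {level} {
    return -level $level -code ok "from inner"
}
proc middle {level} {
    set r [inner $level]
    return "middle saw: $r"
}
proc outer {level} {
    set r [middle $level]
    return "outer saw: $r"
}

puts [outer 1]  ;# outer saw: middle saw: from inner
puts [outer 2]  ;# outer saw: from inner   (middle returned early)
puts [outer 3]  ;# from inner              (outer returned early too)
```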

The ''catch'' command shall have syntax:

| catch script ?resultVar? ?optionsVar?

The new argument ''optionsVar'', if present, will be the
name of a variable in which a dictionary of return options
should be stored.  The return options stored in that dictionary
are exactly those needed so that the evaluation of

| catch $script result options
| return -options $options $result

is completely indistinguishable (except for the existence
and values of variables "result" and "options") from the
direct evaluation of ''$script'' by the interpreter.  In
particular, any values of the "::errorCode" and "::errorInfo"
variables are the same as if there were never a ''catch'' in
the first place.

In addition, when the result of ''catch'' is TCL_ERROR, the
value in the ''errorLine'' field of the ''Interp'' struct
will be stored as the value of the "-errorline" key in the
return options dictionary.
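A short sketch of what the new ''optionsVar'' argument captures for an ordinary error, assuming Tcl 8.5 or later (the message and error code are arbitrary).  Later Tcl releases may add further keys, such as "-errorstack" in 8.6, so the sketch asks for specific keys rather than printing the whole dictionary.

```tcl
set code [catch {
    error "boom" "" {DEMO ERR}
} message options]

puts "code=$code message=$message"   ;# code=1 message=boom
foreach key {-code -level -errorcode} {
    puts "$key = [dict get $options $key]"
}
# -errorline carries the errorLine value recorded for this error.
puts "has -errorline: [dict exists $options -errorline]"
```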

This specification may seem a bit complex, but it makes possible
very simple solutions to the problems posed above.

~ Examples

First, let's revisit the analysis:

|   %  set code [catch {
|          return -code error -errorinfo foo -errorcode bar baz
|      } message options]

After evaluation, ''code'' contains "2" (''TCL_RETURN''), ''message''
contains "baz", and now ''options'' contains:

| -errorcode bar -errorinfo foo -code 1 -level 1

So, the ''options'' variable now contains the information that
was previously inaccessible.  We can now

| return -options $options $message

to get the same results as if the ''catch'' had never been
there in the first place.
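To see that claim in action, the sketch below (Tcl 8.5 or later; both proc names are hypothetical) raises the error from the analysis once directly and once through a ''catch'' followed by ''return -options''.  A caller should get the same return code, message, and error code from both.

```tcl
proc direct {} {
    return -code error -errorinfo foo -errorcode bar baz
}
proc rethrown {} {
    catch {
        return -code error -errorinfo foo -errorcode bar baz
    } message options
    # Re-raise exactly what was caught.
    return -options $options $message
}

foreach p {direct rethrown} {
    set code [catch $p message options]
    puts "$p: code=$code message=$message errorcode=[dict get $options -errorcode]"
}
```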

In Tcl 8.4, it is not possible to implement a replacement
for the ''return'' command as a proc.  After this proposal,
such a replacement is:

| proc myReturn args {
|     set result ""
|     if {[llength $args] % 2} {
|         set result [lindex $args end]
|         set args [lrange $args 0 end-1]
|     }
|     set options [eval [list dict create -level 1] $args]
|     dict incr options -level
|     return -options $options $result
| }

''myReturn'' should be equivalent to ''return'' in every way.
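A brief usage sketch, assuming Tcl 8.5 or later: ''myReturn'' is repeated from above so the block runs on its own, and the procs ''greet'' and ''fail'' are hypothetical callers.  Each call to ''myReturn'' should make its caller return just as the built-in ''return'' would.

```tcl
proc myReturn args {
    set result ""
    if {[llength $args] % 2} {
        set result [lindex $args end]
        set args [lrange $args 0 end-1]
    }
    set options [eval [list dict create -level 1] $args]
    dict incr options -level
    return -options $options $result
}

proc greet {} {
    myReturn "hello"
    return "not reached"
}
proc fail {} {
    myReturn -code error -errorcode {MY CODE} "oops"
}

puts [greet]                           ;# hello
set code [catch fail message options]
puts "$code $message [dict get $options -errorcode]"  ;# 1 oops MY CODE
```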

The new ability to reproduce stack traces exactly makes a
''catch'' of large scripts more attractive.  For example, consider a
procedure that allocates some resource, then performs operations,
and finally frees the resource before returning.  To be sure the
resource is freed, we must ''catch'' any errors that might cause
the procedure to return before the resource is freed.  The
solution looks like:

| proc doSomething {} {
|     set resource [allocate]
|     catch {
|          # Arbitrarily long script of operations
|     } result options
|     deallocate $resource
|     return -options $options $result
| }

With that structure, we are confident the resource is always
freed, but any error or exception will be presented to the
caller exactly as if it had never been caught in the first place.
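The pattern can be tried out with stand-in resource commands.  In the sketch below (Tcl 8.5 or later), ''allocate'' and ''deallocate'' are hypothetical helpers that only track live resource names in a list, and ''doSomething'' takes a hypothetical ''fail'' flag so that both the normal path and the error path can be exercised.

```tcl
# Hypothetical stand-ins: "resources" are just names kept in ::live.
set ::live {}
set ::nextId 0
proc allocate {} {
    set r res[incr ::nextId]
    lappend ::live $r
    return $r
}
proc deallocate {r} {
    set i [lsearch -exact $::live $r]
    set ::live [lreplace $::live $i $i]
}

proc doSomething {fail} {
    set resource [allocate]
    catch {
        if {$fail} {
            error "operation failed" "" {DEMO FAILURE}
        }
        set done "used $resource"
    } result options
    deallocate $resource
    # Normal result or error: the caller sees it unchanged.
    return -options $options $result
}

puts [doSomething 0]                         ;# used res1
set code [catch {doSomething 1} message opts]
puts "$code $message [dict get $opts -errorcode]"
puts "live resources: [llength $::live]"    ;# 0
```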

Here are two examples of how to use the new features in a 
control structure proc.  The essence of a control structure
command is its ability to evaluate a script in the caller's
context, preserving the illusion that no additional stack
frame was ever used.  So, a proc replacement for ''eval''
illustrates the technique.

The first approach assumes one knows
the internal details of how the ''uplevel'' command adds to
the stack trace. This is straightforward, but will require a
rewrite if ''uplevel'' ever changes how it manipulates the
stack trace.

| proc myEval script {
|     if {[catch {uplevel 1 $script} result options] == 1} {
|         set stack [dict get $options -errorinfo]
|         regsub {\s+invoked from within\s+"uplevel 1 \$script"$} $stack {} stack
|         regsub {\("uplevel" body line (\d+)\)$} $stack [subst -nobackslashes \
|                 {("[lindex [info level 0] 0]" body line \1)}] stack
|         dict set options -errorinfo $stack
|     }
|     dict incr options -level
|     return -options $options $result
| }

A second, more robust solution is possible, but requires a bit
more context gymnastics.

| namespace eval control {
|     proc eval script {
|         variable result
|         variable options
|         set code [uplevel 1 \
|                 [list ::catch $script [namespace which -variable result] \
|                         [namespace which -variable options]]]
|         if {$code == 1} {
|             set line [dict get $options -errorline]
|             dict append options -errorinfo \
|                     "\n    (\"[lindex [info level 0] 0]\" body line $line)"
|         }
|         dict incr options -level
|         return -options $options $result
|     }
| }

Note that in the second solution we did not have to strip away the
contributions of ''uplevel'' to the stack trace, because we captured
the stack trace before ''uplevel'' added anything.  Then we could add
our own information (drawing in part on the new "-errorline" value
available to us now at the script level).

We confirm that either approach solves the original problem:

| % proc a {} {eval {return -code error}}
| % proc b {} {myEval {return -code error}}
| % proc c {} {control::eval {return -code error}}
| % catch a
| 1
| % catch b
| 1
| % catch c
| 1

Finally, the new features make possible a utility command that
can be of use to people making simple control structure commands,
or doing simple wrapping, where there is no need to augment the
stack trace, or to treat any return codes in a special way:

| namespace eval control {
|     proc ascaller script {
|         if {[info level] < 2} {
|             return -code error \
|                     "[lindex [info level 0] 0] called outside a proc"
|         }
|         variable result
|         variable options
|         set code [uplevel 2 \
|                 [list ::catch $script   [namespace which -variable result] \
|                                         [namespace which -variable options]]]
|         if {$code == 0} {
|             return $result
|         }
|         dict incr options -level 2
|         return -options $options $result
|     }
| }

Within a proc, ''ascaller $script'' will take care of all aspects
of evaluating ''$script'' in the caller context, and exiting as
appropriate for all non-TCL_OK return codes.
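As an illustration only, here is a hypothetical ''unless'' command built on ''control::ascaller'' (Tcl 8.5 or later).  The ''ascaller'' proc is repeated from above; the two namespace variables are given initial values first, as a defensive choice so that ''namespace which -variable'' is sure to find them.  A ''return'' inside the body then returns from ''firstOdd'' itself, not from ''unless''.

```tcl
namespace eval control {
    # Defensive initialization (an assumption of this sketch).
    variable result {}
    variable options {}

    proc ascaller script {
        if {[info level] < 2} {
            return -code error \
                    "[lindex [info level 0] 0] called outside a proc"
        }
        variable result
        variable options
        set code [uplevel 2 \
                [list ::catch $script   [namespace which -variable result] \
                                        [namespace which -variable options]]]
        if {$code == 0} {
            return $result
        }
        dict incr options -level 2
        return -options $options $result
    }
}

# Hypothetical control structure: evaluate body unless condition holds.
proc unless {condition body} {
    if {![uplevel 1 [list expr $condition]]} {
        control::ascaller $body
    }
}

# The [return] in the body returns from firstOdd, not from unless.
proc firstOdd {values} {
    foreach v $values {
        unless {$v % 2 == 0} {
            return $v
        }
    }
    return none
}

puts [firstOdd {2 4 5 6}]   ;# 5
puts [firstOdd {2 4 6}]     ;# none
```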

~ Extensibility

The ''return -code'' command has always accepted any integer value
as a valid argument, allowing package and application authors to
define their own new return codes as needed by their own control
structure commands.  Now that ''return'' will accept any ''option''
argument, and ''catch'' can capture all ''option value'' argument
pairs passed to the caught ''return'' command, package and application
authors now have the ability to augment their custom return codes
with additional data.  Some prefix convention should be established
to avoid key name conflicts in the return options dictionary.
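A minimal sketch of such an extension, assuming Tcl 8.5 or later.  The return code 5 and the option key "-demo.delay" are purely hypothetical; the "-demo." prefix stands in for whatever naming convention is eventually adopted.

```tcl
# Hypothetical custom return code 5 ("retry"), carrying a delay value
# in an application-specific return option.
proc signalRetry {delay} {
    return -code 5 -demo.delay $delay "please retry"
}

set code [catch {signalRetry 250} message options]
if {$code == 5} {
    puts "retry requested: $message, after [dict get $options -demo.delay] ms"
}
```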

~ Potential Concerns

Reviewers of drafts of this TIP wondered whether the new
"-level" option to ''return'' raised the possibility of
trouble with an attempt to return past the top of the
call stack.

It should be understood that ''return -level N'' does not
take any shortcut past the intervening levels.  Each level
of the call stack gets a TCL_RETURN return code, and a "-level"
value, dropping by one each step up the stack.  Any level in
the stack might choose to ''catch'' the TCL_RETURN and treat
it as it wishes.  This is exactly the way the existing
''return -code return'' is handled.  Normally, it would cause
a normal return to the caller's caller, but if the caller
chooses to ''catch'' it, then the caller has control.

At the toplevel we run out of callers.  Then the question becomes:
how is a TCL_RETURN code handled at toplevel?

| % return -level 0       ;# same as a TCL_OK at toplevel
| % return -level 1       ;# same as [return]
| % return -level 2       ;# same as [return -code return]
| command returned bad code: 2

From the C level, ''Tcl_AllowExceptions()'' can be used to
modify this toplevel behavior.

The following proc will produce the same results as above, but
from any level in the call stack (absent an intervening ''catch''):

| % proc escape level {
|       set x [info level]
|       incr x $level
|       return -level $x
|   }
| % escape 0
| % escape 1
| % escape 2
| command returned bad code: 2

Another concern was whether this proposal gave slave interpreters
any new powers over their masters.  The return code from evaluation
of an untrusted script in a slave interpreter should always be
wrapped in a ''catch'' already, lest a TCL_ERROR in the script
blow the stack.  Given that, the only thing this proposal does is
give the ''catch'' command more information to use to decide
how to handle the misbehaving script.

~ Compatibility

It is the author's belief that this proposal is completely
compatible with prior Tcl 8.X releases.  Any error-free script
that ran before should continue to run with the same results.
At the C level, only internal changes are made, and no new interfaces
are defined.  Any extension or embedding C program that sticks to the
public stubs interface should see no visible change.  

~ Prototype

This proposal is implemented by Tcl Patch 531640 at SourceForge.

The prototype covers all described functionality, but might be
further improved with more substantial bytecompiling of [return].

~ Future considerations

The main reason the global variables ''::errorInfo'' and
''::errorCode'' exist is to give the script level access to
stack and error code information following the ''catch''
of a script that raises an error.  After this proposal, the
''catch'' command itself provides access to that information,
so the global variables are not required.  One can imagine
deprecating them, asking users of Tcl 8.5 to stop writing
code that accesses them.  They could still have apparent
existence, to satisfy the needs of scripts written for earlier
Tcl 8.X releases, by means of read traces.  In time,
Tcl 9 could either continue the read trace scheme, or not
provide these global variables at all.

One part of Tcl itself that currently makes use of the
''::errorCode'' and ''::errorInfo'' variables is the
''bgerror'' command.  Currently, ''bgerror'' accepts exactly
one argument, the error message.  To make use of stack or
error code information, ''bgerror'' must retrieve them from
the global variables.  The proper values of these global
variables are re-set by ''Tcl_BackgroundError()'' prior to
evaluation of ''bgerror''.

As an alternative, ''Tcl_BackgroundError()'' could first attempt
to call ''bgerror'' with ''two'' arguments, first the message,
then a dictionary of options.  If that call returned TCL_ERROR,
then a second attempt could be made with a single message
argument.  In that way, cleaner ''bgerror'' commands that get
all data from arguments could be supported, while still keeping
support for those ''bgerror'' commands that were defined for
single argument use.

It has been noted several times that the processing of the
value of ''::errorInfo'' is rather difficult because it is
an arbitrary string with no documented structure.  A different,
more structured way of representing stack trace information would
be an improvement.  This proposal does not propose an alternative,
but because it offers an extensible dictionary for storing arbitrary
return options data, it does provide an infrastructure where such
approaches might be tried out.

~ Acknowledgments

This proposal is a synthesis of ideas from many sources.  As best
I can recall, major contributions came from Joe English, Andreas
Leitgeb, Reinhard Max, and Kevin Kenny.  If you like the idea,
give them some credit; if you don't, blame me for combining the
ideas badly.

~ See also

Documentation for tcllib's control package: 
http://tcllib.sf.net/doc/control.html

~ Copyright

This document has been placed in the public domain.