TIP 460: An Alternative to Upvar

Login
Author:         Don Hathway <[email protected]>
State:          Draft
Type:           Project
Vote:           Pending
Created:        08-Dec-2016
Post-History:   
Keywords:       Tcl,variable,link,upvar
Tcl-Version:    9.1
Tcl-Branch:     dah-proc-arg-upvar

Abstract

Variable linking with the upvar command is not as intuitive or effecient as it should be. This TIP proposes an alternative through automatic variable linking.

Rationale

The current strategy used to link a variable in a called procedure to the caller, is to pass the name of the variable to the procedure, and use the upvar command to create a new variable, which is then linked to the original. Thus linking to a variable requires two components; the variable name and a newly created variable.

It is possible to instruct Tcl to do this linking automatically in an idiomatic way and dispense with the upvar command call.

Also, the requirement (by upvar) that the name of the new link variable be a different name from the original is arguably considered counter-intuitive.

Benefits to this TIP as proposed:

  1. No code to perform explicit linking within the procedure's body. Unlike upvar, this method requires no additional code to be entered in the body of the procedure. Less code, less bugs, easier to use! It has been said that Tcl'ers should make more use of variable linking in their code. Making it easier for them should have an encouraging effect, similar to how most Tcl'ers prefer $var over set var.

  2. Clearly defines links in the procedure's parameter list. Readers should instantly know what the links are. Clarity is important, especially for people that read code all day. There are no special project naming conventions to follow. A reader doesn't have to rely on docs or assume that a parameter name of "varName", "_var", or "varLnk" is to be linked by an upvar call, of which may be pushed down in the procedure's body by comments or other code.

  3. Alleviates arguably messy upvar chain linking.

Upvar Chaining Example

Below are three _upvar_s with the same arguments. As you can see, there is quite a bit of arguably unnecessary code duplication, and that is bug prone.

 proc foo {a} {
    upvar 1 $a la
    maybe do something with la
    bar la
 }
 proc bar {a} {
    upvar 1 $a la
    maybe do something with la
    baz la
 }
 proc baz {a} {
    upvar 1 $a la
    maybe do something with la
 }
 foo begin

This could be written more succinctly:

 proc foo {*a} {
    maybe do something with a
    bar a
 }
 proc bar {*a} {
    maybe do something with a
    baz a
 }
 proc baz {*a} {
    maybe do something with a
 }
 foo begin

Specification

Add support to procedure handling to allow for a parametric hint to procedure definitions with respect to the intent to link variables accordingly. We use the asterisk character "*" as the symbol to declare this intent; which shall prefix the parameter's name. Consequently, the "*" character becomes special, but only inside the procedure parameter list. A procedure definition using this facility would then have the signature:

 proc foo {*a *b} {...}

Where *a and *b are the procedure's parameters to be linked to the caller's arguments. New variables are then created for the future linking. In this example *a creates a new link variable named a, and likewise done for *b. *a and *b holds the values passed in by the caller.

The formal parameter's shall retain the same values provided by the caller.

The link variable's name shall always have one * symbol less than its counterpart parameter, for the sake of consistency. In example, a parameter named ***a shall have a counterpart link variable named **a. Similarily **a shall have a counterpart link named *a.

Where there are duplicate link parameter names (i.e. proc P {*a *a}) the behavior shall be the same as if there were duplicate upvar statements.

It is legal to have empty link variable names. It shall be possible with a single * in the procedure's parameter list (i.e. proc P {*} {incr ""}). The same duplicate name rule applies.

If the variable to be linked does not exist, it shall be created, if necessary. It shall have the same behavior as upvar 1 in such instances.

When a link's construction fails, the behavior shall be the same as if upvar had failed, the procedure will return with an error before any other commands (with exception to any commands involved in the link's construction) in its body are executed.

It is illegal for a link parameter to have a default value. It shall invoke an error during procedure creation time and result in failed procedure creation with the error code:

 Tcl_SetErrorCode(interp, "TCL", "OPERATION", "PROC","FORMALARGUMENTFORMAT", NULL);

An example of such an error for: proc P {{*a foo}} {...} Would be: "procedure "P": formal parameter "*a" is to be linked and must not have a default value"

In that example, proc P is never created, the attempt failed due to the error.

It is the caller's responsibility to provide the names of variables to be linked. This constraint exists in the spirit of promoting good coding practices and to help avoid obscure and subtle bugs. For the same reasons, this TIP only searches one level up. Therefore, It shall have the same behavior as upvar 1.

*args is a valid parameter name. For example, args is simply a link in:

 proc foo {a *args} {
     incr args
 }

Note that as of this TIP proc foo {args args} {...} is legal Tcl. In this instance only the first scalar args is usable by the procedure. The rest of the arguments are inaccessible by the script. They're not internally lost, but Tcl's variable lookup mechanics will choose whichever is found first when a script references it. This behavior is inherited for proc foo {*args args} {...}. Where args will be a link.

To further illustrate this proposal with an example:

 proc foo {*a *b} {
   bar a b
 }

 proc bar {*a *b} {
   incr a
   incr b
 }

 set v1 0
 set v2 1
 foo v1 v2
 puts $v1
 # prints 1
 puts $v2
 # prints 2

 # Version of foo using upvar:
 proc foo {a b} {
    # Note, upvar $a a would be an error.
    upvar 1 $a la $b lb
    bar la lb
 }
 proc bar {a b} {
    upvar 1 $a la $b lb
    incr la
    incr lb
 }

The "*" character was chosen primarily because it resembles a star or a snowflake and has a pleasantry to it. It is one of the few ascii characters that sticks out from its surrounding text.

It is also familiar to users of other languages where the same symbol exhibits similar semantics (to wit: a link in Tcl acts as a reference to another variable and doesn't perform a copy when the reference is written to, as it would if it weren't a link). However, unlike other languages, the Tcl core does not expose operations to user scripts that work directly on memory, so the "*" character should not be mistaken to behave the same or suffer from the same pitfalls as it does in C, C++, Golang, etc. The * symbol simply instructs Tcl to create a link if it is able to do so.

Consequences

  1. Breaks scripts using the special "*" as the first character in their procedure's parameters (i.e. *var).

    The impact of this should be minimal because these variable names require the user to wrap it in curly braces (i.e. ${*var}) to fetch their values, unless they're using the less common form of set varname.

Reference Implementation

See branch dah-proc-arg-upvar

Implementation Notes

tclInt.h: Add a new field named numArgsCompiledLocals to the Proc struct. The new field holds the number of parameters along with any other relevant local variables which follow immediately after the parameters. For this TIP, these additional locals are variables with the VAR_LINK flag and to be resolved as links to the values of arguments they've been configured to link with.

The additional field was a hard choice, but is necessary because TclProcCompileProc enforces procPtr->numCompiledLocals to be the same value as procPtr->numArgs. The local variable table is evidently not growable until later.

tclProc.c: Modify InitArgsAndLocals to do the automatic linking. Note that this is a very hot function and that was kept in mind while making the necessary adjustments. There are two additional branches in the function (the second only visited when an error happens). The first to check if the command has any parameters that need linking and if so, process them with link support handling code. The second branch is to simply check if the link handling code set an error when an error occurs, so this branch should not be a concern as to performance impact. Due to branch prediction and this function being so hot, there should be virtually nil of a performance impact on any code which doesn't make use of the new automatic linking facility.

tclProc.c: Modify TclCreateProc to add additional locals after the list of parameter locals (if any) when there are parameters flagged for auto linking.

Copyright

This document has been placed in the public domain.