Author: Kevin B. Kenny <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 25-Oct-2017
Tcl-Version: 9.1
Keywords: assertion, pragma, type, alias, compilation
Post-History:
Tcl-Branch: tip-480
Abstract
This TIP proposes a new ensemble in the ::tcl
namespace, ::tcl::pragma
,
that will provide a place to install commands that make structural assertions
about Tcl code. Initially, two subcommands will be provided: ::tcl::pragma
type
, which asserts that Tcl values are lexically correct objects of a given
data type, and ::tcl::pragma noalias
, which describes the possible aliasing
relationships among a group of variables. The assertions are provided in an
ensemble, so that the set of available assertions can be expanded in the
future as additional opportunities are discovered to make useful claims about
program and data structure.
Motivation
Tcl, of course, is a typeless language: every value is a string. Moreover, it is an intensely dynamic language: the association of names with commands and variables is made very late, sometimes only when code is executed that searches for a variable by name.
Nevertheless, often a programmer's intention is to have values from a restricted set of strings, or to make restrictions on what names may address what variables. For instance, it may be known that a given piece of code is prepared to accept only numeric data, well-formed lists, Boolean values, or some other restricted type of data as its input.
Similarly, a great many programs that import variables using forms such as
global
, variable
, upvar
, namespace upvar
, and the custom variable
resolutions of systems like TclOO cannot function correctly if two or more of
their variable names actually designate the same variable. A procedure like
proc collect {inputVar} {
upvar 1 $inputVar inputs
variable collection
for {set i 0} {$i < [llength $inputs]} {incr i} {
lappend collection [lindex $inputs $i]
}
}
will surely yield surprising results if called with collection
as its
parameter!
Giving the programmer the capability to specify restrictions on data types and alias relationships would have multiple advantages:
It documents what is expected. In particular, procedure, method and lambda parameters can have assertions about their structure early in a procedure, informing callers what preconditions must be met.
It fails early. Rather than having mistaken values or unexpected aliases run some way into a procedure and then fail mysteriously or even silently, it can yield an informative message at the first sign of a violated condition.
It aids with code optimization. While data type restrictions can be deduced by a compiler with considerable effort (1), making them explicit can still lead to more performant code. Alias restrictions are considerably harder to deduce, and the problem is Turing-complete in general. Unexpected aliases can be created at points in the program far remote from a procedure. Code like
uplevel #0 {upvar 0 ::path::to::variable ::some::other::thing}
will create an alias without any procedure accessing one or another of the variables being any the wiser.
Proposal
The ::tcl::pragma
ensemble will be added. Initially, it will have two
members: ::tcl::pragma type
and ::tcl::pragma noalias
.
tcl::pragma type
The ::tcl::pragma type
command will have the syntax:
::tcl::pragma type typeName $value1 $value2...
In this usage, typeName is a description of the acceptable type of the given values. The values will be checked for whether they are instances of the given type, and a run-time error will be thrown if any value is not. Initially, the following types will be supported:
boolean: Indicates that the value is a Boolean:
0
,1
,off
,on
,true
,false
,yes
,no
: in general, a value that will pass the test ofstring is boolean -strict.
int32: Indicates that the value is an integer, small enough to fit in a C
int
value on the current platform.int64: Indicates that the value is an integer, small enough to fit in a
Tcl_WideInt
value on the current platform.integer: Indicates that the value is an integer, without constraint on its size.
double: Indicates that the value is representable as a double-precision floating point number (including the special values for Infinity and Not-a-Number).
number: Indicates that the value is representable as a number, which is the union of values accepted as an integer and values accepted as a double.
list: Indicates that the value is representable as a Tcl list. The elements of the list are not constrained.
dict: : Indicates that the value is representable as a Tcl dictionary. The keys and values of the dictionary are not constrained.
It is anticipated that further TIP's will be proposed that expand the available set of types. In particular, lists and dictionaries with constrained content types are foreseen as being useful things to include.
Note that this command operates on values, not variables. A command like:
::tcl::pragma type int $a
does not declare that a
is an integer variable, and does not require future
assigmnents to it to have the given type. It merely asserts that at the
current point in the program, the value of a
will be an integer small
enough to fit in a C int
.
One may think of this assertion as syntactic sugar for the longer codeburst:
if {![string is integer -strict $a]} {
return -code error -level 0 "expected an integer but got $a"
}
and in fact the bytecode compiler will be free to compile that, or similar code. (The description is slightly oversimplified, since other error options must also be manipulated.)
tcl::pragma noalias
The syntax for the ::tcl::pragma noalias
command shall be:
::tcl::pragma noalias set1 set2...
In this usage, set1
, set2
, ... are lists of variable names. The syntax
expresses the assertion that variables that are mentioned in the call are not
aliases of each other at the time the command is executed, except that
variables in the same set are permitted to alias.
The most common usage will be simply to use singleton sets. For instance, the
collect
procedure above might contain
::tcl::pragma noalias inputs collection
following the command
upvar 1 $inputsVar inputs
This command would have the effect of asserting that inputs
and collection
designate distinct variables, avoiding strange behaviour of modifying the
inputs while an iteration is in progress.
It is possible for any combination of aliases to be permitted by including the
possibility on the command line. For instance to assert that a
may be an
alias of b
or c
, but b
and c
must not alias each other, the command:
::tcl::pragma noalias {a b} {a c}
might be used. (The program could specify, redundantly, b
and c
on the
command line, but the noalias
command will enforce that any variable
mentioned anywhere in its arguments is not aliased to any other, except as
specified.
As a final note, it is anticipated that
::tcl::pragma noalias {*}[info locals]
will be a common usage - most programs do not tolerate any unexpected aliasing at all. It is therefore further anticipated that this specific usage may receive special handling in the implementation.
As with type
, noalias
is an assertion of the state of the program at a
given point in the flow of execution. It does not establish a permanent
constraint. A subsequent command such as upvar
may change the aliasing
relation, and there will be no prevention of such a change.
It is worth noting that the necessary interfaces to implement this command are not yet available at the Tcl level at all. A Tcl script has no easy way to determine whether one variable is an alias for another. This command has no counterpart in today's Tcl.
A quick view may lead one to suspect that noalias
will require quadratic
time to check the relationships at runtime. In at least the common cases,
though, it is to be expected that noalias
will run in time O(N), where N
is the number of included variables. Instead of comparing all pairs, it will
be easier to maintain a hash table of variable addresses, and check for
collisions by looking for existing hash entries.
Discussion
The Naming of Names
An appropriate name for this ensemble is a difficult choice. A very early
draft of this proposal, circulated privately, suggested ::tcl::assume
(since
it was seen as a claim that it is safe for a compiler to make a given
assumption). This name was roundly rejected by the reviewers. An alternative
that was counterproposed was ::tcl::assert
. The disadvantage to the latter
name is that it is easy to imagine a piece of code wanting to namespace
import
both ::tcl::assert
and ::control::assert
leading to a name
collision. Moreover, ::tcl::assert
does not take a Boolean expression but
rather a different sort of expression of a constraint. The similarity of the
names would therefore be confusing. In names, as in many other aspects of
life, "the good ones are already taken."
Runtime Behaviour
The assertions described in this TIP are not without cost at runtime. In an
interpreted environment, it may be desirable to control, on a per-namespace
basis, whether the assertions are enforced. In a compiled environment, many of
these assertions will either enable more aggressive optimization, be removable
themselves with appropriate analysis to prove they are unnecessary, or
both. For this reason, the proponent wishes to consider enabling and disabling
of structural assertions to be Out Of Scope at the present time. If it does
prove to be necessary, it can be done with a mechanism analogous to the way
that today's ::control::assert
works.
References
- Kenny, Kevin B. and Donal K. Fellows. 'The State of Quadcode 2017.' Proc. 24th Annual Tcl/Tk Conf. Houston, Tex.: Tcl Community Association, October 2017. https://core.tcl-lang.org/tclquadcode/raw/doc/tcl2017/kbk-dkf-state-of-quadcode.odt?name=ed72b79c8b