Artifact [23ba88dccf]

Login

Artifact 23ba88dccf10794eb9ba6a67f0efaf53372f3128f1157c5a834f32a38c14ca1d:


# TIP 401: Comment Words with Leading {#}
	Author:         Lars Hellström <[email protected]>
	State:          Draft
	Type:           Project
	Vote:           Pending
	Created:        29-Apr-2012
	Post-History:   
	Tcl-Version:    9.1
	Tcl-Branch:     tip-401
-----

# Abstract

The basic syntax rules of Tcl \(the "dodekalogue"\) are modified to allow words
that are comments. In analogy with the argument expansion **\{\*\}**, such
comment words will begin with **\{\#\}** \(left brace, hash sign, right brace\).  
The change concerns both words in scripts and, more importantly, words in lists.

# Rationale

Tcl is special in that comments appear at the "statement" \(command\) level of
the language syntax rather than the "token" level \(as is the case in e.g. the
ALGOL language family: C, Pascal, Java, etc.\). This means a Tcl program has
fewer places in which a comment can be placed than many other languages, but
this has not been a serious problem in traditional Tcl programming as commands
tend to be short \(except when they have arguments that are themselves Tcl
scripts\) and places where a comment can be inserted thus frequent enough.
Recent developments in the language have however changed this.

Various new features -- particularly dictionaries, ensembles, and argument
expansion \(all of which were introduced with Tcl 8.5\) -- have encouraged a
coding style where occasionally fairly large blocks of code can be spent
setting up data structures as explicit values. For example, an ensemble that
relies on the **-map** option typically has the entire mapping dictionary as
a braced word in the code, and this can grow rather large if subcommands are
being mapped to command prefixes of length greater than one. **string map**
mappings can grow very large if there are many cases to deal with.
Dictionaries have greatly simplified the use of values with inner structure,
and as a result the complexity of the data that routinely gets passed to
package commands has increased; a controlling data structure \(e.g. a grammar,
in the case of a parser\) that once required an entire API of constructors to
build might now have been redesigned as a simple nested dictionary that can be
written raw in the code. And on top of it all, the various examples given
above are not excluding each other, but may rather nest to form even larger
blocks of code without any part that is a script. Adding a class of comments
that can begin anywhere a word can begin undoes the forced "comment desert"
status of such code blocks, because words are syntactic units not just in
commands but also in lists, and all of the structured explicit values
mentioned above are syntactically either plain lists or lists with additional
restrictions.

A parallel development is that the gap between what some piece of Tcl looks
like at coding time and at runtime has begun to widen, because an increasing
number of APIs call for fragments of commands \(particularly command prefixes\)
rather than full commands or scripts. This change is often good from a
correctness and efficiency point of view, but can make code harder to maintain
on account of being more obscure; the evocation of runtime calls resulting
from a particular piece of code sometimes requires a considerable effort of
imagination, even if all APIs involved are well documented. Comment words that
appear in the position\(s\) where additional material is inserted at runtime can
help the mind here, by letting the eyes see in the code those things that will
be there when it is evaluated, e.g. rather than

	  socket -server [list ::some::handler $settings] $port

one might write

	  socket -server [
	      list ::some::handler $settings {#}chan {#}client {#}clientport
	  ] $port

to emphasize the fact that this **::some::handler** command takes those four
things as arguments, even though only one of them is present in the command
prefix.

For command words to not change the interpretation of any presently legal
syntactic construction, they must be something which is not valid as a list
element word. The brace-something-brace-something syntax region where argument
expansion was given a home is pretty much the only possibility there is for
this \(if one rules out unbalanced words\), since there is very little that the
Tcl parser finds outright wrong. The **\#** character is the normal way of
starting a command-level comment, so it is natural that it occurs also in the
syntax of word-level comments.

# Specification

Clause 5 \(argument expansion\) of the Tcl.n manpage is to be amended with the
following conditions, and the language parser is to be modified accordingly.

 > If a word starts with the string "\{\#\}" followed by a non-whitespace
   character, then the leading "\{\#\}" is removed and the rest of the word is
   parsed and substituted as any other word. The result of this substitution
   is not used for anything, and no word is added to the command being
   substituted. For instance, "cmd a \{\#\}\{b c\} d \{\#\}e f" is equivalent to "cmd
   a d f".

Moreover, the analogous modification shall be made to the list parser; a word
with **\{\#\}** proper prefix is recognised as a comment also in a list, where
the initial substitution phase only performs backslash substitution.

_Note 1:_
The point of doing substitution is to stick as close to the behaviour of
**\{\*\}** as possible. Of the three steps involved in argument expansion -
parse and substitute word, reparse result without substitution as a list of
words, and append those words to command being built -- only the middle one
need to be different for **\{\#\}**, and thus a lot of the code can be shared.

_Note 2:_
The comment prefix is typically most useful with words like

	 cmd {#}{some [text]} {#}bareword {#}"Comment goes here"

but things like this are also legal:

	 cmd {#}$var {#}$x,\n {#}[foo bar]

_Note 3:_
It is _very_ important for serveral use-cases that comment words are
recognised as such also in lists. One might \(as I would\) argue that this
should really follow automatically, since outside the source itself the only
\(and not very explicit\) documentation of the string representation of lists is
found in the lindex.n manpage which merely says that:

 > In extracting the element, **lindex** observes the same rules concerning
   braces and quotes and backslashes as the Tcl command interpreter; however,
   variable substitution and command substitution do not occur.

As only variable and command substitution are mentioned as things which differ
between lists and commands, they therefore must treat **\{\#\}** the same.
However, presently the argument expansion **\{\*\}** is not recognised in
lists, despite there not being a stated exception for that either. \(This state
of affairs is reasonable, since it would be very complicated to extend present
mechanisms to support argument expansion in lists and the benefit of doing so
is slim at best, but it should be more clearly documented.\)

# Use Cases

## More comments in switch

Beginning with an old issue, one may consider the placement of comments to
**switch** cases. The current advice is to place them first in the bodies
\(which of course works\), but it can often be exposition-wise more natural to
place them before the pattern, especially if there is not a 1-1 correspondence
between comments and bodies.

	  switch -regexp [string trimleft $number +-] {
	 
	      {#}"Integer formats"
	      {^0$} - 
	      {^[1-9][0-9]*$} {
	           # ...
	      }
	      {^0o[0-7]+$} {
	           # ...
	      }
	      {^0x[0-9A-Fa-f]+$} {
	           # ...
	      }
	      
	      {#}"Float formats"
	      {^[0-9]*\.[0-9]+(e[+-][0-9]+)$} -
	      {^[0-9]+\.[0-9]*(e[+-][0-9]+)$} -
	      {^[0-9]+e[+-][0-9]+$}  {
	           # ...
	      }
	      
	  }

## Inline comments in long commands

Some commands can be very long simply because they require a lot of arguments
to express what one wants, and then comments can help clarify what a
particular argument contributes to.

	  $canvas create polygon 0 0 {#}"left top" 30 0 {#}"bend down" 30 30 \
	    {#}"concave part" 60 30 {#}"bend up" 60 0 {#}"right top" 90 0 \
	    {#}"curving back" 90 30 45 67 0 30 {#}"done" -smooth true \
	    -width 2 -fill orange -outline green {#}"Official colours" \
	    -tags {button {#}"for clicking" buoyant {#}"affects movement"}

If there are several ideas in such a command, it might be nice to put the
comments pertaining to each next to where that idea actually shows up in the
code.

## Inline argument descriptions

The Tcl style guide for C code suggests that comments describing function
arguments appear inline with the argument declarations. Comment words would
permit the same style in Tcl code.

	 proc tcl::Pkg::CompareExtension {
	    fileName      {#}"name of a file whose extension is compared"
	    {ext {}}      {#}"The extension to compare against; you must
	                      provide the starting dot. Defaults to the 
	                      info sharedlibextension."
	 } {
	    # ...
	 }

\(Whether that style would be regarded as an improvement or not probably
depends on one's taste.\)

## Filling in words that will be there at runtime

At one point, I found myself wanting to do some calculations with matrices
whose elements were polynomials \(with integer coefficients\) of four
noncommuting variables _A_, _B_, _C_, and _D_. Having previously
implemented some basic algebraic constructions, I could quickly set up a
command that implemented arithmetic with such matrices through the two
**interp alias**es

	 interp alias {} Z<A,B,C,D> {} \
	   ::mtmtcl::rings::semigroup_algebra ::mtmtcl::rings::integers::all\
	     ::mtmtcl::groups::string_free_monoid
	 interp alias {} matrices {} \
	   ::mtmtcl::matprop::trivial dummy ::Z<A,B,C,D>

This is however not a very readable definition even if one is familiar with
the commands it uses \(and realises that there is nothing magical about the
command name **Z<A,B,C,D>**\), because the way that things appear in this
piece of code is visually quite different from the context in which they will
appear when evaluated. If instead comment words are inserted as visual
placeholders for the missing material, then the overall appearance becomes
much closer to that of a script calling these commands.

	 interp alias {} Z<A,B,C,D> {#}method {#}args {} \
	   ::mtmtcl::rings::semigroup_algebra {
	      ::mtmtcl::rings::integers::all {#}method {#}args
	   } {
	      ::mtmtcl::groups::string_free_monoid {#}method {#}args
	   } {#}method {#}args
	 interp alias {} matrices {#}method {#}args {} \
	   ::mtmtcl::matprop::trivial dummy {
	      ::Z<A,B,C,D> {#}method {#}args
	   } {#}method {#}args

## "K combinator"

In the Tcl community, the _K combinator_ idiom is when you first produce an
argument for the main command \(through variable or command substitution\), then
evaluate some other command which has beneficial side-effects but whose result
is of no interest, and finally evaluate the main command.  \(The original K
combinator, as found combinatory logic, is more about getting rid of unwanted
arguments supplied to a command prefix than about exploiting side-effects, but
there is a continuum connecting the two.\)  This is most often employed to have
a variable release its reference to a Tcl\_Obj, so that the latter becomes
unshared and possible for a command to modify directly. **\{\#\}** introduces
the new form

	 set stack [lreplace $stack {#}[set stack whatever] end end]

of this, providing yet another alternative to such old forms as

	 set stack [lreplace $stack [set stack end] end]
	 set stack [lreplace $stack[set stack ""] end end]
	 set stack [lreplace $stack {*}[set stack ""] end end]

Of course, any

	 foo $apa {#}[bar baz]

can trivially be rewritten to achieve the same effect using argument expansion
instead:

	 foo $apa {*}[bar baz; list]

## Commenting out list elements

If a programmer needs to temporarily disable some functionality, then a
standard technique is to comment out the corresponding code. However, if the
code that needs to be commented out amounts to some elements in a long list
\(e.g., a list of commands to [interp expose] in a slave interpreter\) then
there is presently no way of commenting out less than the whole command
containing that list. Comment words provide a more specific alternative.

	   set tclCommands {
	       after append array binary break case catch cd clock close concat
	       continue dde else elseif encoding eof error eval exec exit expr
	       fblocked fcondict figure fcopy file fileevent flush for foreach 
	       format gets glob global history if incr info interp join lappend
	       lassign lindex linsert list llength load lrange lrepeat lreplace
	       lsearch lset lsort namespace open package pid proc puts pwd read
	       regexp regsub rename resource return scan seek set slave socket
	       source split string subst switch tell time trace unknown
	       unset update uplevel upvar variable vwait while
	       {#}{ {#}"Auto-loaded commands"
	       auto_execok auto_import auto_load auto_mkindex auto_mkindex_old
	       auto_qualify auto_reset parray pkg::create pkg_mkIndex tcl_endOfWord
	       tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord
	       tcl_wordBreakAfter tcl_wordBreakBefore
	       }
	   }

The same is true in dictionaries.

	 namespace eval ::tcl::info {
	     namespace ensemble create -command ::info -map {
	        exists             ::tcl::info::exists 
	        globals            ::tcl::info::globals 
	        locals             ::tcl::info::locals 
	        vars               ::tcl::info::vars 
	        args               ::tcl::info::args 
	        body               ::tcl::info::body 
	        default            ::tcl::info::default 
	        commands           ::tcl::info::commands 
	        procs              ::tcl::info::procs 
	        functions          ::tcl::info::functions 
	        cmdcount           ::tcl::info::cmdcount 
	        complete           ::tcl::info::complete 
	        script             ::tcl::info::script 
	        level              ::tcl::info::level 
	        frame              ::tcl::info::frame 
	        errorstack         ::tcl::info::errorstack 
	        patchlevel         ::tcl::info::patchlevel 
	        tclversion         ::tcl::info::tclversion 
	        {#}{ {#}"Commented out for bug #nnnnnn."
	        hostname           ::tcl::info::hostname 
	        sharedlibextension ::tcl::info::sharedlibextension 
	        loaded             ::tcl::info::loaded 
	        library            ::tcl::info::library 
	        nameofexecutable   ::tcl::info::nameofexecutable 
	        }
	        coroutine          ::tcl::info::coroutine 
	        object             ::oo::InfoObject 
	        class              ::oo::InfoClass
	     }
	 }

## Line continuation

Comment words also provide an alternative to backslash--newline line
continuations, namely as in

	  a command which continues well beyond the normal line width {#}{
	  } and which one therefore might want to split over two or more {#}{
	  } lines of code

Being several times longer than the simple backslash, this is unlikely to
replace it, but there could be cases where one wants to avoid the backslash
because that character would be intercepted by something else.

# Alternatives

Using the author's **docstrip** package
<http://tcllib.sourceforge.net/doc/docstrip.html> , one can place comments
between any two lines of code \(whether there is a command separator there or
not\) and also comment out arbitrary code lines, even if that comes at the
price of working with sources that are not raw Tcl code. However, such
comments have a tendency to become more of a separate commentary track than an
integrated part of the program narrative, and sometimes \(for example to show
things that are invisible in the code\) one specifically desires comments to be
an integral part of the Tcl code.

It is also possible to use **regsub** or something similar to the end of
removing comments from a block of code before it is evaluated or put in a
variable; instead of having the core language recognise some pieces of code as
comments, one preprocesses the code as a string before telling the core that
it actually is Tcl code \(or a list/dict/etc.\). Thus instead of \(assuming 
comment words\):

	   set tclCommands {
	       after append array binary break case catch cd clock close concat
	       continue dde else elseif encoding eof error eval exec exit expr
	       fblocked fcondict figure fcopy file fileevent flush for foreach 
	       format gets glob global history if incr info interp join lappend
	       lassign lindex linsert list llength load lrange lrepeat lreplace
	       lsearch lset lsort namespace open package pid proc puts pwd read
	       regexp regsub rename resource return scan seek set slave socket
	       source split string subst switch tell time trace unknown
	       unset update uplevel upvar variable vwait while
	       {#}{ {#}"Auto-loaded commands"
	       auto_execok auto_import auto_load auto_mkindex auto_mkindex_old
	       auto_qualify auto_reset parray pkg::create pkg_mkIndex tcl_endOfWord
	       tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord
	       tcl_wordBreakAfter tcl_wordBreakBefore
	       }
	   }

one might write

	   set tclCommands [regsub -all {#[^\n]*\n} {
	       after append array binary break case catch cd clock close concat
	       continue dde else elseif encoding eof error eval exec exit expr
	       fblocked fcondict figure fcopy file fileevent flush for foreach 
	       format gets glob global history if incr info interp join lappend
	       lassign lindex linsert list llength load lrange lrepeat lreplace
	       lsearch lset lsort namespace open package pid proc puts pwd read
	       regexp regsub rename resource return scan seek set slave socket
	       source split string subst switch tell time trace unknown
	       unset update uplevel upvar variable vwait while
	       ## Auto-loaded commands
	       # auto_execok auto_import auto_load auto_mkindex auto_mkindex_old
	       # auto_qualify auto_reset parray pkg::create pkg_mkIndex tcl_endOfWord
	       # tcl_findLibrary tcl_startOfNextWord tcl_startOfPreviousWord
	       # tcl_wordBreakAfter tcl_wordBreakBefore
	   } \n]

A problem with the latter is that it destroys code line correspondences for
[[280]](280.md). The boilerplate code for performing the preprocessing may also need to
be inserted far from the actual comment, which discourages use of commenting
using such a mechanism. But the main problem with it is that quick fixes like
this have a tendency to be half-baked, and would break some otherwise legal
code. \(Everything works fine in the example until the day someone needs to
insert an element which contains the \# character into the list.\)

Since the canonical list-quoting of **\#** is precisely **\{\#\}**, one could
\(in analogy with the argument against **\{\}** as expansion prefix\) argue that
**\{\#\}something** is too likely to arise as a typo for **\{\#\} something**,
which would suggest using some other character than **\#** in the comment
prefix. **\\\#** is however a shorter way to list-quote **\#**, so it seems
unlikely that **\{\#\}** should be common in manually written code \(unlike
**\{\}**, which is very common\).

## Additional alternatives from tcl-core discussion

That \(in script words\) the initial substitution round performs variable 
and command substitution is by several seen as problematic. Only 
performing backslash substitution there is no great loss, as the only 
use-case above which relies on other substitutions is the K combinator one 
that anyway has several alternatives already, but it complicates the 
implementation by introducing a new parsing mode for scripts \(grab word 
without variable or command substitution, but with backslash substitution\). 
In particular, one would have to decide on the exact rules for this; it 
would probably be best to use the same rules as for command words in 
lists, which however implies that

	 list {#}[foo bar]

would produce a list with one element, namely the string "bar]".

A different way of fitting comments into the 
brace-_something_-brace-_something_ syntax region would be to put the 
comment material in the first _something_ rather than the second. This 
could take the form of comment words which begin with **\{\#** and end 
with **\}\#**, as in

	  $canvas create polygon 0 0 {# left top }# 30 0 {#bend down}# 30 30 \
	    {# concave part}# 60 30 {#bend up }# 60 0 {# right top }# 90 0 \
	    {# curving back }# 90 30 45 67 0 30 {# done }# -smooth true \
	    -width 2 -fill orange -outline green {# Official colours }# \
	    -tags {button {# for clicking }# buoyant {#affects movement}#}

This is similar to Cloverfield's current **\#\{** and **\}\#** comment 
delimiters, and perhaps also reminicient of C's **/\*** ... **\*/** and 
Pascal's **\(\*** ... **\*\)**. It should however be stressed that the 
nesting here would still be that of braces rather than brace\+hash 
combinations, so

	  {# a b { c }# d {# e f } g }#

would be one comment word, not two with a non-comment word **d** in 
between. Using this syntax could thus lead to false expectations about 
where a comment ends.

A technical disadvantage of **\{\#** and **\}\#** for comment word 
delimiters is that they make TclFindElement slightly more expensive than 
for the proposed **\{\#\}** prefix. The reason is that TclFindElement needs 
to peek at the word after the element it finds to check if that is a 
comment \(and if so scan past it and peek at the word after that too\). If 
the comment status of a word can be determined from whether it carries a 
**\{\#\}** prefix, then it is at worst necessary to peek at the first four 
characters, but if it matters how the word ends then it might be necessary 
to parse the whole of the next element in a list before being able to tell 
that it indeed was not a comment. From a usability point of view, this 
syntax lends less support for the argument placeholder use-case than the 
proposed prefix.

Donald G Porter has pointed out that this proposal has two functionally 
unrelated parts, which indeed are easy to discern in the reference 
implementation: comment words in lists and the like \(changes in 
TclFindElement\) and comment words in scripts \(all other changes\). It would 
be possible to propose implementing either one of these parts alone, and 
the one which is most interesting is then comment words in lists. \(Given 
just that, one could even use

	   {*}{{#}"Comment goes here"}

as an ugly substitute for word comments in scripts.\) However, I believe it 
is more natural to introduce both together.

# Suggested Stylistic Considerations

Like ordinary words, comment words come in three varieties: brace-delimited,
quote-delimited, and barewords. A style rule which has been used in the above
examples is that:

 * Natural language comment words are of the quote-delimited kind.  This is
   for consistency with the common practice of quote-delimiting text strings
   within Tcl code, even if said string does not require substitution.

 * Comment words that serve as placeholders for actual program code are of the
   bareword variety, e.g. **\{\#\}method**.

 * Comment words which serve a programmatic purpose, essentially that of the
   "K combinator", are barewords too.

 * Comment words which do not fall into any of the above categories, for
   example those used to comment of a block of words, are brace-delimited.

One reason for having a style rule about this is that prettyprinting and
syntax colouring utilities _might_ want to highlight these cases
differently. Arguably, syntax colouring of Tcl code tends to do _at least as
much harm_ as it does good since few engines are able to keep track of enough
context to found their claims about the code on reasonably accurate
interpretations thereof, but comment words is one of the few things that they
might actually manage to get right most of the time, precisely because a
comment word is a comment word throughout so many contexts.

# Reference Implementation

A reference implementation is available as SF Tcl patch \#3522426
<https://sourceforge.net/support/tracker.php?aid=3522426> . It includes some
tests, but more could be needed. DGP has also created a branch 
**tip-401** <https://core.tcl-lang.org/tcl/timeline?r=tip-401>  for it in the 
core fossil respository; further development is best conducted in the 
latter.

Implementing comment words in lists and the like could be achieved by
modifications only in TclFindElement. Comment words in scripts are implemented
as argument expansion that does not contribute any word.

# Copyright

This document has been placed in the public domain.