Tcl Improvement Proposals: 457.md at [da6e72fe32]

File tip/457.md artifact c710428c27 part of check-in da6e72fe32

# TIP 457: Add Support for Named Arguments
	Author:         Mathieu Lafon <[email protected]>
	Author:         Andreas Leitgeb <[email protected]>
	State:          Draft
	Type:           Project
	Vote:           Pending
	Created:        21-Nov-2016
	Post-History:   
	Keywords:       Tcl,procedure,argument handling
	Tcl-Version:    9.1
	Tcl-Branch:     tip-457
-----

# Abstract

This TIP proposes an enhancement of the Tcl language to support named
arguments and additional features when calling a procedure.

# Rationale

The naming of arguments to procedures is a computer language feature which
allow developers to specify the name of an argument when calling a function.
This is especially useful when dealing with arguments with default values, as
this does not require to specify all previous arguments when only one argument
is required to be specified.

As such, this is a commonly requested feature by Tcl developers, who have
created various code snippets <http://wiki.tcl.tk/10702>  to simulate it. These
snippets have drawbacks: not intuitive for new users, require to add extra
code at the start of each procedure, no standard on the format to use, few
errors handling, etc.

After discussing various possibilities with the community, it has been
decided to extend the argument specification of the _proc_ command
and allow users to define options on arguments. This can be used to
support named arguments but also add additional enhancements:
flag arguments, pass-by-name \(_upvar_\) arguments, non-required
arguments, ...

The others possibilities discussed are detailed in the _Discussion_
section at the end of the document.

# Specification

The _proc_ documentation currently define argument specifiers as a list
of one or two fields where the first field is the name of the argument and
the optional second field is its default value.

The proposed modification is to support an alternate specifier format where
the first field is also the name of the argument, followed by a paired list
of options and their values. This format does not prevent the original format
to be used as they can be easily distinguished: the new format uses an
odd size list with a minimal size of three fields.

## Available argument specifiers

The following argument specifiers are defined in this TIP:

 * **-default VALUE** defines the default value for the argument.
   It is ignored if _-required 1_ is also used.

		% proc p { { a -default {} } } { list a $a }
		% p
		a {}
		% p foo
		a foo

 * **-name NAME** defines the argument to be a named argument.
   NAME defines the name of the argument when it is  defined as a
   single string. If NAME is a list of strings, it is the list of
   names that can be used to refer to the argument \(i.e. aliases\).
   On the call-site, the name of the argument is prefixed by a single dash
   and followed by the value.

		% proc p1 { { v -name val } } { list v $v }
		% p1 -val 1
		v 1

		% proc p2 { { v -name {v val value} } } { list v $v }
		% p2 -value 2
		v 2
		% p2 -v 2
		v 2

 * **-switch SWITCHES** defines that the argument is defined on the
   call-site as a flag-only/switch parameter. SWITCHES is a list of
   possible switches. Each switch is defined either as a single string
   \(switch name\) or as a list of two entries \(switch name and related
   value\). On the call-site, the name of the switch is prefixed by a
   single dash and is not followed by any value. The value assigned to
   the argument is either the switch name or the related value depending
   on how it was defined.

		% proc p { { dbg -default 0 -switch debug } } { list dbg $dbg }
		% p
		dbg 0
		% p -debug
		dbg debug

		% proc p { { level -switch {{quiet 0} {verbose 9}} } { list level $level }
		% p -quiet
		level 0
		% p -verbose
		level 9

 * **-required BOOLEAN** defines that the value is required to be set.
   If set to true, the argument is required and any default  value is
   ignored. It is the default handling for non-named argument without a
   default value. If set to false, the argument is not required to be set
   and the related argument will be left unset if there is no default value.
   It is the default  handling for named argument.  

		% proc p { { v -required 0 }  } {
		    if {[info exist v]} {list v $v} {return "v is unset"}
		  }
		% p 5
		v 5
		% p
		v is unset

 * **-upvar LEVEL** defines, that the local argument will become an
   alias to the variable in the frame at level LEVEL corresponding to
   the parameter value. This is similar to what is achieved when using
   the _upvar_ command.
   This  specifier is incompatible with the _-switch_ specifier.

		% proc p { { v -upvar 1 } } { incr v }
		% set a 2
		2
		% p a
		3
		% set a
		3

Further argument specifiers may be added in future TIP. Examples of
new argument specifiers which may be added in the future:

 * type assertion \(**-assume TYPE**\)

 * argument documentation \(**-docstring DOC**\)

 * ...

## Named arguments

The following rules define how named arguments are expected to be specified
on the call-site:

 * Named arguments must always be specified using their name, they can't be
   specified as positional arguments.

		% proc p { {a -name A} } { list a $a }
		% p aa
		wrong # args: should be "p |-A a|"
		% p -A aa
		a aa

 * When several names \(using _-name_ or _-switch_ options\) are
   specified for the same argument, only one is required to be used on
   the call-site, unless a default value is also specified. If more than
   one is used, the latest value/switch is kept.

		% proc p { { v -name {v val} } } { list v $v }
		% p -v 6 -val 8
		v 8

 * Both _-name_ and _-switch_ specifiers can be used on the same
   argument.

		% proc p { { level -name level -switch {{quiet 0} {verbose 9}} } {
		    list level $level
		  }
		% p -level 4
		level 4
		% p -verbose
		level 9

 * A group of contiguous named arguments are handled together and are not
   required to be specified in the same order as defined.

		% proc p { {a -name A} {b -name B} } { list a $a b $b }
		% p -B bb -A aa
		% a aa b bb

 * The handling of a group of contiguous named arguments \(which can be
   only one argument\) is ended on the first argument which is either
   a parameter not starting with a dash or the special _--_ end-of-options
   marker. Remaining arguments will then be assigned to following positional
   arguments.

		% proc p { {o -name opt} args } { list o $o args $args }
		% p -opt O 5
		o O args 5
		% p -opt O -1 0
		wrong # args: should be "p |-opt o| ?arg ...?"
		% p -opt O -- -1 0
		o O args {-1 0}

 * If there is a fixed number of non-optional positional arguments and no
   special _args_ variable after the named group, the handling of a named
   group will also be ended when the remaining arguments to assign
   will be equal to the number of positional arguments after the group.

		% proc p { {o -name opt} posarg } { list o $o posarg $posarg }
		% p -opt O -1
		o O posarg -1

## Generated usage description

The error message, automatically generated when the input arguments are
invalid, is updated regarding new options:

 * Pass-by-name arguments \(specified using _-upvar level_ option\) are
   surrounded by the '&' character.

		% proc p { { v -upvar 1 } } { }
		% p
		wrong # args: should be "p &v&"

 * Named arguments are showed how they should be called and surounded
   by the '\|' character. If several names have been specified,
   they are grouped together.

		% proc p { { l -name level -switch {high low} -required 1} } {}
		% p
		wrong # args: should be "p |-level l|-high|-low|"

 * When an argument is optional, '?' is used.

		% proc p { { v -name var } a } {}
		% p
		wrong # args: should be "p ?|-var v|? a"

## Introspection

The _info argspec proc_ command is added to get the argument specification
of all arguments or of a specific argument.

	% proc p { a { b 1 } { c -name c } } {}
	% info argspec proc p
	a { b -default 1 } { c -name c }
	% info argspec proc p c
	-name c

Similar _info argspec_ subcommands are also added for lambda, object
method and object constructor.

The _info argspec specifiers_ command is added to get the specifiers 
supported by the current interpreter.

	% info argspec specifiers
	-default -name -required -switch -upvar

## Other use cases

Extended argument specifiers can also be used with other _proc_-like
functions. The following functions are supported and can use extended
argument specifiers:

 * anonymous functions \(lambda\), used with _apply_ command ;

 * TclOO constructor or methods.

## Performance

The proposed modification has no significant performance impact:

 * existing code \(and code not using extended argspec\) is not impacted
   by the change as the current initialisation code is still available
   and used ;

 * code using extended argspec _may_ be impacted because the
   initialisation code is different and is required to loop on each
   argument, but initial testing does not show a significant slowdown.

When using named arguments specifiers to replace a similar handling done
in Tcl-pure code, there is however a significant increase in performance.

See <https://gist.github.com/mlafon/70480877a28f3571e0377eabc0cee7be> 
for details on performance testing done on the proposed implementation.

# Implementation

This document proposes the following changes to the Tcl core:

 1. Add ExtendedArgSpec structure which is linked from CompiledLocal
    and contains information about extended argument specification;

 2. Add a flags field in the Proc structure to later identify a proc
    with at least one argument defined with an extended argument
    specification \(PROC\_HAS\_EXT\_ARG\_SPEC\);

 3. Update proc creation to handle the extended argument specification
    and fill the ExtendedArgSpec structure;

 4. Update InitArgsAndLocals to initialize the compiled locals using
    a dedicated function if the PROC\_HAS\_EXT\_ARG\_SPEC flag has been
    set on the proc. If unset, the original initialization code is
    still used.

 5. Update ProcWrongNumArgs to generate an appropriate error message
    when an argument has been defined using an extended argument
    specification;

 6. Add _info argspec_ command;

 7. Update documentation in doc/proc.n and doc/info.n;

 8. Update impacted tests and add dedicated tests in tests/proc-enh.test.

## Reference Implementation

The reference implementation is available in the tip-457
<https://core.tcl-lang.org/tcl/timeline?r=tip-457>  branch.

The code is licensed under the BSD license.

# Discussion

This section details some of the alternate solutions for this feature or
specific comments about it.

Initial approaches that tried to work with unmodified procedures are
not detailed here for clarity.

## Dedicated builtin command

A dedicated command can be used to handle the named arguments, using an
_-option value_ syntax, before calling the target procedures with all
arguments correctly prepared.

	% call -opts myproc -optC foo -optB {5 5} -- "some pos arg"

An implementation of this proposal is available at
<https://github.com/mlafon/tcl/tree/457-CALL-CMD> . This proposal was
abandoned as it was not enough intuitive for users.

## Modification in how proc are defined

Tcl-pure procedures can be defined in a way which state that the procedure
will automatically handle _-option value_ arguments.

	% proc -np myproc { varA { optB defB } { optC defC } { optD defD } args } { .. }
	% myproc -optC foo -optB {5 5} -- "some pos arg"

An other possibility is to support options on arguments and allow name
specification:

	% proc myproc { varA { optB -default defB -name B } args } { .. }
	% myproc a -B b zz

This is the currently proposed solution in this TIP. It requires the
procedures to be modified but allow additional features.

Some people have expressed concern about the modification of the _proc_
command, which is a core command of Tcl. A particular attention has been paid
to ensure that existing code will not be impacted and that future usage could
be later added by adding new specifiers.

## Argument Parsing command

Cyan Ogilvie's paper from Tcl2016
<https://www.tcl.tk/community/tcl2016/assets/talk33/parse_args-paper.pdf> 
describes a C extension to provide core-like argument parsing at speed
comparable to _proc_ argument handling, in a terse and self-documenting
way.

Alexandre Ferrieux has proposed
<http://code.activestate.com/lists/tcl-core/18447/>  to use the same
argument specifiers than this proposal, but with a dedicated command which
can be called from the proc body. This has the advantage to not alter the
_proc_ command and could be located in an extension.

Although the _proc_ usage will not be modified, this new command will
probably have to access or modify internal proc structures, for example
to support introspection.

Having to declare final local variables in the body, also seems confusing
for users.

## Preventing Data-dependent bugs

It has been proposed by Christian Gollwitzer
<http://code.activestate.com/lists/tcl-core/18457/>  to make the special '--'
end-of-options marker mandatory when the number of positional arguments after
the named group is not fixed. This would suppress any potential Data-Dependent
bugs related to the search of the initial dash and remove any unwanted object
stringification, at the expense of forcing the user to explicitely use
the end-of-option marker.

This proposal is currently not implemented but the documentation has been
modified to list the cases for which '--' should be use.

# Copyright

This document has been placed in the public domain.