Author: Eric Taylor <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 16-Sep-2019
Post-History:
Tcl-Version: 8.7
Tcl-Branch: tip-551
Keywords: numbers, readability
Sponsor: Brian Griffin <[email protected]>
Vote-Summary: Accepted 6/0/2
Vote-For: BG,KW,MC,FV,JN,SL
Vote-Against: none
Vote-Present: DP,KK
Abstract
This TIP proposes that all numbers in scripts shall allow digit separators in the form of underscore characters for readability of code.
Rationale and Discussion
In most modern programming languages, it is possible to group the
digits in the program's source code to make it easier to read; Ada, C#
(from version 7.0), D, Haskell (from GHC version 8.6.1), Java, OCaml,
Perl, Python (from version 3.6), Ruby, Rust, and Swift all allow use
of a digit seperator character, specifically the underscore (_
)
character, for this purpose. All these languages allow nine hundred
million to be entered as 900_000_000
.
This TIP proposes to change TCL to include this ability.
Specification
To allow underscore in all numerical constants (decimal, octal, 0x... 0b... 0d... 0..., and real numbers) the underscore character would be simply an aid to visibility. This character would serve as a comment in the sense it would be allowed in the program source code but have no semantic affect. Any number of underscores would be allowed and their positions would be unrestricted except where a number would no longer be a number, i.e., as the first character.
For example,
expr 100_000_000
expr 0xffff_ffff
expr 0b1111_1111_1111_1110
A number is identified in the routine TclParseNumber
and scans the
text until the end of the number. It accepts several formats. The key
is that a number is a word that begins with a decimal digit and can
have several forms for decimal, octal, hexidecimal, binary, and real. The
convention currently is to use a 2 character prefix, 0x, 0b, 0o, and the
newest one, 0d. This proposal would allow an underscore following the
first digit or the letter that designates the number base, to anywhere
up to and including the end of the number. There can be multiple
underscores in a row.
Thus,
0x_ffff_ffff_
would be legal and similarly with the other 3 bases supported.
The one restriction would be a number with a leading _
as that would
change the meaning from a number to a bareword, e.g.
expr _123
would not be allowed, since that would not be a number.
The implementation would be to simply bypass the _
in all cases during
the processing of the number in the routine TclParseNumber.
A preliminary implementation was tested with 4 lines of code at the beginning of the function. It is at the main while loop in TclParseNumber and includes this,
while (1) {
char c = len ? *p : '\0';
// ------------- add this to allow _ in a number and just bypass it
if ( c == '_' ) {
p++;
continue;
}
// -------------- end of code to allow underscore
...
}
Implementation
See branch tip-551
Copyright
This document has been placed in the public domain.