TIP 551: Permit underscore in numerical literals in source code

Login
Author:        Eric Taylor <[email protected]>
State:         Final
Type:          Project
Vote:          Done
Created:       16-Sep-2019
Post-History:
Tcl-Version:   8.7
Tcl-Branch:    tip-551
Keywords:      numbers, readability
Sponsor:       Brian Griffin <[email protected]>
Vote-Summary:  Accepted 6/0/2
Vote-For:      BG,KW,MC,FV,JN,SL
Vote-Against:  none
Vote-Present:  DP,KK

Abstract

This TIP proposes that all numbers in scripts shall allow digit separators in the form of underscore characters for readability of code.

Rationale and Discussion

In most modern programming languages, it is possible to group the digits in the program's source code to make it easier to read; Ada, C# (from version 7.0), D, Haskell (from GHC version 8.6.1), Java, OCaml, Perl, Python (from version 3.6), Ruby, Rust, and Swift all allow use of a digit seperator character, specifically the underscore (_) character, for this purpose. All these languages allow nine hundred million to be entered as 900_000_000.

This TIP proposes to change TCL to include this ability.

Specification

To allow underscore in all numerical constants (decimal, octal, 0x... 0b... 0d... 0..., and real numbers) the underscore character would be simply an aid to visibility. This character would serve as a comment in the sense it would be allowed in the program source code but have no semantic affect. Any number of underscores would be allowed and their positions would be unrestricted except where a number would no longer be a number, i.e., as the first character.

For example,

expr 100_000_000
expr 0xffff_ffff
expr 0b1111_1111_1111_1110

A number is identified in the routine TclParseNumber and scans the text until the end of the number. It accepts several formats. The key is that a number is a word that begins with a decimal digit and can have several forms for decimal, octal, hexidecimal, binary, and real. The convention currently is to use a 2 character prefix, 0x, 0b, 0o, and the newest one, 0d. This proposal would allow an underscore following the first digit or the letter that designates the number base, to anywhere up to and including the end of the number. There can be multiple underscores in a row.

Thus,

0x_ffff_ffff_

would be legal and similarly with the other 3 bases supported.

The one restriction would be a number with a leading _ as that would change the meaning from a number to a bareword, e.g.

expr _123

would not be allowed, since that would not be a number.

The implementation would be to simply bypass the _ in all cases during the processing of the number in the routine TclParseNumber.

A preliminary implementation was tested with 4 lines of code at the beginning of the function. It is at the main while loop in TclParseNumber and includes this,

while (1) {
  char c = len ? *p : '\0';

  // ------------- add this to allow _ in a number and just bypass it
  if ( c == '_' ) {
    p++;
    continue;
  }
  // -------------- end of code to allow underscore
  ...
}

Implementation

See branch tip-551

Copyright

This document has been placed in the public domain.