TIP 551: Permit underscore in numerical literals in source code

Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.
Author:        Eric Taylor <[email protected]>
State:         Draft
Type:          Project
Vote:          Pending
Created:       16-Sep-2019
Post-History:
Tcl-Version:   8.7
Keywords:      numbers, readability
Sponsor:       Brian Griffin <[email protected]>

Abstract

This TIP proposes that all numbers in scripts shall allow underscores for readability of code.

Rationale and Discussion

In most modern programming languages, it is possible to group the digits in the program's source code to make it easier to read; Ada, C# (from version 7.0), D, Haskell (from GHC version 8.6.1), Java, OCaml, Perl, Python (from version 3.6), Ruby, Rust, and Swift all use the underscore (_) character for this purpose. All these languages allow nine hundred million to be entered as 900_000_000.

This TIP proposes to change TCL to include this ability.

Specification

To allow underscore in numerical constants (decimal, octal, 0x... 0b... 0d... and 0...) the underscore character would be simply an aid to visibility. This character would serve as a comment in the sense it would be allowed in the program source code but have no semantic affect. Any number of underscores would be allowed and their positions would be unrestricted except where a number would no longer be a number.

For example,

expr 100_000_000
expr 0xffff_ffff
expr 0b1111_1111_1111_1110

A number is identified in the routine TclParseNumber and scans the text until the end of the number. It accepts several formats. The key is that a number is a word that begins with a decimal digit and can have several forms for decimal, octal, hexidecimal, and binary. The convention is now to use a 2 character prefix, 0x, 0b, 0o, and the newest one, 0d. This proposal would allow an underscore following the first digit or the letter that designates the number base, to anywhere up to and including the end of the number. There can be multiple underscores in a row.

Thus,

0x_ffff_ffff_

would be legal and similarly with the other 3 bases supported.

The one restriction would be a number with a leading _ as that would change the meaning from a number to a bareword, e.g.

expr _123

would not be allowed, since that would not be a number.

The implementation would be to simply bypass the _ in all cases during the processing of the number in the routine TclParseNumber.

The implementation was tested with 4 lines of code at the beginning of the function. It is at the main while loop in TclParseNumber and includes this,

while (1) {
  char c = len ? *p : '\0';

  // ------------- add this to allow _ in a number and just bypass it
  if ( c == '_' ) {
    p++;
    continue;
  }
  // -------------- end of code to allow underscore
  ...
}

Copyright

This document has been placed in the public domain.