TIP 551: Permit underscore in numerical literals in source code

Author:        Eric Taylor
State:         Final
Type:          Project
Vote:          Done
Created:       16-Sep-2019
Tcl-Version:   8.7
Tcl-Branch:    tip-551
Keywords:      numbers, readability
Sponsor:       Brian Griffin
Vote-Summary:  Accepted 6/0/2
Vote-For:      BG,KW,MC,FV,JN,SL
Vote-Against:  none
Vote-Present:  DP,KK


This TIP proposes that all numbers in scripts accept underscore characters as digit separators, to make code more readable.

Rationale and Discussion

Many modern programming languages let the digits of a number in source code be grouped to make them easier to read. Ada, C# (from version 7.0), D, Haskell (from GHC version 8.6.1), Java, OCaml, Perl, Python (from version 3.6), Ruby, Rust, and Swift all use the underscore (_) character as the digit separator for this purpose. All of these languages allow nine hundred million to be written as 900_000_000.

This TIP proposes to give Tcl the same ability.


Underscores would be allowed in all numeric constants (decimal, octal, 0x..., 0b..., 0d..., 0..., and real numbers), where the underscore character would serve purely as a visual aid. It would act like a comment, in the sense that it is allowed in the program source code but has no semantic effect. Any number of underscores would be allowed, and their positions would be unrestricted except where the result would no longer be a number, i.e., as the first character.

For example,

expr 100_000_000
expr 0xffff_ffff
expr 0b1111_1111_1111_1110
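Because the underscore has no semantic effect, deleting every underscore from a valid literal should leave a literal with the same value. The C sketch below illustrates that property by stripping the separators and handing the result to the standard library's strtoull and strtod. The helper strip_separators is hypothetical: it is not part of Tcl, and it is not the approach this TIP takes, which skips underscores inside TclParseNumber itself. Standard strtoull with base 0 does not portably accept the 0b, 0o, or 0d prefixes, so this sketch is only meaningful for decimal, 0x, and real literals.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not part of Tcl: copy the literal `in` into `out`
 * with every underscore removed. Returns false if the literal starts with
 * '_' (that would be a bareword, not a number) or if `out` is too small.
 */
static bool strip_separators(const char *in, char *out, size_t outsz) {
    size_t n = 0;

    if (outsz == 0 || in[0] == '_') {
        return false;
    }
    for (const char *p = in; *p != '\0'; ++p) {
        if (*p == '_') {
            continue;               /* a separator carries no value: drop it */
        }
        if (n + 1 >= outsz) {
            return false;           /* no room for this char plus the terminator */
        }
        out[n++] = *p;
    }
    out[n] = '\0';
    return true;
}
```

For example, "0xffff_ffff" is stripped to "0xffffffff", which strtoull(buf, NULL, 0) reads as 4294967295, and "1_000.000_5" becomes "1000.0005" for strtod.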

Numbers are recognized by the routine TclParseNumber, which scans the text up to the end of the number. It accepts several formats; the key point is that a number is a word that begins with a decimal digit and may take a decimal, octal, hexadecimal, binary, or real form. The current convention is to mark the base with a two-character prefix: 0x, 0b, 0o, and the newest, 0d. This proposal would allow underscores anywhere after the first digit or after the letter that designates the number base, up to and including the end of the number. There can be multiple underscores in a row.



Underscores placed this way would be legal in every supported base.

The one restriction is that a number may not begin with _, since a leading underscore would turn the word from a number into a bareword, e.g.

expr _123

would not be allowed, since that would not be a number.
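The placement rules above can be sketched as a small integer scanner. This is a minimal illustration under stated assumptions, not TclParseNumber: the names scan_integer and digit_value are hypothetical, only unsigned integers are handled, overflow is not checked, both letter cases of the prefix are accepted, and a leading 0 without a base letter is read as decimal rather than with Tcl's legacy octal meaning.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: value of c as a digit in `base`, or -1 if invalid. */
static int digit_value(char c, int base) {
    int v;

    if (c >= '0' && c <= '9') {
        v = c - '0';
    } else if (c >= 'a' && c <= 'f') {
        v = c - 'a' + 10;
    } else if (c >= 'A' && c <= 'F') {
        v = c - 'A' + 10;
    } else {
        return -1;
    }
    return v < base ? v : -1;
}

/*
 * Hypothetical scanner: parses an unsigned integer literal with an optional
 * 0x, 0b, 0o or 0d prefix. Underscores are skipped anywhere after the first
 * digit or after the base letter, including runs and a trailing underscore.
 * A leading underscore is rejected. Overflow is not checked.
 */
static bool scan_integer(const char *s, uint64_t *out) {
    const char *p = s;
    int base = 10;
    bool saw_digit = false;
    uint64_t value = 0;

    if (*p == '_' || *p == '\0') {
        return false;               /* "_123" is a bareword, not a number */
    }
    if (p[0] == '0') {              /* p[1] exists: at worst it is '\0' */
        switch (p[1]) {
        case 'x': case 'X': base = 16; p += 2; break;
        case 'b': case 'B': base = 2;  p += 2; break;
        case 'o': case 'O': base = 8;  p += 2; break;
        case 'd': case 'D': base = 10; p += 2; break;
        default: break;             /* no base letter: plain decimal here */
        }
    }
    for (; *p != '\0'; ++p) {
        int d;

        if (*p == '_') {
            continue;               /* digit separator: no semantic effect */
        }
        d = digit_value(*p, base);
        if (d < 0) {
            return false;
        }
        value = value * (uint64_t)base + (uint64_t)d;
        saw_digit = true;
    }
    if (!saw_digit) {
        return false;               /* e.g. "0x__" contains no digits */
    }
    *out = value;
    return true;
}
```

With this sketch, "0x_ff__ff_" scans as 0xffff and "0b1111_1111_1111_1110" as 65534, while "_123" is rejected. "0_x1" is also rejected here, because once an underscore follows the leading 0 the x is no longer read as a base letter.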

The implementation would simply skip the _ character in all cases while TclParseNumber processes the number.

A preliminary implementation was tested with four lines of code added at the top of the main while loop in TclParseNumber:

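while (1) {
  char c = len ? *p : '\0';

  // ------------- add this to allow _ in a number and just bypass it
  if ( c == '_' ) {
      /* ... skip the underscore and move on to the next character ... */
  }
  // -------------- end of code to allow underscore

  /* ... existing per-state processing of c ... */
}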
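Placing the bypass at the top of the loop means it runs before any per-state processing, so the state machine never sees the separators. The sketch below is a hypothetical, heavily simplified model of that idea (decimal integers only; scan_decimal and its two states are invented for illustration) that reuses the `char c = len ? *p : '\0';` idiom from the fragment above. It spells out two details of the skip: it must advance p and decrement len, and in this standalone sketch it is suppressed in the initial state so that a leading underscore is still rejected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified state machine: decimal integers only. */
enum scan_state { INITIAL, DIGITS };

/*
 * Scans exactly `len` bytes starting at `bytes` (no NUL terminator needed).
 * The underscore test sits at the top of the loop, ahead of the per-state
 * switch. Overflow is not checked.
 */
static bool scan_decimal(const char *bytes, size_t len, uint64_t *out) {
    const char *p = bytes;
    enum scan_state state = INITIAL;
    uint64_t value = 0;

    while (1) {
        char c = len ? *p : '\0';   /* '\0' stands for "end of input" */

        if (c == '_' && state != INITIAL) {
            /* bypass the separator: advance without changing state */
            ++p;
            --len;                  /* safe: c == '_' implies len > 0 */
            continue;
        }

        switch (state) {
        case INITIAL:
            if (c < '0' || c > '9') {
                return false;       /* empty input, or e.g. "_123" */
            }
            value = (uint64_t)(c - '0');
            state = DIGITS;
            break;
        case DIGITS:
            if (c == '\0') {
                *out = value;       /* reached the end of the number */
                return true;
            }
            if (c < '0' || c > '9') {
                return false;
            }
            value = value * 10 + (uint64_t)(c - '0');
            break;
        }
        ++p;
        --len;                      /* c was a real character, so len > 0 */
    }
}
```

Because the length bounds the scan, calling scan_decimal("42_junk", 3, &v) looks only at "42_" and yields 42.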


See branch tip-551


This document has been placed in the public domain.