TIP 297: Integer Type Introspection and Conversion

Author:         Don Porter
State:          Withdrawn
Type:           Project
Vote:           Pending
Created:        20-Nov-2006
Obsoleted-By:   502,514,515
Post-History:
Tcl-Version:    8.7
Keywords:       Tcl, number, expression

Abstract

This TIP proposes changes to complete the set of commands to test and convert among Tcl's integer types.

Background

There are four integer types that appear in Tcl's C API. They are int, long, Tcl_WideInt, and mp_int. The corresponding routines to pull a value of each of those types from a Tcl_Obj are Tcl_GetIntFromObj, Tcl_GetLongFromObj, Tcl_GetWideIntFromObj, and Tcl_GetBignumFromObj. These integer types form increasing sets. That is, every Tcl_Obj that can return an int can also return a long, Tcl_WideInt, or mp_int.

Strictly speaking, the set of Tcl_Obj values that can successfully return an int, long, or Tcl_WideInt is platform-dependent, because the sizes of these types are platform-dependent. Tcl_GetIntFromObj accepts integer values in any format (decimal, binary, octal, hexadecimal, etc., see TCL_PARSE_INTEGER_ONLY in [249]) that are within the inclusive platform-dependent range (-UINT_MAX, UINT_MAX). Tcl_GetLongFromObj accepts integer values in any format that are within the inclusive platform-dependent range (-ULONG_MAX, ULONG_MAX). Tcl_GetWideIntFromObj accepts integer values in any format that are within the inclusive platform-dependent range (-ULLONG_MAX, ULLONG_MAX), or the appropriate equivalent for the platform. Tcl_GetBignumFromObj accepts integer values in any format with (effectively) no limit on range.
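
As a rough illustration of these liberal ranges, the following sketch computes the inclusive bounds for a C integer type of a given size in bytes. The proc name liberalRange is hypothetical, and the sketch assumes Tcl 8.5 or later, where integer arithmetic in expr does not overflow.

```tcl
# Hypothetical helper: inclusive range accepted for an integer type that is
# $bytes bytes wide, that is -(2**bits - 1) through 2**bits - 1.
proc liberalRange {bytes} {
    set max [expr {(1 << (8 * $bytes)) - 1}]
    return [list [expr {-$max}] $max]
}
```

For example, liberalRange 4 returns the pair -4294967295 4294967295, the bounds that apply to Tcl_GetIntFromObj when int is four bytes wide.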

The most common platform-dependent result seen at the script level is that of [expr int(.)]. This is the result on most 32-bit systems,

 % set tcl_platform(wordSize)
 4
 % expr int(1<<31)
 -2147483648

compared with LP64 systems.

 % set tcl_platform(wordSize)
 8
 % expr int(1<<31)
 2147483648

These differences are most troublesome when implementing algorithms designed to operate explicitly on 32-bit buffers, where the only portable way to do the operations in Tcl is with careful application of masking (& 0xffffffff). For one well-known example, see the sha1 package in tcllib. The additional masking operations in Tcl expressions harm performance.
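
A minimal sketch of the masking style described above, assuming Tcl 8.5 or later. The procs add32 and rotl32 are hypothetical names chosen for illustration and are not taken from tcllib.

```tcl
# Add two values modulo 2**32; the mask discards any carry beyond bit 31.
proc add32 {a b} {
    expr {($a + $b) & 0xffffffff}
}

# Rotate the low 32 bits of x left by n bits, for 0 <= n < 32.
proc rotl32 {x n} {
    set x [expr {$x & 0xffffffff}]
    expr {(($x << $n) | ($x >> (32 - $n))) & 0xffffffff}
}
```

Each result is masked back into the range 0 through 0xffffffff, which is the extra per-operation work that the paragraph above refers to.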

There are other Tcl routines that pull values from a Tcl_Obj and accept supersets of one of the integer types. An example is Tcl_GetIndexFromObj, which will accept anything that Tcl_GetIntFromObj accepts, as well as other string values. There are also Tcl built-in commands whose arguments accept supersets of one of the integer types. An example is uplevel, which accepts as its level argument anything that Tcl_GetIntFromObj accepts, as well as other string values.

All Tcl commands are ultimately defined by the C command procedures that implement them. When those command procedures use the routines mentioned above to pull values from command arguments, the Tcl commands succeed or fail depending on whether the caller has provided an integer value of the right type. As a simple example:

 % lindex {} 0xffffffff
 % lindex {} 0x100000000
 bad index "0x100000000": must be integer or end?-integer?

In order to avoid errors from commands, a cautious programmer may wish to test whether a value is of an acceptable type before passing it to a command. The string is integer command has long offered this facility for commands that require (a superset of) an int.

 % string is integer 0xffffffff
 1
 % string is integer 0x100000000
 0

Most of Tcl's built-in commands that accept an integer-valued argument require that argument to be acceptable to Tcl_GetIntFromObj, and for those commands the existing string is integer command provides sufficient introspection.
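
A sketch of such a defensive check, using string repeat as the guarded command. The proc name repeatChecked and its error message are hypothetical; the -strict option is used so that an empty string is rejected as well.

```tcl
# Validate the count argument before handing it to [string repeat].
proc repeatChecked {str count} {
    if {![string is integer -strict $count]} {
        return -code error "expected integer but got \"$count\""
    }
    return [string repeat $str $count]
}
```

With this wrapper, repeatChecked ab 3 returns ababab, while a non-integer count raises the wrapper's own error before string repeat is ever called.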

[188] created the new command string is wideinteger, which is suitable for testing values for the small number of Tcl commands that strictly require a value acceptable to Tcl_GetWideIntFromObj. Those commands are:

 after $wide
 binary format w $wide
 chan seek $chan $wide
 chan truncate $chan $wide
 clock add $wide
 clock format $wide

There are some built-in Tcl commands that require an argument that is acceptable to Tcl_GetBignumFromObj. That is, the argument must be an integer, but no range limit is imposed. Currently there is no test command appropriate for argument checking for these commands.

 dict incr $dictVar $bignumkey $bignum
 expr srand($bignum)
 expr ~$bignum
 expr $bignum % $bignum
 expr $bignum << $int
 expr $bignum >> $bignum
 expr $bignum & $bignum
 expr $bignum ^ $bignum
 expr $bignum | $bignum
 incr $bignumVar $bignum
 format $integerSpecifier $bignum
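
In the absence of such a test command, one script-level workaround is to attempt an operation from the list above and catch the failure. The sketch below does this with the ~ operator. The proc name isAnyInteger is hypothetical, and Tcl 8.5 or later is assumed.

```tcl
# Report 1 if value is an integer of any size, 0 otherwise.
# The bitwise-not operator raises an error for non-integer operands.
proc isAnyInteger {value} {
    expr {![catch {expr {~$value}}]}
}
```

This accepts values far outside the wide integer range, such as 123456789012345678901234567890, and rejects floating-point values and non-numeric strings.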

There are some built-in Tcl commands that require an argument that is acceptable to Tcl_GetLongFromObj. Currently there is no test command appropriate for argument checking for these commands.

 binary format i $long
 binary format s $long
 binary format c $long
 file atime $path $long
 file attributes $path -permissions $long
 file mtime $path $long

Note that the accepted ranges of the Tcl_GetFooFromObj routines can lead to surprising results. For example, Tcl_GetIntFromObj accepts values from -UINT_MAX to UINT_MAX. For some things this is good, since it supports

 binary format i 0x80000000

on 32-bit platforms, which is a common coding style. However the same range acceptance leads to surprising (and arguably incorrect, in the presence of bignum support) things like:

 % string repeat a -4294967290
 aaaaaa

It seems there are good uses for both strict and liberal routines for pulling integer ranges from a Tcl_Obj. Compatibility concerns would favor keeping the existing routines liberal, and adding strict counterparts. If this is pursued, however, another collection of string is test commands would be needed as well.
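
To make the distinction concrete, here is a sketch that classifies an integer against both ranges for a type of a given bit width. The proc classifyRange and its return values are hypothetical, the argument is assumed to already be a valid integer, and Tcl 8.5 or later is assumed.

```tcl
# Classify n relative to a signed type that is $bits bits wide:
#   strict  - fits the signed range -2**(bits-1) .. 2**(bits-1) - 1
#   liberal - outside that, but within -(2**bits - 1) .. 2**bits - 1
#   none    - outside both ranges
proc classifyRange {n bits} {
    set half [expr {1 << ($bits - 1)}]
    set umax [expr {(1 << $bits) - 1}]
    if {$n >= -$half && $n < $half} { return strict }
    if {$n >= -$umax && $n <= $umax} { return liberal }
    return none
}
```

With bits set to 32, the value -4294967290 from the string repeat example is classified as liberal: the existing routines accept it, while a strict counterpart would reject it.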

Proposed Changes

Still pondering how best to react to this background. Discussion invited on TCLCORE.

Compatibility

Reference Implementation

Copyright

This document has been placed in the public domain.