TIP 697: 32-bit truncation in format and scan

Author:		Jan Nijtmans <[email protected]>
State:		Final
Type:		Project
Vote:		Done
Created:	2024-05-31
Tcl-Version:	9.0
Vote-Summary:   Accepted 6/0/0
Votes-For:      AK, AN, HO, JN, KW, SL
Votes-Against:  None
Votes-Present:  None

Abstract

This TIP proposes changing the format and scan commands to make them more compatible with sprintf/sscanf in C. The Tcl_ObjPrintf() function is not affected.

The idea is to make format compatible with sprintf, with the following three remarks:

  1. The ll size specifier (and its alias L) always means: no truncation. This will be kept as it is now, despite not being sprintf-compatible.

  2. The l size specifier always means 64-bit truncation, which is compatible with sprintf on LP64 systems. No change here either.

  3. When no size specifier is present, 32-bit truncation is done. This is what sprintf does, and what format does in Tcl 8.6 on ILP32 and LLP64 systems. This will be the new behavior on all systems, starting with Tcl 9.0.
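
To make the term "N-bit truncation" in the three remarks concrete, here is a minimal sketch of the arithmetic in plain Tcl. The helper name truncateSigned is hypothetical and is not part of Tcl or of this TIP; it only illustrates wrapping a value into a signed two's-complement range, not how format is implemented.

```tcl
# Hypothetical helper (not a Tcl built-in): wrap an integer into the
# signed two's-complement range of the given bit width.
proc truncateSigned {value bits} {
    # Keep only the low $bits bits of the value.
    set mask [expr {(1 << $bits) - 1}]
    set low [expr {$value & $mask}]
    # If the top remaining bit is set, reinterpret it as the sign bit.
    if {$low >= (1 << ($bits - 1))} {
        set low [expr {$low - (1 << $bits)}]
    }
    return $low
}

puts [truncateSigned 4294967297 32]   ;# 2**32 + 1 wraps to 1
puts [truncateSigned 2147483648 32]   ;# 2**31 wraps to -2147483648
puts [truncateSigned 4294967297 64]   ;# fits in 64 bits, unchanged
```

With a width of 32 this corresponds to remark 3 (no size specifier), with a width of 64 to remark 2 (the l specifier); remark 1 (ll) corresponds to not calling the helper at all.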

The new behavior of format/scan will be the same on every platform; it will no longer depend on ::tcl_platform(wordSize).

At the same time, the %q, %z, %t and %j size modifiers will be added to scan as well. They are defined the same way as for the format command.

Rationale

Let's start with some examples:

$ tclsh8.6 (32-bit)
% format %x -1
ffffffff
% scan ffffffffffffffff %x a
1
% set a
2147483647
%
...
$ tclsh8.6 (64-bit)
% format %x -1
ffffffffffffffff
% scan ffffffffffffffff %x a
1
% set a
-1
%
...
$ tclsh86 (Windows, 32-bit)
% format %x -1
ffffffff
% scan ffffffffffffffff %x a
1
% set a
2147483647
%
...
$ tclsh86 (Windows, 64-bit)
% format %x -1
ffffffff
% scan ffffffffffffffff %x a
1
% set a
2147483647
%

We can see that format and scan give different answers on different platforms. This behavior is documented:

    If neither of those are present, the integer value is
    truncated to the range determined by the value of the
    wordSize element of the tcl_platform array).
    ...
    Either one indicates the integer range to be stored is limited to the range
    determined by the value of the wordSize element of the tcl_platform
    array)

This TIP changes that: all examples above will behave the same on all platforms.
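
The platform dependence described above can be observed from a script. The following is a small probe, offered as a sketch: it prints the wordSize and pointerSize elements of tcl_platform and checks whether an unmodified %d wraps at 32 bits in the running interpreter. The variable name wraps32 is made up for this illustration.

```tcl
# wordSize is the size of a C long in bytes; pointerSize is the size
# of a pointer. Both are platform dependent.
puts "wordSize:    $::tcl_platform(wordSize)"
puts "pointerSize: $::tcl_platform(pointerSize)"

# 4294967296 is 2**32. If %d without a size modifier truncates to
# 32 bits, the formatted result is "0"; otherwise the value survives.
set wraps32 [expr {[format %d 4294967296] eq "0"}]
puts "unmodified %d truncates to 32 bits: $wraps32"
```

According to this TIP, the probe should report 1 on every platform under Tcl 9.0, while under Tcl 8.6 the answer follows wordSize.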

The L size modifier currently means something different in scan and format: in format it is an alias for ll, while in scan it is an alias for l. This TIP chooses to change the scan command, so that L no longer does any limiting.

Specification

Change the format and scan commands such that, if no size modifier is given in the format string, the value is truncated/limited to a 32-bit range. This is the same as the current behavior on 32-bit platforms and on Windows (both 32-bit and 64-bit). The behavior changes only on 64-bit UNIX/macOS.

This is also what the sprintf/sscanf C functions do, which format/scan try to mimic.

Also, the L size modifier for the scan command is modified to be an alias for ll, not l. No limiting will be done any more when using L.

These changes mean that format and scan will behave the same on all platforms, 32-bit or 64-bit, Windows, UNIX or macOS (except for the %z and %t specifiers, but that is on purpose).

While having a look at the implementation anyway, we now also implement the q/z/t/j size specifiers for scan in the same way as already done for format.

Remark: The behavior of %p, %a/%A and %I/%I32/%I64 is considered out of scope for this TIP. It can be discussed independently.

Implementation

The implementation is in the Tcl branch "tip-697".

Compatibility

The L size modifier in the scan command will no longer limit the value to a 64-bit range. If your code depends on that, use l instead. Since l already has this behavior in Tcl 8.6/8.7, this will make your code behave the same in 8.x and 9.0.
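
As an illustration of the portable spelling, the following sketch scans a decimal value that is far outside the 64-bit range using the l modifier. The input value and the variable name limited are chosen for this example only.

```tcl
# With the l modifier, scan limits an out-of-range value to the
# signed 64-bit range instead of storing it unchanged.
scan 99999999999999999999 %ld limited
puts $limited
```

This is expected to print 9223372036854775807, the largest signed 64-bit integer, in Tcl 8.6 as well as in 9.0.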

Except for the L change in scan, these changes have no impact on 32-bit platforms and on Windows (both 32-bit and 64-bit). If you want format/scan to truncate/limit to a 64-bit value, use the l size modifier. If you don't want them to truncate/limit, use the ll size modifier.
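
The difference between the l and ll size modifiers in format can be seen with a value that does not fit in 64 bits. This is a small sketch; the variable name big is arbitrary.

```tcl
# 2**64 + 1 needs 65 bits, so it cannot be represented in 64 bits.
set big 18446744073709551617

# l: the value is truncated to 64 bits, leaving only the low bit.
puts [format %ld $big]

# ll: no truncation, the value is formatted as given.
puts [format %lld $big]
```

The first call should print 1 and the second 18446744073709551617. Neither modifier is changed by this TIP, so the same output is expected in Tcl 8.6 and 9.0.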

Copyright

This document has been placed in the public domain.