TIP 623: Bless negative index arguments to Tcl_GetRange

Bounty program for improvements to Tcl and certain Tcl packages.
    Author:        Don Porter <[email protected]>
    State:         Draft
    Type:          Project
    Vote:          Pending
    Created:       24-Mar-2022
    Tcl-Version:   8.6.13
    Tcl-Branch:    tip-623


Proposes official approval and documentation of a meaning for negative index values passed to Tcl_GetRange which have long been used unofficially with compromised reliability.



The routine

Tcl_Obj* Tcl_GetRange(Tcl_Obj* objPtr, int first, int last)

first appeared in 23b23af342 and was first released in Tcl 8.2. From the beginning it was documented:

Tcl_GetRange returns a newly created object comprised of the characters between first and last (inclusive) in the object’s Unicode representation. If the object’s Unicode representation is invalid, the Unicode representation is regenerated from the object’s string representation.

This same documentation is in the Tcl 8.6.12 release unchanged other than the replacement of the words object by value. This reflects a legacy documentation practice widespread in Tcl where the functionality of a routine is not documented in explicit terms of valid argument ranges and corresponding results, but instead in terms of a somewhat vague description of how the routine might be implemented.

The documentation suggests that Tcl_GetRange might be understood as the equivalent of

Tcl_NewUnicodeObj(Tcl_GetUnicode(objPtr) + first, last - first + 1)

(In fact, it was never implemented precisely in that way, and the implementation has been revised multiple times over the years.) With this understanding, all non-empty substrings of the non-empty string value objPtr (with length of N characters, N > 0) are produced by the index ranges 0 <= first <= last < N. Any plausible implementation of Tcl_GetRange has to support those argument values. The callers of Tcl_GetRange within the source code of Tcl 8.2 itself stay within those argument limits. Support for other possible argument values raises uncertainty.

The header comment of the Tcl_GetRange routine says:

The first and last indices are assumed to be in the appropriate range.

However, the comment offers no detail on what that appropriate range is. There are no tests that directly exercise Tcl_GetRange from which the limits might be inferred. Furthermore, the original implementation made no checks of the values passed in for the first and last arguments. This is true even though passing in a value of last >= N to the original Tcl 8.2 implementation caused it to read past the logical end of the objPtr value, possibly into uninitialized or unallocated memory.

This is another legacy practice found in Tcl code with roots in an older programming culture. The expense of argument checking was often avoided. In its place was an assumption that a routine and its callers were trusted partners seeking a common goal of a working program. A caller would have no sensible reason to pass in a bad argument, so it was considered wasteful to expend effort confirming it did not. In the unfortunate case when a broken caller might pass a program-breaking argument value, "it could keep both pieces" of the broken program. The open source nature of Tcl offered assurance that the programmers of every caller had the ability to peek inside and know what they were doing. In this environment where arguments were not checked, no mechanisms were established for reporting errors that would never be detected. Tcl_GetRange has no documented failure mode. It is compelled to return some Tcl value, or abort the program.

In more recent times, more and more software joins together larger and larger collections of modules from multiple sources. The risks of introducing a malicious module have increased, and interest in robust, reliable operations has also grown. Libraries that can be induced to access memory outside proper allocations and initializations receive increasingly less acceptance. More powerful hardware has made the costs of argument checking less noticeable. We also have developed a greater appreciation for the freedom of implementation evolution provided by stronger encapsulation at interfaces. The Tcl sources have been gradually adapted over time to add more checks in more places, especially at interfaces with foreign modules.

Which returns us to the question of what values of first and last the Tcl_GetRange routine should accept. Since a caller must avoid passing a value last >= N, an implied burden on the caller is to know or retrieve the value of N. In particular, a caller seeking the suffix of objPtr, beginning with known valid index first and continuing to the end of the string, needs the value N to be able to pass (N - 1) as the last argument. The analogous Tcl command string range is easier to use since it allows a special value end as an index argument. A C programmer is inspired by that example to seek out an equivalent.

Within all of that context, a programmer might notice that the routine

Tcl_Obj* Tcl_NewUnicodeObj(Tcl_UniChar* unicode, int numChars)

has two documented modes of operation. When numChars >= 0, a new value of length numChars is created and returned, initializing from unicode. However, when numChars < 0, the new value initializes from unicode up to, but not including, the first null character. Combine this observation with the simple implementation of Tcl_GetRange suggested by its documentation, and one might craft the clever solution,

Tcl_GetRange(objPtr, first, -2)

as a call that can perform the suffix operaton in many circumstances.

When last is -2, the quantity (last - first + 1) is (-1 - first), and when it is known 0 <= first < N <= INT_MAX, it becomes clear that numChars is negative. For this scheme to work, it is necessary that Tcl_GetUnicode returns a pointer to a Tcl_UniChar array terminated by a null character. That is not documented, but it has been true. The scheme also only works if the value of objPtr does not contain a null character. This is a bug in the strategy, but one which might not be encountered in normal operations. The scheme also only works when Tcl_GetRange is implemented in the simple way suggested, or in a way that duplicates every result that simple approach would produce. There is an even rarer bug lurking in calls where objPtr has a mal-encoded string rep containing a NUL byte.

When last is -1 instead, the utility as a suffix operation is almost as good. The only failure mode added is when first is 0. A caller might test or otherwise rule out that case. The value of -1 is used to signal special operations or conditions in several places in Tcl, which might inspire this choice. For other increasingly negative values for last, the scheme grows increasingly fragile as it fails for shorter and shorter objPtr values due to overflow.

While this Easter egg in the Tcl_GetRange implementation supports this one useful calling mode, it also produces some absurd results. For example, when N >= 3, the call

Tcl_GetRange(objPtr, 2, 0)

produces the same result as the call

Tcl_GetRange(objPtr, 2, N - 1)

which is at least surprising. If this anomaly had been noticed first, I expect it would have been perceived as a bug and fixed.

The suffix scheme for calling Tcl_GetRange failed in the original release, because Tcl_NewUnicodeObj did not implement its documented second mode. This bug 218974 was fixed for release Tcl 8.2.3 .

Checkin 174db4e4cd added an optimization case when objPtr is a bytearray. This change first appeared in release Tcl 8.6.0. The optimized branch did not duplicate the prior results when (last - first + 1) is negative. This increased the circumstances where the suffix scheme fails, and introduced other EIAS violations.





This document has been placed in the public domain.