Author:        Don Porter
    State:         Final
    Type:          Project
    Vote:          Done
    Created:       24-Mar-2022
    Tcl-Version:   8.6.13
    Tcl-Branch:    tip-623
    Vote-Summary:  Accepted 3/0/0
    Votes-For:     DGP, JN, SL
    Votes-Against: none
    Votes-Present: none

Abstract

Proposes official approval and documentation of a meaning for negative index values passed to Tcl_GetRange, values that have long been used unofficially and with compromised reliability.

Background

The routine

Tcl_Obj* Tcl_GetRange(Tcl_Obj* objPtr, int first, int last)

first appeared in checkin 23b23af342 and was first released in Tcl 8.2. From the beginning it was documented:

Tcl_GetRange returns a newly created object comprised of the characters between first and last (inclusive) in the object’s Unicode representation. If the object’s Unicode representation is invalid, the Unicode representation is regenerated from the object’s string representation.

This same documentation appears in the Tcl 8.6.12 release, unchanged except that the word "object" has been replaced by "value". This reflects a legacy documentation practice, widespread in Tcl, in which the functionality of a routine is documented not in explicit terms of valid argument ranges and corresponding results, but in terms of a somewhat vague description of how the routine might be implemented.

The documentation suggests that Tcl_GetRange might be understood as the equivalent of

Tcl_NewUnicodeObj(Tcl_GetUnicode(objPtr) + first, last - first + 1)

(In fact, it was never implemented precisely in that way, and the implementation has been revised multiple times over the years.) With this understanding, all non-empty substrings of the non-empty string value objPtr (with length of N characters, N > 0) are produced by the index ranges 0 <= first <= last < N. Any plausible implementation of Tcl_GetRange has to support those argument values, and the callers of Tcl_GetRange within the source code of Tcl 8.2 itself stay within those limits. Whether any other argument values are supported, and what they produce, is uncertain.

The header comment of the Tcl_GetRange routine says:

The first and last indices are assumed to be in the appropriate range.

However, the comment offers no detail on what that appropriate range is. There are no tests that directly exercise Tcl_GetRange from which the limits might be inferred. Furthermore, the original implementation made no checks of the values passed in for the first and last arguments. This is true even though passing in a value of last >= N to the original Tcl 8.2 implementation caused it to read past the logical end of the objPtr value, possibly into uninitialized or unallocated memory.

This is another legacy practice found in Tcl code with roots in an older programming culture. The expense of argument checking was often avoided. In its place was an assumption that a routine and its callers were trusted partners seeking a common goal of a working program. A caller would have no sensible reason to pass in a bad argument, so it was considered wasteful to expend effort confirming it did not. In the unfortunate case where a broken caller passed a program-breaking argument value, "it could keep both pieces" of the broken program. The open source nature of Tcl offered assurance that the programmers of every caller had the ability to peek inside and know what they were doing. In this environment where arguments were not checked, no mechanisms were established for reporting errors that would never be detected. Tcl_GetRange has no documented failure mode. It must either return some Tcl value or abort the program.

In more recent times, more and more software joins together larger and larger collections of modules from multiple origins. The risks of introducing a malicious module have increased, and interest in robust, reliable operations has also grown. Libraries that can be induced to access memory outside proper allocations and initializations are less and less acceptable. More powerful hardware has made the costs of argument checking less noticeable. We also have developed a greater appreciation for the freedom of implementation evolution provided by stronger encapsulation at interfaces. The Tcl sources have been gradually adapted over time to add more checks in more places, especially at interfaces with foreign modules.

This returns us to the question of which values of first and last the Tcl_GetRange routine should accept. Since a caller must avoid passing a value last >= N, an implied burden on the caller is to know or retrieve the value of N. In particular, a caller seeking the suffix of objPtr, beginning with a known valid index first and continuing to the end of the string, needs the value N to be able to pass (N - 1) as the last argument. The analogous Tcl command, string range, is easier to use since it accepts the special index value end. That example may inspire a C programmer to seek out an equivalent.

Within all of that context, a programmer might notice that the routine

Tcl_Obj* Tcl_NewUnicodeObj(Tcl_UniChar* unicode, int numChars)

has two documented modes of operation. When numChars >= 0, a new value of length numChars is created and returned, initialized from unicode. When numChars < 0, the new value is initialized from unicode up to, but not including, the first null character. Combine this observation with the simple implementation of Tcl_GetRange hypothesized above, and suggested by its documentation, and one might craft the clever solution,

Tcl_GetRange(objPtr, first, -2)

as a call that can perform the suffix operation in many circumstances.

When last is -2, the quantity (last - first + 1) is (-1 - first), and when 0 <= first < N <= INT_MAX is known to hold, numChars is clearly negative. For this scheme to work, Tcl_GetUnicode must return a pointer to a Tcl_UniChar array terminated by a null character. That is not documented, but it has been true. The scheme also works only if the value of objPtr does not contain a null character. This is a bug in the strategy, but one which might not be encountered in normal operations. Finally, the scheme works only when Tcl_GetRange is implemented as hypothesized, or in a way that reproduces every result that hypothetical approach would produce. An even rarer bug lurks in calls where objPtr has a malformed string representation containing a NUL byte.

When last is -1 instead, the suffix operation works almost as well. The only added failure occurs when first is 0: then (last - first + 1) is 0, and the result is an empty value rather than all of objPtr. A caller might test for or otherwise rule out that case, since the suffix beginning at index 0 is trivially known to be objPtr itself. The value -1 is used to signal special operations or conditions in several places in Tcl, which might also inspire this choice. For increasingly negative values of last, the scheme grows increasingly fragile, failing for ever shorter objPtr values due to integer overflow in (last - first + 1).

While this strategy of gaming a presumed Tcl_GetRange implementation supports this one useful calling mode, it also produces some absurd results. For example, when N >= 3, the call

Tcl_GetRange(objPtr, 2, 0)

produces the same result as the call

Tcl_GetRange(objPtr, 2, N - 1)

which is at least surprising. If this anomaly had been noticed first, I expect it would have been perceived as a bug and fixed.

The suffix scheme for calling Tcl_GetRange failed in the original release because Tcl_NewUnicodeObj did not implement its documented second mode. This bug, 218974, was fixed in release Tcl 8.2.3.

Checkin 174db4e4cd added an optimized case to Tcl_GetRange for when objPtr is a bytearray. This change first appeared in release Tcl 8.6.0. The optimized branch did not reproduce the prior results when (last - first + 1) is negative, because Tcl_NewByteArrayObj does not support a compatible special mode for a negative length value. This increased the circumstances where the suffix scheme fails, and where the results of Tcl_GetRange violate the EIAS ("everything is a string") principle.

During the development of TIP #389, the consistency and safety concerns with out-of-range index values were noticed, and checks were added to the Tcl_GetRange routine in checkin c22b158bd6; those checks were later merged into the 8.7 branch in e109760b1c.
Ticket 767e070d35 raised the same concerns about Tcl 8.6, and the index value checks were backported and then released in Tcl 8.6.11. All these changes were made with the aim of improving safety and clarity. No one involved was aware of the partial utility of the suffix scheme.

Earlier this year, ticket e9a2715d91 raised the alarm about the "significant change" to Tcl_GetRange in Tcl 8.6.11. The comments on that ticket record the discovery of the events summarized above.

Proposal

Revise Tcl_GetRange to have defined and documented behavior when negative values are passed as the first or last arguments.

When first is negative, Tcl_GetRange behaves as if first == 0.

When last is negative, and the length of objPtr is N characters, Tcl_GetRange behaves as if last == N - 1.

In the Tcl 9 interface, these arguments have type ptrdiff_t.

Compatibility

This implementation is incompatible with Tcl releases 8.6.11 and 8.6.12. It restores a degree of compatibility with Tcl releases up to 8.6.10.

The compatibility point concerns only negative values of the last argument. Many programmers have considered those values out of scope, so the change will not affect them. Programmers who did use this range of argument values will now benefit from having this functionality documented and supported.

Addendum

After TIP #660 was accepted, many functions changed from size_t to ptrdiff_t parameters. To prevent confusion, the TIP text above has been updated to reflect this change.

Implementation

This change is already in place on the Tcl 8.6 development branch and is poised to be released in Tcl 8.6.13. To the extent the documentation and implementation do not fully and clearly conform to this TIP, they will be updated prior to release of Tcl 8.6.13.

Copyright

This document has been placed in the public domain.