Tcl Source Code

View Ticket
EuroTcl/OpenACS 11 - 12 JULY 2024, VIENNA
Ticket UUID: e9a2715d912d86448a2ed18b2d25788cd50769b4
Title: Tcl 8.6.11: Incompatible Tcl_GetRange()
Type: Bug Version: 8.6.11
Submitter: petasis Created on: 2022-01-16 16:43:09
Subsystem: 44. UTF-8 Strings Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2022-03-13 01:05:59
Resolution: Fixed Closed By: jan.nijtmans
    Closed on: 2022-03-13 01:05:59
Hi all,

I have found that there was a significant change in Tcl_GetRange() that happened in 8.6.11, and is not reported in the release notes.

If last parameter is -1, it does not work as it used to work, and now returns an empty string.

This is not expected, as -1 is an unofficial "end" shortcut, that used to work.

User Comments: jan.nijtmans added on 2022-03-13 01:05:59: (text/x-fossil-wiki)
@dgp, the defections you are mentioning are in past versions of Tcl_GetRange() right? The current implementation doesn't have those defects any more, right?

What is done now is roughly:
	if (last < 0 || last >= length) {
	    last = length - 1;
	if (last < first) {
	    return newObjPtr;

So, if 'last' is < 0, it will be immediately replated with length -1. If that's still negative (then length = 0, because 'first' is never negative), then last < first, so the empty string is produced: correct.

Also, after this, last >= first, so last - first + 1 >= 1. This means Tcl_NewByteArrayObj() will never be called with a negative length, neither will Tcl_New(String|Unicode)Obj().

My conclusion: The current code is consistant and correct, previous versions were not. We could add some more testcases for Tcl_GetRange(), to prove everything is correct, but I would keep the code as it is now. OK?

I'll close the "bug-e9a2715d91" branch, which contains the 8.6.10 code, which had the defects you mention. core-8-6-branch doesn't have those defects.


dgp added on 2022-03-12 23:33:48:
I went through the history, and found many things to make me unhappy.
Bad design made worse by bad documentation.  But I'll spare you a long
rant at every turn of the rabbit hole.

I made no note of this change in the 8.6.11 release notes because I
honestly had no idea this functionality existed, let alone someone
might be using it.  If it had been called to my attention I would have
considered it a bug.  I truly assumed the added argument checking was just
formalizing the rules any sensible caller was already following. As
the comment said "The first and last indices are assumed to be in the
appropriate range."

That said, it did exist, and it did get use.

I have to caution that the mental model that a value of -1 for the
"last" argument causes the return of a suffix substring starting at
index "first" and continuing all the way to the end of the string
is not reliably correct.  It fails in three ways.

1) Tcl_GetRange(objPtr, 0, -1) returns an empty string, because
then (last - first + 1) is 0, which is not negative, so the special
modes of Tcl_NewStringObj() and Tcl_NewUnicodeObj() are not engaged.

2) Since 8.6b1 (2009), Tcl_GetRange processes pure byte array
values differently.  A value of -1 for "last" can lead to a call to
Tcl_NewByteArrayObj(bytes, length) with length < 0, which is documented
as invalid.  In practice, Tcl_NewByteArrayObj() converts to length == 0 and
the result is an empty value.  This is an EIAS violation, where Tcl_GetRange()
produces different results on the same value differing only in its
internals.  The work on the branch tip tries to fix the EIAS violation by
noticing the negative value for "last" and fixing up the length argument,
which might work except for...

3) When the routines Tcl_New(String|Unicode)Obj() are passed negative
"length" or "numChars" arguments, they do not in fact change to a
"take the whole string" mode.  Precisely, they change to a
"take everything up to the first NUL" mode.  Both the internal Utf
and the Tcl_UniChar array representations can contain NUL bytes
interior to the string values.  We can cook up test cases for both
scenarios, and then those would be EIAS violations with the revised
bytearray processing.

I'm ambivalent how best to move forward.  Whatever we settle on we should
be sure to get documented far more clearly.

I will mention that NUL termination tends to get in the way of adopting
some alternative data structures.  Doing away with it might open the door
for allowing substrings and strings to share storage more often for example.
The Tcl_GetRange() behavior noted here is all wrapped up in that representation.
Even if we preserve this feature now, I might still seek to eliminate it
for Tcl 9 to open up more future implementation options.

Finally, I can mention that any code that has been calling

    Tcl_GetRange( objPtr, first, -1);

in Tcl releases 8.2.0 through 8.6.10 could be revised to call

    Tcl_NewUnicodeObj( Tcl_GetUnicode(objPtr) + first, -1 );

to get the same behavior in all those releases and also unchanged
in 8.6.11 and later.

Hope that helps.

jan.nijtmans added on 2022-01-21 16:21:02: (text/x-fossil-wiki)
I prepared a [5b0676e5ca583e75|branch] in which Tcl_GetRange() is brought back to how it was in Tcl 8.6.10. Might be useful for experimenting. If any trouble is found in the current Tcl_GetRange() function, this is what we could do (although I prefer to keep it as it is now). @dgp, thank you for having a look!

jan.nijtmans added on 2022-01-20 17:53:01: (text/x-fossil-wiki)
@dgp: Yes, please, double-check! Maybe we shouldn't change the behavior of Tcl_GetRange in 8.6, even though it's broken......

Docs didn't change in 8.6, just be clarified in 8.7 about what happens when last<0. In 8.6, last<0 is not supported, although extension might depend on the current behavior (they shouldn't)

jan.nijtmans added on 2022-01-20 17:53:00: (text/x-fossil-wiki)
@dgp: Yes, please, double-check! Maybe we shouldn't change the behavior of Tcl_GetRange in 8.6, even though it's broken......

Docs didn't change in 8.6, just be clarified in 8.7 about what happens when last<0. In 8.6, last<0 is not supported, although extension might depend on the current behavior (they shouldn't)

dgp added on 2022-01-20 16:49:25:
I haven't dived in to see the details yet, but the decscription here and seeing
docs changed between patch releases gives me the worries.  Maybe more comments
later when I see what's up.

jan.nijtmans added on 2022-01-19 10:53:48: (text/x-fossil-wiki)
Merged to core-8-6-branch (and up) now.

jan.nijtmans added on 2022-01-18 15:31:57: (text/x-fossil-wiki)
Thanks!  I certainly see your point. Indeed Tcl_NewStringObj() and Tcl_NewUnicodeObj() behave as you describe. But there is also a code-path within Tcl_GetRange() which uses Tcl_NewByteArrayObj(), which doesn't behave that way. You have been lucky that using -1 worked .....!

This also means that using -1 in the past never could be thrusted anyway, so it's up to us to define what should happen. I agree with you that handing it like an unofficial 'end' shortcut makes sense. I'll make the change.

petasis added on 2022-01-18 14:57:29: (text/x-fossil-wiki)
The change in behaviour is caused by this code that was added in 8.6.11:

if (last < first) {
    return Tcl_NewObj();

if last is -1 (or any number < 0), this shortcut will lead directly to an empty string.

But Tcl_New(whatever here)Obj(bytes + first, last - first + 1) will always not return an empty string (with the exception of first being 0, which is a rare use to ask for 0, -1). If last is -1, and first > 0, last - first + 1 will always be negative, so you get back everything till the end of the string.

petasis added on 2022-01-18 14:54:51: (text/x-fossil-wiki)
Dear Jan,

At the Tcl level we have "end". The unofficial "end" being -1, is for users of the C API. Tcl_NewStringObj, Tcl_SetStringObj, Tcl_AppendToObj, all accept -1 (in fact any negative number) as "till the end of the string".

Tcl_GetRange() documentation does not promise that (that the last parameter can be a negative number), but passing a negative number as the last used to "work as expected".

What I can assure you, is that as a developer of several C extensions, I have used "-1" as an unofficial "end" position, because I have seen the code of Tcl_GetRange() and came to a conclusion that it is safe to do so, as it was using Tcl_NewStringObj() and friends internally.

I do not suggest that this was correct, or that the documentation suggests so. I am raising the issue that like me there may be other C extensions that are using this "convenience", and that this change deserves at least a mention of incompatibility in the release notes of 8.6 & 8.7. What is written now (that error checking has been added) is not enough.

To my view, and according to the documentation, passing a negative value to last in Tcl_GetString() is an erroneous situation. It should never happen.

And input checking was added to the function. Which leads to a decision on what to do if "last" is negative. Given that a NULL cannot be returned (the documentation promises that a new object is always returned), the decision is to return an empty string.

I would have taken a different decision, to be set to the last character of the string, because a) the length of the string is already known inside the function and b) is less code for the caller to find the last character of the string.
But thats me :-) At least update the release notes to point out the potential incompatibility.

jan.nijtmans added on 2022-01-17 22:39:37: (text/x-fossil-wiki)
Tried the same with Tcl 8.6.0 too (with the same modificatiion). "string range abc 0 -1" there gives the empty string too, no indication that -1 works as unofficial "end" shortcut.

jan.nijtmans added on 2022-01-17 22:37:57: (text/x-fossil-wiki)
OK, a small experiment. Take version 8.6.10, and modify it a little but such that "string range" just calls Tcl_GetRange() without any modification: [756287e16e039f7d]
% package require Tcl
% string range abc 0 5
abcow        <--   indeed protection of 'last' doesn't work any more
% string range abc 0 -1
%            <--   empty string

So, where do I see that '-1' is an official "end" shortcut?  I don't see that happening in Tcl 8.6.10. I which version did it work like you describe? Apparently not in Tcl 8.6.10!

jan.nijtmans added on 2022-01-17 17:37:05: (text/x-fossil-wiki)
Possible fix [ed61de701e3367e5|here]. However, I'm not convinced yet this is what we should be doing. There are multiple places in Tcl core code, where it's assumed that -1 means 'end before start'. Changing the behavior of Tcl_GetRange(), all such callers have to be modified.

What happened is [767e070d35|this]: In the past, Tcl_GetRange() didn't check it's inputs, which could easily lead to crash. Therefore, extra checks were added, but I don't remember changing any behaviour like you describe. It was certainly not intended to do that.

So, this needs a little bit more investigation what's going on. Any-one else who has an idea/opinion, please add your comments here. Thanks!