Tcl Source Code

View Ticket
Login
Ticket UUID: edb4b065f49b9e51304ccdba4ee4e126ba379b0d
Title: Crash in string compare of empty string against byte array
Type: Bug Version: >= 8.7
Submitter: apnadkarni Created on: 2024-03-27 15:37:36
Subsystem: - New Builtin Commands Assigned To: jan.nijtmans
Priority: 5 Medium Severity: Critical
Status: Closed Last Modified: 2024-04-04 14:45:25
Resolution: Fixed Closed By: jan.nijtmans
    Closed on: 2024-04-04 14:45:25
Description:

The following crashes Tcl

% string compare -length 1 "" [binary decode hex 00]
...boom...

Investigating...

User Comments: jan.nijtmans added on 2024-04-04 14:45:25:

I think you are totally mistaken. Writing "\xFC" in a script is not decoded using the system-encoding, it's always the +U00FC Unicode code-point. On the other hand, using ü in a script will give a different result depending on the system-encoding. That's why using utf-8 always prevents many unexpected problems.


sebres added on 2024-04-04 14:29:58:

the Ascification (as is done everywhere else in Tcl 8.6 tests) doesn't change the heart of the testcase.

But it does. If you don't want to consider UTF vs. UNICODE difference (why? the tests may cover different pieces in code and code of 8.6 is different to 8.7/9.0), it should be revisited at least for windows platform with non-cp1252 system encodings, just to illustrate:

% encoding convertfrom cp1252 \xFC
ü
% encoding convertfrom cp1251 \xFC
ь

jan.nijtmans added on 2024-04-04 12:28:23:

Hi Sergey.

I'm holding my objection. In Tcl 8.7 and 9.0, all test-cases are already ran in utf-8 mode, so the benefit of your change is 0.0000%

The only benefit would be in Tcl 8.6, and it only involves 7 test-cases. That way too little to impose this burden on 8.7 and 9.0 IMHO. Those 7 test-cases involve "string equal" which is fed with utf-8 strings anyway: the Ascification (as is done everywhere else in Tcl 8.6 tests) doesn't change the heart of the testcase.

Sorry!


sebres added on 2024-04-04 12:11:40:
@Jan, ping?

sebres added on 2024-03-28 22:12:27:

Huh?... The cost of tcltest::fileEncoding will be approximately 50µs per file. What by the 200 files amounts to "burdening" 10 milliseconds.

As for the "asciifying" in [03cf62003a88a5f2] - it is not quite kosher.

Firstly, ab\u7266 is not the same as ab牦 (first is unicode string, another is object with utf-8 bytes), what may change the test, internally. Still worse it may look by replacement for ü - \xFC, which is single-byte, and may work for windows with cp1252 as system encoding (and unix with utf-8 by fallback from ascii through cp1252), but may fail for platforms with another single-byte system encodings, not to mention the backport to Tcls where it behaves similar to 8.5 (where \xFC may be bytearray).

Then maybe better something like [encoding convertfrom cp1252 \xFC].

However, if one see the costs... is the effort, difference to 8.7/9.0 (and a removal of, imho, nice feature) justified?


jan.nijtmans added on 2024-03-28 21:27:46:

Thanks, @sebres, for backporting. It's highly appreciated!

However, I don't think it's a good idea burdening all other test-cases, determining the encoding, just for a single "string.test" that needs it. Therefore, I reverted this part, sorry! Asciifying the testcase makes it portable for any Tcl version, that's how it should be done IMHO.

Anyway, thanks!


sebres added on 2024-03-27 20:46:20:

Fixed for all branches, 8.6 has now the same tests (excepting few compat tweaks).

Thus close.


apnadkarni added on 2024-03-27 17:51:58:
Fixed for 9.0 but leaving open since bug has been marked for 8.7 if anyone wants to pursue that.

apnadkarni added on 2024-03-27 17:49:56:
@sebres,

I realized that but given limited time and resources, I do not plan on spending any effort

- working on 8.6 except for actual bugs that are present there

- working on 8.7 unless there is a concrete plan or need to officially release it

Given serious lack of programmers working on Tcl, every bit of work, even the half hour it might take merging / running tests on two platforms before commits, etc. is time better spent on 9.0 or extensions in my view. That's the path I'm taking.

Others may differ.

apnadkarni added on 2024-03-27 17:42:54:

Fixed in [5d52c6d730].


sebres added on 2024-03-27 17:24:56:

The affected versions are basically >= 8.7.
And test-cases can be added (cherry-picked) for 8.6 too.


apnadkarni added on 2024-03-27 16:17:33:
Proposed fix in [f5f42a6211].

apnadkarni added on 2024-03-27 16:16:48:
Crash occurs if either -length and -nocase is specified but not otherwise.