Tcl Source Code

View Ticket
Login
2023-05-03
10:40 Closed ticket [f5eadcbf9a]: passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corrupt data plus 4 other changes artifact: e3649dc9a2 user: pooryorick
2023-04-27
20:43 Ticket [f5eadcbf9a]: 4 changes artifact: a3361a4d4a user: pooryorick
20:42
Fix for [f5eadcbf9a6b1b4c], passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corr... check-in: 480c856482 user: pooryorick tags: trunk, main
2023-04-26
13:00 Ticket [f5eadcbf9a] passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corrupt data status still Pending with 3 other changes artifact: c61c0041b7 user: pooryorick
12:25 Ticket [f5eadcbf9a]: 3 changes artifact: 389fbe0bc9 user: jan.nijtmans
03:28 Ticket [f5eadcbf9a]: 3 changes artifact: fdff598bc8 user: pooryorick
02:44 Ticket [f5eadcbf9a]: 3 changes artifact: 981f63f81f user: pooryorick
02:36 Ticket [f5eadcbf9a]: 3 changes artifact: e3104d5d36 user: pooryorick
2023-04-25
20:58 Ticket [f5eadcbf9a]: 4 changes artifact: bee6417cc4 user: jan.nijtmans
20:50 Ticket [f5eadcbf9a]: 3 changes artifact: 799f8dfc20 user: jan.nijtmans
20:39 Ticket [f5eadcbf9a]: 4 changes artifact: 3fd844f6af user: pooryorick
20:36 Pending ticket [f5eadcbf9a]. artifact: e4991f1bd6 user: pooryorick
20:34
Fix for issue [f5eadcbf9a], passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corr... check-in: db9f715fd5 user: pooryorick tags: bug-f5eadcbf9a6b1b4c
20:26 New ticket [f5eadcbf9a] passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corrupt data. artifact: 92ef8facf8 user: pooryorick

Ticket UUID: f5eadcbf9a6b1b4cff5aaf0da5aa5706430dfb7c
Title: passing pointer to uninitialized memory leads Tcl_UniCharToUtf() to corrupt data
Type: Bug Version:
Submitter: pooryorick Created on: 2023-04-25 20:26:02
Subsystem: - New Builtin Commands Assigned To: pooryorick
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2023-05-03 10:40:33
Resolution: Fixed Closed By: pooryorick
    Closed on: 2023-05-03 10:40:33
Description:

In various locations in the source code uninitialized data is passed to Tcl_UniCharToUtf() via its *buf argument. In addition to holding the result of Tcl_UniCharToUtf(), since [e2278643dc330026] in 2015 *buf also serves to maintain the state of an ongoing surrogate character transformation, and random data in the uninitialized storage can cause Tcl_UniCharToUtf() to corrupt data.

User Comments: pooryorick added on 2023-04-27 20:43:36:

Rather than adjusting those tests I've modified the memset() calls to fill 0xff instead of 0. Fix now applied to trunk [480c856482].


pooryorick added on 2023-04-26 13:00:31:

Jan, first you get defensive and call this a non-issue, then you spam the ticket with test results from a work-in-progress branch in an attempt to distrct from the real issue: Random data in uninitialized memory are causing the "format-2.19" test to fail under a build of the pyk-Tcl_AppendObjToObj branch configured with "TCL_UTF_MAX = 3"?

Did you even notice that those failing tests hard-coded a dependence on uninitialized data, and just need to be adjusted for the fact that the memory area is now initialized?

I'm not in favor of the memset() solution. I'm in favor of deleting a whole bunch of "TCL_UTF_MAX = 3" code.


jan.nijtmans added on 2023-04-26 12:25:21:

This doesn't give much confidence that you tested your stuff: https://github.com/tcltk/tcl/actions/runs/4806142861/jobs/8553302123

utfext.test

==== Tcl_ExternalToUtf-ascii-414243-start-end Tcl_ExternalToUtf - ascii - 414243 - start end FAILED ==== Contents of test case: testencoding Tcl_ExternalToUtf ascii ABC {start end} {} 40 ---- Result was: ok {} ABC\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xFF ---- Result should have been (exact matching): ok {} ABC\x00\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF ==== Tcl_ExternalToUtf-ascii-414243-start-end FAILED ... Tests ended at Wed Apr 26 07:50:29 UTC 2023 all.tcl: Total 56303 Passed 50591 Skipped 5704 Failed 8


pooryorick added on 2023-04-26 03:28:24:

I started to enclose the calls to memset in a "TCL_UTF_MAX < 4" block as requested by Jan, but then decided against it. This is bad practice that makes the code more brittle for future developers. The cost of memset should be born until the code can be improved.


pooryorick added on 2023-04-26 02:44:59:

The Tcl code base has suffered a growing complexity problem in recent years, This report is just one symptom of the larger issue. Practices like the use of novel techniques to overload function signatures, the proliferation of macros that create different code bases during preprocessing, and other contortions in the name of preserving backwards compatibility should be curtailed.


pooryorick added on 2023-04-26 02:36:54:

This is not a non-issue. Commit [a61fef8429dfff4b] was recently moved off trunk due to the fact that it caused test format-2.19 to fail under a build of TCL_UTF_MAX = 3. This bug is the cause of that failure. In every spot except one "TCL_UTF_MAX < 4" would not be incorrect. I considered adding those, but I think Tcl_UniCharToUtf() is an example of the type of coding that there should be zero tolerance for, as smuggling flags and state into existing argument turns the code into a booby trap for future developers. See The Nature of Complexity by Jon Ousterhout. Instead, the signature of Tcl_UniCharToUtf() should be changed to explicitly support the new functionality. Also, TCL_UTF_MAX = 3 should be removed from trunk and from Tcl 9. It no longer serves any purpose, and is responsible for at least this bug. For whatever reason, valgrind is not picking up on this issue.

Jan, you won't be able to create a test case for this issue, yet it is real. After this is fixed and my commit mentioned above is reapplied to trunk, format-2.19 will be one canary for the issue.


jan.nijtmans added on 2023-04-25 20:58:50:

Example of a much better bug-report, with testcase: [95e287b956].


jan.nijtmans added on 2023-04-25 20:50:50:

This is a non-issue. In Tcl 9.0 with TCL_UTF_MAX=4, Tcl_UniCharToUtf() doesn't combine surrogates any more. You could put #ifdef's "TCL_UTF_MAX < 4" around it, otherwise this patch only costs performance without any benefit. Please don't do that (asking nicely)


pooryorick added on 2023-04-25 20:39:18:

A better fix would be to isolate the stateful data so that it isn't necessary to initialize the entire buffer, and an even better fix would be to use a proper structure to hold both the buffer and other pertinent information.


pooryorick added on 2023-04-25 20:36:50:

Fixed in [f5eadcbf9a6b1b4c].