Ticket UUID: | 7371b6270b727521cf0c7044a882d16ef5c4e96b | |||
Title: | AddressSanitizer use-after-return detection breaks NRE tests, coroutines | |||
Type: | Patch | Version: | 8.6.13 | |
Submitter: | chrstphrchvz | Created on: | 2023-08-04 04:36:43 | |
Subsystem: | 60. NRE and coroutines | Assigned To: | jan.nijtmans | |
Priority: | 5 Medium | Severity: | Important | |
Status: | Closed | Last Modified: | 2025-05-20 18:39:37 | |
Resolution: | Accepted | Closed By: | dgp | |
Closed on: | 2025-05-20 18:39:37 | |||
Description: |
AddressSanitizer (ASan) can detect use-after-return errors. Depending on the platform, this is enabled either by default, or by using the appropriate compiler flags or runtime environment variables (e.g. ASAN_OPTIONS=detect_use_after_return=1). But enabling this can completely break Tcl coroutines: there will be unexpected “cannot yield: C stack busy” errors, and numerous tests will fail or even hang (e.g. http-3.10.0, io-28.6). This is because the real stack gets replaced with a heap-allocated fake stack, and there is no guarantee that the address of a stack variable will be consistent for a given C stack depth, as assumed by the coroutines implementation. Rather than requiring AddressSanitizer use-after-return detection to be disabled, I suggest that Tcl use AddressSanitizer-friendly approaches instead. For GCC and Clang, there is __builtin_frame_address(0); and for MSVC, there is _AddressOfReturnAddress(). And I suggest always using these when available, for consistency between builds with and without AddressSanitizer, and in case of other use-after-return detection tools with similar limitations. See attached patch. https://github.com/llvm/llvm-project/blob/llvmorg-10.0.0-rc1/clang/lib/Basic/Stack.cpp#L24 shows a very similar example. | |||
User Comments: |
dgp added on 2025-05-20 18:39:37:
It looks like the tcl/unix builds make it far too easy for a `make test` to test the installed library that is still buggy instead of testing the built library in the workspace which includes the fix. I think this was a false alarm. chrstphrchvz added on 2025-05-16 21:20:21: I do not know if I will return to help with this anytime soon. clang-1600.0.26.6 is Xcode/command line tools 16.2, released late 2024--much more recent than what I had access to when testing this patch. Maybe try adding #error to see which approach is getting compiled instead. dgp added on 2025-05-16 18:10:30: This patch was merged to the 8.6 branch in checkin [910b623484]. When I attempt a build of that checkin on a Macbook Pro, M1 model from 2021, running Sonoma 14.7.5, and the "gcc" compiler is clang-1600.0.26.6, I see crashes in the test suite in test files coroutine.test nre.test This is with a build under the tcl/unix build system. I have to guess that the compiler intrinsics added by this patch need some platform variation to fix the problem. The crashes seem to persist in all checkins of the 86 branch since then. chrstphrchvz added on 2023-10-22 16:24:40: I am not aware of anything else to do for this ticket; closing. Thanks for committing and revising this, Jan! chrstphrchvz added on 2023-10-11 10:59:44: …in other words, I consider the change to be justified. chrstphrchvz added on 2023-10-11 10:47:52: Note to self regarding #if __GNUC__ being changed to #if defined(__GNUC__) in [df7a3fd9e118]: although the former is what the LLVM example used, and would seem more robust in case of someone/something else defining __GNUC__ to 0, the latter is more consistent with the rest of core Tcl. chrstphrchvz added on 2023-10-05 08:14:33: I notice the CI failure for GCC on Windows. It attempted to use _AddressOfReturnAddress() which is MSVC-specific, but the condition is HAVE_INTRIN_H even though intrin.h is not MSVC-specific. I am thinking that the condition for _AddressOfReturnAddress() should depend on defined(_MSC_VER) (whether or not in addition to HAVE_INTRIN_H); and/or the GCC/Clang __builtin_frame_address(0) case should go first (as done by LLVM). chrstphrchvz added on 2023-09-30 08:37:43: See revised attached patch. It seems that factoring out code for use in both tclBasic.c and tclTest.c requires reserving a stub in tclInt.decls. chrstphrchvz added on 2023-08-04 15:30:24: I now notice the same issue affecting NRE test functions. I will look into patching those as well. |
Attachments:
- 7371b6270b.diff [download] added by chrstphrchvz on 2023-09-30 08:36:43. [details]