2020-04-01
| ||
11:48 | fixes [f583715154] - tclUnixSock.c: introduced ThreadActionProc considering a transfer of async-conn... check-in: ef20726bcd user: dgp tags: core-8-6-branch | |
2020-03-31
| ||
20:04 | fixes [f583715154] - tclUnixSock.c: introduced ThreadActionProc considering a transfer of async-conn... check-in: 532506ef71 user: sebres tags: bug-f583715154-v2 | |
16:55 | • Ticket [f583715154] Fileevent runs in the wrong thread status still Open with 3 other changes artifact: 7c3ec75f9a user: sebres | |
2019-12-27
| ||
13:55 | • Ticket [815e246806] tcl crash on channel thread transfer and fileevent writable in place status still Pending with 3 other changes artifact: fe0ed4bbcc user: sebres | |
2019-08-22
| ||
10:02 | • Ticket [e484a8407c] pipe: time-related sporadic hangs in io/chan-io tests status still Open with 3 other changes artifact: 618aa576db user: sebres | |
2018-06-12
| ||
11:42 | • Ticket [815e246806] tcl crash on channel thread transfer and fileevent writable in place status still Pending with 3 other changes artifact: 51d427ec57 user: sebres | |
10:14 | • Ticket [815e246806]: 4 changes artifact: 97ab9bdffb user: sebres | |
09:52 | • Ticket [815e246806]: 4 changes artifact: 1891b2b42a user: oehhar | |
2018-06-11
| ||
11:42 | • Ticket [815e246806]: 3 changes artifact: ab8dfa8eec user: sebres | |
11:11 | fixed [815e246806] for 8.6th case if channel switches owner in fileevent via thread::transfer and th... check-in: 922ee82dde user: sebres tags: fix-815e246806-8-6 | |
2018-06-08
| ||
22:51 | • Ticket [815e246806] tcl crash on channel thread transfer and fileevent writable in place status still Pending with 3 other changes artifact: 7dde2fc044 user: sebres | |
20:35 | • Ticket [815e246806]: 3 changes artifact: 0197c57390 user: sebres | |
18:59 | Fixes bug [815e246806] using atomic reference counter (atomic incr/decr during channel transfer proc... check-in: bd0fca47e3 user: sebres tags: fix-815e246806-8-5 | |
17:17 | • Ticket [815e246806] tcl crash on channel thread transfer and fileevent writable in place status still Pending with 3 other changes artifact: d24f340efb user: sebres | |
16:13 | • Ticket [815e246806]: 3 changes artifact: ae97acc89e user: sebres | |
15:23 | • Ticket [815e246806]: 3 changes artifact: 0d2b5128d9 user: max | |
13:45 | • Pending ticket [815e246806]. artifact: cd466c081d user: sebres | |
13:43 | • Ticket [815e246806]: 3 changes artifact: 5e6858ce0c user: sebres | |
13:33 | merge fix for [815e246806] from 8.5th branch fix-815e246806-8-5 check-in: f44a1ddd60 user: sebres tags: fix-815e246806-8-6 | |
13:28 | fixes [815e246806], extend Tcl_RegisterChannel/Tcl_UnregisterChannel with preserving of ref-count of... check-in: dafa1a0859 user: sebres tags: fix-815e246806-8-5 | |
11:42 | • Ticket [815e246806] tcl crash on channel thread transfer and fileevent writable in place status still Open with 4 other changes artifact: fd46bf22bb user: sebres | |
11:41 | • Ticket [815e246806]: 3 changes artifact: 5536eecb2d user: sebres | |
09:01 | • Ticket [815e246806]: 3 changes artifact: 24f12cb2af user: sebres | |
09:01 | • Ticket [815e246806]: 3 changes artifact: 22dd9b895d user: max | |
08:32 | • Ticket [815e246806]: 4 changes artifact: 59d4644da6 user: oehhar | |
08:17 | • Ticket [815e246806]: 3 changes artifact: b5833e369f user: max | |
07:34 | • New ticket [815e246806]. artifact: 5cf3f5cc39 user: oehhar | |
Ticket UUID: | 815e2468064b4324bc4b303b9bafd3ef92803c32 | |||
Title: | tcl crash on channel thread transfer and fileevent writable in place | |||
Type: | Bug | Version: | >= 8.5 (partially), >= 8.6 | |
Submitter: | oehhar | Created on: | 2018-06-08 07:34:54 | |
Subsystem: | 49. Threading | Assigned To: | sebres | |
Priority: | 5 Medium | Severity: | Important | |
Status: | Pending | Last Modified: | 2019-12-27 13:55:50 | |
Resolution: | Fixed | Closed By: | nobody | |
Closed on: | ||||
Description: |
SummaryTransfering a channel to another thread by "thread::transfer" with a "fileevent writable" on the channel crashes tclsh after around 20 tries on Linux and Windows. The crash goes away, if the file event is cleared before the transfer. CrashTest script: package require Thread set handleproc { proc handle {chan} { global name other if {$name eq "master"} { puts $name } else { thread::send $other "puts slave" } thread::transfer $other $chan thread::send $other [list fileevent $chan writable [list handle $chan]] } } eval $handleproc set name master set other [thread::create] thread::send $other [list set name slave] thread::send $other [list set other [thread::id]] thread::send $other $handleproc set nulldev [dict get {unix /dev/null windows NUL} $tcl_platform(platform)] set chan [open $nulldev RDWR] handle $chan vwait forever Demonstration that it works without fileevent activeThe following script switches the event of before the transfer. It does not crash: package require Thread set handleproc { proc handle {chan} { global name other if {$name eq "master"} { puts $name } else { thread::send $other "puts slave" } thread::transfer $other $chan thread::send $other [subst { fileevent $chan writable { fileevent $chan writable {} after 0 handle $chan } }] } } eval $handleproc set name master set other [thread::create] thread::send $other [list set name slave] thread::send $other [list set other [thread::id]] thread::send $other $handleproc set nulldev [dict get {unix /dev/null windows NUL} $tcl_platform(platform)] set chan [open $nulldev RDWR] handle $chan vwait forever References
| |||
User Comments: |
sebres added on 2019-12-27 13:55:50:
It looks like this ticket in thread-repo [d481f15516a465b1] may be affected by this issue too. sebres added on 2018-06-12 11:42:30: I tried to test the fix without atomic operations (rewritten atomic-defines similar the last section from current branch if no TCL_THREADS declared, so as (++(p)) and (--(p))). It's not stable as non-atomic (sporadically produces panic "Channel released more than preserved"). Please find enclosed better (faster) test-case without hand-brake in the console (so it reproduces very fast a segfault or panic in original tcl without the fix, resp. without atomic operations also):
sebres added on 2018-06-12 10:14:20: Hi Harald, Thx for test! Still someone with *nix?:) And the question remains - what we do with atomic builtins? Does anyone have some special system/platform (older compiler) to test? oehhar added on 2018-06-12 09:52:43: Sergey, thank you for your continuous passionous work. The fix works totally for me. Test detailsHere is my windows test on Win10 2017-10 64 bit GER compiled with 32 bit:
The crash does not occure after 1 Minuite run time. With original 8.6.8, it occures after 20 iterations. TCL Test run:
compared to tcl 8.6.8 release (http://wiki.tcl.tk/37529), there are no test differences. Sergey, great work, thank you, Harald sebres added on 2018-06-11 11:42:08: Branch fix-815e246806-8-6 is ready now for testing/review. This fix could theoretacally work without the new handling round abount channel ref-count (also non atomic), but... I'm not sure it works then in all possible constellations. In addition I would like to save this code to be pretty sure if someone calls this public API across two threads a bit other way (as in thread-module). And just to have atomic operations in tcl for future requirements (I've used it already for windows since I fixed pipes-error [8bd13f07bd], see [ad65e576ee]). Just this handling (atomic primitives in tclInt.h) is not complete possibly and may be missing for some older gcc/clang versions resp. some special platforms, that are don't have support for builtin atomic (or just miss some header include). ATM tested MS-VC (>= 6.0), mingw (gcc >= 4.2), linux (gcc >= 6.2, clang >= 3.8). sebres added on 2018-06-08 22:51:59: The case if channel switches owner via "thread::transfer" fixed now in [2e7300065a]. Currently still as interim state, because thread::detach/attach still failed sporadically with "SIGBUS":
The reason is the case of synchronous sending of the attach inside of the running event-handler is more complex as provided solution.
But strange is trying to simulate thread::transfer, using "send -async" by attach, ends into SIGSEGV with almost the same call-stack inside TclChannelEventScriptInvoker. I doubt it may be something wrong with "detach", because it does definitelly the owner thread. WiP sebres added on 2018-06-08 20:35:44: Too complex to port my tcl-thread-module back (too many dependencies to resolve)... But just tried to protect the refCount of channel with atomic operations and it solves the crashes for windows (8.5/8.6), but on unix it refuses for some reasons for 8.6 (but work in 8.5). So tried to add assert within TclChannelEventScriptInvoker to check Tcl_GetCurrentThread() == Tcl_GetChannelThread(chanPtr), and it fails, as expected, with:
for
So indeed it hangs still somewhere in notifier/event-handler during the transfer-process (but thread-module wait for transmit-confirmation from other thread!).
Still worse is, that if I replace the transfer with attach/detach pair like below, it changes nothing at all:
This implies the "detach" does not work properly (I mean after detach TclChannelEventScriptInvoker will be called again inside the OLD thread). Strange is also, that it's absolutely not reproducible with my own thread-module (but still the same tcl-version)... sebres added on 2018-06-08 17:17:28: So now it going to the thread-module (tested with 2.7 up to 2.8.1 resp. current trunk). I had a fixed (with rewritten attach/detach/transfer) version installed where it does not occur. After change to tcl-core version it's pretty reproducible. Trying to port bug-fix... WiP. sebres added on 2018-06-08 16:13:53: Strange... I cannot reproduce it anymore. Just to be sure, which version of the tclthread-module do you use? Or better revision if self-compiled... max added on 2018-06-08 15:23:41: Unfortunately I see no change in behavior between 8.6.7 and fix-815e246806-8-6. I still see get the crashes with the "writable" script from Harald's initial description and also with the "readable + delayed" one from my latest post (after replacing [after idle] by [eval]). BTW, most invocations (pre and post your patch) result in a plain segfault without further output, but sometimes I get an error message + abort with differing error messages, e.g.: ---- alloc: invalid block: 0x5645d07a60b0: c0 d0 alloc: invalid block: 0x5645d07a60b0: c0 d0 ---- TclStackFree: incorrect freePtr (0x55807a654ac0 != 0x55807a654b00). Call out of sequence? ---- TclNRExecuteByteCode: abnormal return at pc 6: stack top -18 < entry stack top 0 TclNRExecuteByteCode execution failure: end stack top < start stack top ---- Reuse of ChannelBuffer! 0x55ffee568d80 Reuse of ChannelBuffer! 0x55ffee568d80 ---- Tcl_AppendLimitedToObj called with shared object ---- This looks to me like there is still some severe memory corruption going on. sebres added on 2018-06-08 13:43:30:
Two new branches "fix-815e246806-8-5" and "fix-815e246806-8-6" provide a fix for the issue, see [f44a1ddd60] for example. Further testers and reviewers are welcome. sebres added on 2018-06-08 11:41:35:
Yep, the reason for it is the pair TclChannelPreserve/TclChannelRelease in
the TclChannelEventScriptInvoker.
sebres added on 2018-06-08 09:01:56: Hmm strange, I can't reproduce it on my own tcl-versions at all (but pretty good reproducible with original core-tcl >= 8.5). I'll try to find what is different there as regards the fileevents. Could be possibly fixed via sebres-8-x-event-perf-branch's, that are merged on my own tcl's. I'll supply a fix if I find the reason. max added on 2018-06-08 09:01:32: Here is another variant that shows two things:
I post the variant that works. The crash happens when [after idle] in the handle proc is removed or replaced by [eval]. set handleproc { proc handle {chan} { global name other puts $name read $chan thread::transfer $other $chan after idle [subst { thread::send $other { fileevent $chan readable {handle $chan} } }] thread::send $other [list after 500 [list puts $chan x]] } } eval $handleproc set name master set other [thread::create] set chan [open |cat RDWR] fconfigure $chan -buffering none -blocking 0 thread::send $other [list set name slave] thread::send $other [list set other [thread::id]] thread::send $other $handleproc handle $chan vwait forever oehhar added on 2018-06-08 08:32:29: The error escape observed on windows is in line 1974 in generic/tclIO.c : void TclChannelRelease( Tcl_Channel chan) { Channel *chanPtr = (Channel *) chan; if (chanPtr->refCount == 0) { ---> Tcl_Panic("Channel released more than preserved"); } if (--chanPtr->refCount) { return; } if (chanPtr->typePtr == NULL) { ckfree(chanPtr); } } max added on 2018-06-08 08:17:01: The fileevent appears to get cleared automatically when transferring the channel, so there is no need to do that manually before the transfer. The critical point for the crash seems to be that a new fileevent is being set up in the receiving thread while the event handler in the sending thread is still running. So, if I just warp the last line of the handler proc in my original script in after idle [list ...] to defer setting up the new event trigger until the old handler has returned, the crash is gone. |