Tcl package Thread source code: View Ticket

Ticket UUID:	3532972
Title:	Tcl Scripts results in "called Tcl_FindHashEntry on deleted"
Type:	Bug	Version:	8.6.3
Submitter:	arius-marius	Created on:	2012-06-07 20:51:20
Subsystem:	80. Thread Package	Assigned To:	dgp
Priority:	5 Medium	Severity:	Minor
Status:	Closed	Last Modified:	2016-05-18 18:38:41
Resolution:	None	Closed By:	gahr
		Closed on:	2016-05-18 18:38:41
Description:	Tcl Version: 8.5.11 OS Platform and Version: Windows 7 (64 bit) & Linux (32 bit) Thread Package: 2.6.6 (Linux) & 2.6.7 (Windows) Problem Behavior: - If script execution has finished the following output appears: called Tcl_FindHashEntry on deleted - Additionally, under Linux a core dump is created Expected Behavior: - A clean finalization of Tcl interpreter Tcl-Script: package require Thread set th1 [thread::create {puts worker_running ; thread::wait ; puts worker_stopped}] thread::send -async $th1 {puts script_started; after 5000; puts script_done} go puts "waiting for go..." vwait go if { [thread::exists $th1] } { thread::release $th1 puts worker_released } puts main_done
User Comments:	gahr added on 2016-05-18 18:38:41: Committed in [1489249bd4]. Thank you! adrianmedranocalvo (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2016-05-18 17:21:53: I would suggest wrapping all changes in: #ifdef SV_FINALIZE ... #endif as the thread counter is only decreased in SvFinalize, which is itself wrapped. Other than that, it looks fine. I have attached your patch with these changes, in case it helps. Best regards, Adrián. gahr added on 2016-05-18 15:48:35: I 100% agree with you. Thanks for bringing me back on track :) I have attached a revised version of the patch (3532972fff-2.diff.txt). I kindly ask for a final ack from your part before I commit this. Thanks! adrianmedranocalvo (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2016-05-18 12:01:41: In my opinion different aspects should be handled separately. This issue was about a SEGFAULT under rare circunstances (threads, errors, exit, tsv access). As far as I can see the changes discussed in previous messages prevent the SEGFAULT. If thread exit handlers do not run reliably a new ticket should be opened with some details. Using the following script with printfs in SvFinalize, it seems to run every time: ~~~ package require Thread set tid [thread::create -joinable]; ::thread::release $tid; ::thread::join $tid; ::exit ~~~ With regards to enabling or disabling thread finalization altogether, I would suggest contacting the authors the reasons for the comment and whether it has been considered before enabling the feature. See [fd09ee731a3707e2d89414e1b04483abcbbb3291] and [072cffeecf3af65d6d60149c3e0fca67d6292c83], where it is disabled; and later [606750884097194eb1de16fc7d59ad8975105974], where it is enabled. It would seem to me that the comment is obsolete. Best regards, Adrián. gahr added on 2016-05-17 11:06:46: You are right, the cleanup routine must definitely be installed in the thread's exit handler, not the process'. However, I can't get the routine to be reliably called at every thread's exit, which is basically equivalent to not calling it at all. At this point, I'm more towards disabling it altogether. Thoughts? adrianmedranocalvo (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2016-05-17 09:23:42: The approach is definitely cleaner, thank you. It seems to me that exit handlers created with Tcl_CreateExitHandler are only ever invoked once, upon Tcl_Finalize (or Tcl_Exit). Shouldn't we instead register with Tcl_CreateThreadExitHandler? Not that it affects this particular issue, but otherwise the following script would not invoke the finalization: ~~~ (run with tclsh) package require Thread thread::release [thread::create]; while {[incr i] < 10000} {}; # needed for some reason ~~~ I suggest the following changes on top of yours: ~~~ diff -u generic/threadSvCmd.c generic/threadSvCmd.c --- generic/threadSvCmd.c +++ generic/threadSvCmd.c @@ -21,6 +21,8 @@ #include "psGdbm.h" /* The gdbm persistent store implementation / #include "psLmdb.h" / The lmdb persistent store implementation / +#define SV_FINALIZE + / * Number of buckets to spread shared arrays into. Each bucket is * associated with one mutex so locking a bucket locks all arrays @@ -59,8 +61,10 @@ * Global variables used within this file. / +#ifdef SV_FINALIZE static size_t nofThreads; / Number of initialized threads / static Tcl_Mutex nofThreadsMutex; / Protects the nofThreads variable / +#endif static Bucket buckets; /* Array of buckets. / static Tcl_Mutex bucketsMutex; / Protects the array of buckets / @@ -116,7 +120,6 @@ static void SvAllocateContainers(Bucket); static void SvRegisterStdCommands(void); -#define SV_FINALIZE #ifdef SV_FINALIZE static void SvFinalizeContainers(Bucket); static void SvFinalize(ClientData); @@ -2133,12 +2136,14 @@ const Tcl_UniChar no[3] = {'n', 'o', 0} ; Tcl_Obj obj; +#ifdef SV_FINALIZE /* * Increment number of threads / Tcl_MutexLock(&nofThreadsMutex); ++nofThreads; Tcl_MutexUnlock(&nofThreadsMutex); +#endif / * Add keyed-list datatype @@ -2199,9 +2204,6 @@ Tcl_MutexLock(&bucketsMutex); if (buckets == NULL) { buckets = (Bucket )ckalloc(sizeof(Bucket) NUMBUCKETS); -#ifdef SV_FINALIZE - Tcl_CreateExitHandler(SvFinalize, NULL); -#endif for (i = 0; i < NUMBUCKETS; ++i) { bucketPtr = &buckets[i]; @@ -2234,6 +2236,10 @@ Tcl_MutexUnlock(&bucketsMutex); } +#ifdef SV_FINALIZE + Tcl_CreateThreadExitHandler(SvFinalize, NULL); +#endif + return TCL_OK; } ~~~ What do you think? --- Relatedly, there is this disconcerting comment in that file, just above SvFinalize: ~~~ #ifdef SV_FINALIZE /* * Left for reference, but unused since multithreaded finalization is * unsolvable in the general case. Brave souls can revive this by * installing a late exit handler on Thread's behalf, bringing the * function back onto the Tcl_Finalize (but not Tcl_Exit) path. / ~~~ ... which contradicts the "#define SV_FINALIZE" being unconditionally defined at the top of the file. Should it actually be disabled!? Best regards, Adrián. gahr added on 2016-05-13 16:12:11: I added a patch that solves the problem using a different approach, that is, keeping count of how many threads have registered and avoiding removing shared data until the last one unregisters. This seems to solve this specific problem (I can't reproduce the segfault) and frees us from having to put checks for buckets == NULL at all entry points. gahr added on 2016-05-13 15:21:23: Thank you, I was able to replicate running your script through this driver: while {1} { set delay [expr {int(rand() 10)}] catch {exec [info nameofexecutable] test3.tcl $delay} err } and I was able to get a core dump. I'll look into it! adrianmedranocalvo (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2016-05-13 14:39:32: Hmm, the bug is certainly slippy, but I am able to reproduce it with Tcl 8.6.5 and Thread 2.7.3. Please, store the following version of the previous testcase in some file (the only change w.r.t. previous version is that it accepts a delay as command line argument): ~~~~~ bug.tcl package require Thread; set delay [lindex $::argv 0]; set t1 [::thread::create ::thread::wait] set t2 [::thread::create ::thread::wait] ::thread::send $t1 [list ::after $delay {::tsv::set sv k "somevalue"}]; ::thread::send $t2 [list ::nonexistingcommand]; ~~~~~~ Then run the following (quite convoluted) pipeline that invokes the script with a random delay. The `parallel' utility seems to help reproducing it, perhaps due to increased system load. catchsegv makes sure we get some output mentioning the SEGFAULT. ~~~ tr -dc '0-9' </dev/urandom \| fold -w1 \| parallel catchsegv tclsh bug.tcl 2>&1 \| grep Segment ~~~ You should start seeing some lines such as the following: ~~~ *** Segmentation fault ~~ Thank you for having a look. Best regards, Adrián. gahr added on 2016-05-13 12:44:14: Hi, I can't reproduce with Thread 2.7.3 running with Tcl 8.6.5 and Tcl 8.5.19. Can you try to upgrade? anonymous (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2016-03-01 08:13:48: Hello, any thoughts on this? The patch is quite simple, but wraps most of the function in a conditional causing many white-space changes. Follows a version of the patch without white-space changes for easier review: --- i/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c +++ w/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c @@ -700,3 +700,3 @@ Bucket bucketPtr; - Array arrayPtr; + Array arrayPtr = NULL; @@ -706,2 +706,3 @@ + if (buckets != NULL) { p = array; @@ -725,3 +726,5 @@ Tcl_HashEntry hPtr = Tcl_FindHashEntry(&bucketPtr->arrays, array); - if (hPtr == NULL) { + if (hPtr != NULL) { + arrayPtr = (Array)Tcl_GetHashValue(hPtr); + } else { UNLOCK_BUCKET(bucketPtr); @@ -731,5 +734,9 @@ } - return NULL; } - arrayPtr = (Array)Tcl_GetHashValue(hPtr); + } + } else { + if (!(flags & FLAGS_NOERRMSG)) { + Tcl_AppendResult(interp, "\"", array, + "\" has been finalized", NULL); + } } @@ -2186,3 +2193,5 @@ buckets = (Bucket )ckalloc(sizeof(Bucket) NUMBUCKETS); +#if defined(SV_FINALIZE) Tcl_CreateExitHandler(SvFinalize, NULL); +#endif /* SV_FINALIZE / anonymous (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2015-06-29 09:02:46: The patch below modifies LockArray to ensure that `buckets` has not been cleaned up, returning an error if it has. Additionally, the exit handler is added conditional on `SV_FINALIZE` being defined, for consistency. diff --git i/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c w/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c index b8d90c0..1bccca7 100644 --- i/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c +++ w/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c @@ -698,40 +698,47 @@ LockArray( register unsigned int result; register int i; Bucket bucketPtr; - Array arrayPtr; + Array arrayPtr = NULL; /* * Compute a hash to map an array to a bucket. / - p = array; - result = 0; - while (p++) { - i = p; - result += (result << 3) + i; - } - i = result % NUMBUCKETS; - bucketPtr = &buckets[i]; - - / - * Lock the bucket and find the array, or create a new one. - * The bucket will be left locked on success. - / - - LOCK_BUCKET(bucketPtr); / Note: no matching unlock below ! / - if (flags & FLAGS_CREATEARRAY) { - arrayPtr = CreateArray(bucketPtr, array); - } else { - Tcl_HashEntry hPtr = Tcl_FindHashEntry(&bucketPtr->arrays, array); - if (hPtr == NULL) { - UNLOCK_BUCKET(bucketPtr); - if (!(flags & FLAGS_NOERRMSG)) { - Tcl_AppendResult(interp, "\"", array, - "\" is not a thread shared array", NULL); - } - return NULL; + if (buckets != NULL) { + p = array; + result = 0; + while (p++) { + i = p; + result += (result << 3) + i; + } + i = result % NUMBUCKETS; + bucketPtr = &buckets[i]; + + /* + * Lock the bucket and find the array, or create a new one. + * The bucket will be left locked on success. + / + + LOCK_BUCKET(bucketPtr); / Note: no matching unlock below ! / + if (flags & FLAGS_CREATEARRAY) { + arrayPtr = CreateArray(bucketPtr, array); + } else { + Tcl_HashEntry hPtr = Tcl_FindHashEntry(&bucketPtr->arrays, array); + if (hPtr != NULL) { + arrayPtr = (Array)Tcl_GetHashValue(hPtr); + } else { + UNLOCK_BUCKET(bucketPtr); + if (!(flags & FLAGS_NOERRMSG)) { + Tcl_AppendResult(interp, "\"", array, + "\" is not a thread shared array", NULL); + } + } + } + } else { + if (!(flags & FLAGS_NOERRMSG)) { + Tcl_AppendResult(interp, "\"", array, + "\" has been finalized", NULL); } - arrayPtr = (Array)Tcl_GetHashValue(hPtr); } return arrayPtr; @@ -2184,7 +2191,9 @@ Sv_Init (interp) Tcl_MutexLock(&bucketsMutex); if (buckets == NULL) { buckets = (Bucket )ckalloc(sizeof(Bucket) NUMBUCKETS); +#if defined(SV_FINALIZE) Tcl_CreateExitHandler(SvFinalize, NULL); +#endif /* SV_FINALIZE / for (i = 0; i < NUMBUCKETS; ++i) { bucketPtr = &buckets[i]; anonymous (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2015-06-22 08:19:18: Thank you for your response. I must say that I don't see the relation to the pointed out issue, but admittedly I don't know enough about the internals. I've been investigating a little bit. The issue seems to be caused by Sv_Finalize being invoked prior to setting the tsv::array's value. That is, when `Sv_GetContainer` is invoked the `static Bucket buckets` is already NULL, and no checks are performed on LockArray. This seems to be the source of the SEGFAULT. Something worth pointing out is that SV_FINALIZE is unconditionally defined in the module, despite some comments warning against: ~~~ #define SV_FINALIZE #ifdef SV_FINALIZE static void SvFinalizeContainers(Bucket); static void SvFinalize(ClientData); #endif / SV_FINALIZE / [...] #ifdef SV_FINALIZE / * Left for reference, but unused since multithreaded finalization is * unsolvable in the general case. Brave souls can revive this by * installing a late exit handler on Thread's behalf, bringing the * function back onto the Tcl_Finalize (but not Tcl_Exit) path. */ ~~~ [...] static void SvFinalize (ClientData clientData) { [...] #endif Additionally, want to mention that I've been able to reproduce using Tclsh as well: one needs to prepend 'package require Tk` to the script. This hints at some bad interaction between Tk and thread finalization... Again, thank you for your help. Best regards, Adrián. jan.nijtmans added on 2015-06-19 10:43:10: Looks like Dup of [3494470], which is not a Thread bug, but a Tcl bug. anonymous (claiming to be Adrián Medraño Calvo <[email protected]>) added on 2015-06-19 09:59:00: This seems to still occur on 8.6.3. The following script: ~~~ package require Thread; set delay 5; set t1 [::thread::create ::thread::wait] set t2 [::thread::create ::thread::wait] ::thread::send $t1 [list ::after $delay {::tsv::set sv k "somevalue"}]; ::thread::send $t2 [list ::nonexistingcommand]; ~~~ SEGFAULTs fairly consistently (for what it seems to be a race-condition) when running it under Wish (had no luck trying to reproduce it under Tclsh). You might need to tune the delay. Below is a backtrace: ~~~ #0 0x00007fdb2ef061b9 in Sp_RecursiveMutexLock (muxPtr=0x600) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSpCmd.c:1529 #1 0x00007fdb2ef00cce in LockArray (interp=0x7fdb28000cd0, array=0x7fdb28037830 "sv", flags=0) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c:721 #2 0x00007fdb2ef002f7 in Sv_GetContainer (interp=0x7fdb28000cd0, objc=4, objv=0x7fdb280051a0, retObj=0x7fdb2eef7b58, offset=0x7fdb2eef7b48, flags=0) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c:329 #3 0x00007fdb2ef02bde in SvSetObjCmd (arg=0x0, interp=0x7fdb28000cd0, objc=4, objv=0x7fdb280051a0) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadSvCmd.c:1669 #4 0x00007fdb3204fce7 in TclNRRunCallbacks (interp=interp@entry=0x7fdb28000cd0, result=0, rootPtr=0x7fdb280224b0) at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:4392 #5 0x00007fdb3205269c in TclEvalObjEx (interp=interp@entry=0x7fdb28000cd0, objPtr=<optimized out>, flags=flags@entry=131072, invoker=invoker@entry=0x0, word=word@entry=0) at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:5958 #6 0x00007fdb320526aa in Tcl_EvalObjEx (interp=interp@entry=0x7fdb28000cd0, objPtr=<optimized out>, flags=flags@entry=131072) at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:5939 #7 0x00007fdb321317e0 in AfterProc (clientData=0x7fdb28039b10) at /prod/g463/src/tcl8.6.3/generic/tclTimer.c:1191 #8 0x00007fdb32131add in TimerHandlerEventProc (evPtr=evPtr@entry=0x7fdb2802eb30, flags=flags@entry=-3) at /prod/g463/src/tcl8.6.3/generic/tclTimer.c:593 #9 0x00007fdb32112cc7 in Tcl_ServiceEvent (flags=flags@entry=-3) at /prod/g463/src/tcl8.6.3/generic/tclNotify.c:670 #10 0x00007fdb32112fc9 in Tcl_DoOneEvent (flags=-3) at /prod/g463/src/tcl8.6.3/generic/tclNotify.c:967 #11 0x00007fdb2eefe267 in ThreadWait (interp=0x7fdb28000cd0) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadCmd.c:2853 #12 0x00007fdb2eefb558 in ThreadWaitObjCmd (dummy=0x0, interp=0x7fdb28000cd0, objc=1, objv=0x7fdb28004fc0) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadCmd.c:1142 #13 0x00007fdb3204fce7 in TclNRRunCallbacks (interp=interp@entry=0x7fdb28000cd0, result=0, rootPtr=0x0) at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:4392 #14 0x00007fdb3204fd7f in Tcl_EvalObjv (interp=interp@entry=0x7fdb28000cd0, objc=objc@entry=1, objv=objv@entry=0x7fdb28004fc0, flags=flags@entry=2097168) at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:4123 #15 0x00007fdb32051e5f in TclEvalEx (interp=0x7fdb28000cd0, script=0x7fdb28037c70 "::thread::wait", numBytes=<optimized out>, flags=<optimized out>, line=line@entry=1, clNextOuter=clNextOuter@entry=0x0, outerScript=0x7fdb28037c70 "::thread::wait") at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:5261 #16 0x00007fdb32052176 in Tcl_EvalEx (interp=<optimized out>, script=<optimized out>, numBytes=<optimized out>, flags=<optimized out>) at /prod/g463/src/tcl8.6.3/generic/tclBasic.c:4926 #17 0x00007fdb2eefc87e in NewThread (clientData=0x7fffd35d3d40) at /prod/g463/src/tcl8.6.3/pkgs/thread2.7.1/generic/threadCmd.c:1833 #18 0x00007fdb316fd182 in start_thread (arg=0x7fdb2eef8700) at pthread_create.c:312 #19 0x00007fdb31d4247d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111 ~~~ ferrieux added on 2012-06-11 23:35:43: Yes. This is, as you remember, the very motivation for the 8.6 move (bug 2001201). Bottom line: the OS's exit() is the fastest and most rational way of exiting, especially in a multithreaded setup since the ordered untangling of a locking graph is in general intractable. Note that the existence of bugs in thread finalization, when forced, is an orthogonal matter; we didn't even need them to prefer quick-exits ! dgp added on 2012-06-11 22:17:19: Note that due to variations in thread scheduling, this demo script is not deterministic. The fact that it runs once without failing doesn't confirm correct operations. Making several runs with combinations of the 8.5 and trunk branch tips as well as Thread 2.6.7 and 2.7b1 I am able to see either panics or segfaults in all combinations when TCL_FINALIZE_ON_EXIT is set. Using Tcl 8.6's default setting where thread finalization is skipped does appear to avoid the problem. So in some sense, this isn't really fixed, but 8.6 works around it. arius-marius added on 2012-06-09 00:12:50: Set the following environment variable on the command line at Windows... set TCL_FINALIZE_ON_EXIT=1 ...and executed the script again with Tcl version 8.6b2. => No output on the command line appeared pointing to finalization problems. arius-marius added on 2012-06-08 04:30:53: Not reproducible with... Tcl Version: 8.6b2 OS Platform and Version: Windows 7 (64 bit) Thread Package: 2.6.7 (Windows)

Attachments:

3532972fff-3.patch [download] added by adrianmedranocalvo on 2016-05-18 17:21:30. [details]
3532972fff-2.diff.txt [download] added by gahr on 2016-05-18 15:47:24. [details]
3532972fff.diff.txt [download] added by gahr on 2016-05-13 16:10:33. [details]