Tcl Source Code

View Ticket
Login
Ticket UUID: 2001201
Title: Tcl_Exit() shouldn't call the full Tcl_Finalize() path
Type: Bug Version: obsolete: 8.6a0
Submitter: georgeps Created on: 2008-06-23 20:53:24
Subsystem: 49. Threading Assigned To: kennykb
Priority: 8 Severity:
Status: Closed Last Modified: 2009-07-22 15:54:55
Resolution: Fixed Closed By: ferrieux
    Closed on: 2009-07-22 08:51:42
Description:
Tcl_Exit() often results in segfaults on exit, when more than 1 thread is busy at the time of a Tcl_Exit().  The problem is that Tcl_Finalize cleans up the state used by active threads.  This problem has existed for a long time.  The comments in Tcl_Finalize indicate the expectation of 1 thread "there really should only be one thread alive at this moment."

I propose this possible solution:
1. move the invoking in Tcl_Finalize of Tcl exit handlers into TclInvokeExitHandlers();  

2. Tcl_Finalize can call TclInvokeExitHandlers()  

3. Tcl_Exit calls TclInvokeExitHandlers() instead of Tcl_Finalize().


Tcl_Finalize will still work for those that wish to use it, and Tcl_Exit will work in the multiple active threads case.
User Comments: ferrieux added on 2009-07-22 15:54:55:
Typo in last comment, it is method (a): call full Tcl_FinalizeThread().

ferrieux added on 2009-07-22 15:51:42:
Further investigation indicates that method (b) looks safe, so I did that. Committed to HEAD.

dkf added on 2009-07-22 14:56:32:
I asked the question because I knew I didn't have the answer. :-) (Finalization scares me. It's hugely difficult, even worse than initialization.)

ferrieux added on 2009-07-22 14:25:16:
OK, investigation complete. When we reformed Tcl_Exit/Tcl_Finalize, we indeed removed Tcl_FinalizeThread from the exit path. We re-established what we believed to be the only important part of it by calling Tcl_FinalizeIOSubsystem directly, but *not* the loop over the per-thread exit handlers. 

ATM I see two possibilities:
 (a) Re-establish Tcl_FinalizeThread for the calling thread in Tcl_Exit
 (b) Duplicate the loop over per-thread-exit handlers.

The key to the decision being, is the rest of Tcl_FinalizeThread (Notifier+Async+ThreadObjects+ThreadData) potentially harmful in multithreaded setups ?

ferrieux added on 2009-07-22 13:52:35:
Donal, re your question "Should scripts run after Tcl_Exit ?": 

I agree it is questionable, but the problem is that it has been so for years. Take an old wish, bind <Destroy>, and hit ^D, some script activity unmistakably happens at exit-time... So at least a POTENTIAL INCOMPATIBILITY if we decide to cut that rotting branch.

I'd prefer to first understand and fix the current divergence. We can open a separate ticket for a Day-1 bug (I like these ;-) "Script activity at exit time through <Destory> bindings", and possibly make a TIP to change that.

ferrieux added on 2009-07-21 22:01:57:
Hmm, it seems TkFinalizeThread is not called on [exit], hence the absence of <Destroy> callbacks.
It was not my intention; the calling thread's ThreadExitHandler's should be called. More investigation needed.

dkf added on 2009-07-21 21:55:04:
It's not clear to me what those tests should be returning. Should scripts run after Tcl_Exit?
(Dropping priority; I suspect that the tests might be best changed, not the behaviour...)

dgp added on 2009-07-21 21:36:09:
The commit to Tcl HEAD on 2009-06-17
causes the following *Tk* tests to fail:

==== window-2.9 Tk_DestroyWindow, Destroy bindings evaluated after exit FAILED
==== Contents of test case:

    set code [loadTkCommand]
    append code {
        toplevel .t1
        toplevel .t2
        update
        bind .t2 <Destroy> {puts "Destroy .t2" ; exit 1}
        bind .t1 <Destroy> {puts "Destroy .t1" ; exit 0}
        destroy .t2
    }
    set script [makeFile $code script]
    if {[catch {exec [interpreter] $script -geometry 10x10+0+0} msg]} {
        set error 1
    } else {
        set error 0
    }
    removeFile script
    list $error $msg

---- Result was:
1 {Destroy .t2
child process exited abnormally}
---- Result should have been (exact matching):
0 {Destroy .t2
Destroy .t1}
==== window-2.9 FAILED



==== window-2.11 Tk_DestroyWindow, don't reanimate a half-dead window FAILED
==== Contents of test case:

    set code [loadTkCommand]
    append code {
        toplevel .t1
        toplevel .t2
        update
        bind .t1 <Destroy> {
            if {[catch {entry .t2.newchild}]} {
                puts YES
            } else {
                puts NO
            }
        }
        bind .t2 <Destroy> {exit}
        destroy .t2
    }
    set script [makeFile $code script]
    if {[catch {exec [interpreter] $script -geometry 10x10+0+0} msg]} {
        set error 1
    } else {
        set error 0
    }
    removeFile script
    list $error $msg

---- Result was:
0 {}
---- Result should have been (exact matching):
0 YES
==== window-2.11 FAILED

Need to understand this regression
and figure out what to do about it.

ferrieux added on 2009-06-18 02:57:16:
Patch committed to HEAD.

ferrieux added on 2009-06-16 04:51:15:
After a bit more thought it appears that TFIOS is needed because of the ages-old contract of flushing output channels from the calling thread/interp. Since nothing can be attempted on the other threads in the general case without risking deadlocks or crashes, it seems George's patch is the way to go. I have just updated it to current HEAD.

Also I'm upping the prio (1) due to a convergent will to bundle the Thread extension and (2) to reflect another prio-9 bug marked as dup of this one: 486399.

Please review.

ferrieux added on 2009-06-16 04:43:47:

File Deleted - 288754:

ferrieux added on 2009-06-16 04:43:46:

File Deleted - 286764:

ferrieux added on 2009-06-16 04:43:17:

File Added - 331119: tclEvent_ExitChange-6.patch

ferrieux added on 2009-05-17 05:21:58:
Hmm,  after looking a bit deeper at those finalization issues, I am wondering about the value of still calling TclFinalizeIOSubsystem in our new, streamlined Tcl_Exit.
Indeed, TFIOS calls TclpFinalizePipes/Sockets, which on Windows delete associated event sources. But if background threads are themselves in the event loop with e.g. fileevent son pipes/sockets, aren't we pulling the rug from under their feet ?
Moreover, TFIOS flushes channels which may block, so [exit] could hang...
Also, it does this only for channels owned by the current thread's interps, which is not very clean.

Bottom line: shouldn't we forget TFIOS in the exit path ?

ferrieux added on 2009-01-27 14:41:50:
George, now Jeff has okayed my patch and it's been committed. So you can update yours I guess, and ping Jeff.

ferrieux added on 2009-01-21 05:50:08:
After a chat with George I believe this is a good cleanup, by analogy with the accepted wisdom regarding the interaction between threads and fork(): in all cases it is impractical to organize an orderly cleanup of an arbitrary graph of mutexed threads *before* the "singularity" (exit/fork). So better just let it blow the can of worms away (either by exit() or by exec() after fork()).

Who's giving the green light for George to commit ?

georgeps added on 2008-09-19 04:06:30:
das, I'm not sure if the Tk deadlock is related.  Can you test rev 5 of the patch with Tk to see if it fixes the deadlock with a threaded build?

georgeps added on 2008-09-19 04:03:15:
Reassigning to kbk, because he has some time/interest in looking into this.

I'm not ready yet to commit the change, because it seems too simple, and other developers haven't known the area as well.

das added on 2008-08-16 06:40:46:
Logged In: YES 
user_id=90580
Originator: NO

is the window-2.9 deadlock in threaded Tk relevant here? (Tk bugs 1715716  & 740570)

I wrote there:

> The problem is definitely caused by nested invocation of [exit] from the
> <Destroy> handlers; not sure how to go about a fix in Tk, DestroyNotify
> events explicitly ignore TK_ALREADY_DEAD...
> 
> The easiest option might be to check inFinalize before setting it in
> Tcl_Finalize(), or equivalently, to check TclInExit() in Tcl_Exit() before
> calling Tcl_Finalize(), but I don't understand tcl finalization well enough
> to know if that would have undesireable effects.

georgeps added on 2008-08-16 00:25:07:
Logged In: YES 
user_id=585068
Originator: YES

Below is another test case that sometimes segfaults before the patch (the number of runs before a fault varies, but generally once every 5).  After the patch I haven't been able to duplicate the problem.

package require Thread

thread::create {
   set i 0
   while 1 {
       puts $i
       incr i
   }
}
#now for the Tcl_Finalize race

georgeps added on 2008-08-16 00:19:05:
Logged In: YES 
user_id=585068
Originator: YES

Here is a simple Thread test case that faults before the patch is applied, due to the exit of the main thread causing the other threads to have invalid state during Tcl_Finalize, but before TclpExit.

package require Thread

set script {
  puts [time {
    set total 0 ; for {set i 0} {$i < 10000000} {incr i} {incr total $i} ; set total
  }]
}


set t1 [thread::create]
set t2 [thread::create]

thread::send -async $t1 $script
thread::send -async $t2 $script

----
$ tclsh8.6 tcl_thread_speed_test.tcl 
Segmentation fault (core dumped)

(gdb) bt
#0  0xb7e99a04 in __pthread_mutex_unlock_usercnt ()
   from /lib/tls/i686/cmov/libpthread.so.0
#1  0xb7f9b74f in Tcl_MutexUnlock () from /usr/local/lib/libtcl8.6.so
#2  0xb7f9c55f in Tcl_WaitForEvent () from /usr/local/lib/libtcl8.6.so
#3  0xb7f64726 in Tcl_DoOneEvent () from /usr/local/lib/libtcl8.6.so
#4  0xb7485027 in ThreadWaitObjCmd ()
   from /usr/local/lib/thread2.6.5/libthread2.6.5.so
#5  0xfffffffd in ?? ()
#6  0xb7fbf324 in ?? () from /usr/local/lib/libtcl8.6.so
#7  0xb7fc65ec in tclTomMathConstStubsPtr () from /usr/local/lib/libtcl8.6.so
#8  0xb7f6f4c0 in ?? () from /usr/local/lib/libtcl8.6.so
Cannot access memory at address 0xfca3bad3

georgeps added on 2008-08-15 10:01:50:

File Added - 288754: tclEvent_ExitChange-5.patch

Logged In: YES 
user_id=585068
Originator: YES

I updated the patch for the HEAD, and now we have revision 5.  The patch for some reason is more readable than the last, most likely due to whitespace changes.

It's unclear to me whether or not we should have an inFinalize variable and use it within TclInitSubsystems for use with the previous Tcl_Panic pattern, that now only panics when inExit is true.

File Added: tclEvent_ExitChange-5.patch

georgeps added on 2008-08-02 00:19:52:

File Added - 286764: tclEvent_ExitChange-4.patch

Logged In: YES 
user_id=585068
Originator: YES

I solved the problem in almost the way stated in this report.  I had to call the channel finalization function, so that the Tcl_Channels get flushed in Tcl_Exit.   It passes all tests now.
File Added: tclEvent_ExitChange-4.patch

Attachments: