Tk Source Code

View Ticket
Ticket UUID: c18c36f80c67c88be099d38a24fc01448eda3a03
Title: Cocoa Tk can segfault when embedded Tcl interpreters are created and destroyed in the same app bundle
Type: Bug Version: 8.5.16
Submitter: ned.deily Created on: 2014-10-30 22:29:45
Subsystem: 66. Aqua Window Operations Assigned To: kevin_walzer
Priority: 5 Medium Severity: Minor
Status: Open Last Modified: 2017-11-29 14:10:39
Resolution: None Closed By: nobody
    Closed on:
Due to recent changes in Python's test suite, a formerly working scenario can now result in Tk segfaults on OS X using current Cocoa Tk's.  Most users of Tk would probably not run into this case but it does, I believe, point out a bug in Cocoa Tk, one that results in a crash.  The scenario is complicated and my analysis may not be totally correct but I think the basic problem is clear.

The Python test suite runner has a -j n option to allow multiple tests to be run simultaneously.  If -j is not specified, the test runner runs each test sequentially in one process.  If -j n is specified, where n > 0, the test runner creates n threads and, in each thread, repeatedly creates a subprocess to run one test until all tests have been executed.  Previously it was the case that the Python tests that used Tk (i.e. tests of the Python tkinter module) were pretty much all run with the same Tk root, created by the first test to run.  Recently, code was added to the tests to ensure that the root was destroyed after each test case and a new root created to avoid possible side effects among tests.  With that code in place, the test will often, but not always, cause Tk to segfault when run with -j.  To investigagte, I built debug versions of a Tcl and Tk 8.5.17 release candidate and, after examining the resulting crash reports, added some printf's (so the source line numbers may be off a little bit in the traceback).

I believe the problem here is with the callbacks that Tk sets up to handle the "high-level" Apple Events in tkMacOSXHLEvents.c.  Here's the output from a test run:

$ ./bin/python3.5 -m test -w -uall -j3 test_tk
[1/1] test_tk
*** in TkpInit ***, interp=0x7f8cf4904610 
 *** in TkpInit initialization ***, interp=0x7f8cf4904610 
 *** in TkMacOSXInitAppleEvents ***, interp=0x7f8cf4904610 
 *** in TkpInit ***, interp=0x7f8cf4ac0210
Fatal Python error: Segmentation fault

You can see TkpInit (in tkMacOSXInit.c) is called the first time with a pointer to one interpreter.  It then does its first time initialization (protected by 'if (!initialized)') including calling TkMacOSXInitAppleEvents to setup the callbacks for the various Apple Event handlers.  In particular, Tk stores the address of the interp instance it is running under into the Apple event callbacks.  It also happens that this first Tcl interpreter instance isn't asked to do much so it doesn't get as far as having Tk actually display anything which I think is significant later.  So far, no problem.  Then, for the next text case, Python cleans up the existing root and indirectly causes a new Tcl interpreter instance to be created: you can see that TkpInit is called again but with a different interp address.  But because "initialized" is still set in tkMacOSXInit, TkMacOSXInitAppleEvents is not called a second time.  This second interpreter instance is asked to actually display stuff which, I believe, results in the process finally being promoted to a gui app and results in an oapp ("open application") Apple event being received by Tk.  In the crash report below, you can see that happening.  Unfortunately, the oapp callback handler established in the first call to TkpInit (and TkMacOSXInitAppleEvents) contains the address of the first interpreter instance, now no longer valid.  But OappHandler blindly passes it along to Tcl_GetCommandInfo and neither check that the interp pointer is still valid.  So Tcl_GetCommandInfo dereferences a no-longer valid pointer; this is where the random behavior occurs.  If you're lucky, nothing bad happens (which seems to often (always?) be the case when running in 32-bit mode); in 64-bit mode, it will often crash as in this case in frame 3.

Process:         pytest_10.9 [93374]
Path:            /py/*/pytest_10.9.framework/Versions/3.5/Resources/
Identifier:      org.python.pytest_10.9
Version:         3.5.0a0 (3.5.0a0)
Code Type:       X86-64 (Native)
Parent Process:  pytest_10.9 [93373]
Responsible:     Terminal [12437]
User ID:         501

Date/Time:       2014-10-30 13:05:51.987 -0700
OS Version:      Mac OS X 10.9.5 (13F34)
Report Version:  11
Anonymous UUID:  7FDFA581-A708-3453-3FE0-0D3FEA7F6707

Crashed Thread:  0  Dispatch queue:

Exception Codes: EXC_I386_GPFLT

Thread 0 Crashed:: Dispatch queue:
0   libsystem_kernel.dylib        	0x00007fff971ff292 __kill + 10
1   pytest_10.9                   	0x00000001076ab870 faulthandler_fatal_error + 432 (faulthandler.c:312)
2   libsystem_platform.dylib      	0x00007fff97f335aa _sigtramp + 26
3   ???                           	0xcbcbcbcbcbcbcbcb 0 + 14685055086129564619
4   Tcl_debug                     	0x0000000108314623 Tcl_GetCommandInfo + 51 (tclBasic.c:2566)
5   Tk_debug                      	0x000000010862c4a9 OappHandler + 121 (tkMacOSXHLEvents.c:229)
6                  	0x00007fff96494e1f aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned int, unsigned char*) + 381
7                  	0x00007fff96494c32 dispatchEventAndSendReply(AEDesc const*, AEDesc*) + 31
8                  	0x00007fff96494b36 aeProcessAppleEvent + 315
9           	0x00007fff9133f161 AEProcessAppleEvent + 56
10              	0x00007fff93b1a0b6 _DPSNextEvent + 1026
11              	0x00007fff93b1989b -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 122
12  Tk_debug                      	0x000000010863cd52 -[TKApplication(TKNotify) nextEventMatchingMask:untilDate:inMode:dequeue:] + 338 (tkMacOSXNotify.c:56)
13  Tk_debug                      	0x000000010863d421 TkMacOSXEventsCheckProc + 401 (tkMacOSXNotify.c:271)
14  Tcl_debug                     	0x00000001083e7f66 Tcl_DoOneEvent + 550 (tclNotify.c:968)
15  Tk_debug                      	0x00000001084f318f Tk_UpdateObjCmd + 271 (tkCmds.c:1104)
16  Tcl_debug                     	0x0000000108315df7 TclEvalObjvInternal + 2423 (tclBasic.c:3727)
17  Tcl_debug                     	0x0000000108316fa5 Tcl_EvalObjv + 117 (tclBasic.c:3925)
18                   	0x00000001082e9dbb Tkapp_Call + 827 (_tkinter.c:1278)
19  pytest_10.9                   	0x00000001074499dd PyCFunction_Call + 221 (methodobject.c:94)
20  pytest_10.9                   	0x00000001075926a6 call_function + 1974 (ceval.c:4269)
21  pytest_10.9                   	0x000000010758ab2f PyEval_EvalFrameEx + 78335 (ceval.c:2860)
132 pytest_10.9                   	0x000000010761baaa RunModule + 1114 (main.c:209)
133 pytest_10.9                   	0x000000010761af1f Py_Main + 4335 (main.c:710)
134 org.python.pytest_10.9        	0x000000010739de64 0x10739d000 + 3684
135 libdyld.dylib                 	0x00007fff8ac7a5fd start + 1

Thread 1:: Dispatch queue:
0   libsystem_kernel.dylib        	0x00007fff97200662 kevent64 + 10
1   libdispatch.dylib             	0x00007fff928c6421 _dispatch_mgr_invoke + 239
2   libdispatch.dylib             	0x00007fff928c6136 _dispatch_mgr_thread + 52

Thread 2:
0   libsystem_kernel.dylib        	0x00007fff971ffe6a __workq_kernreturn + 10
1   libsystem_pthread.dylib       	0x00007fff8ceb1f08 _pthread_wqthread + 330
2   libsystem_pthread.dylib       	0x00007fff8ceb4fb9 start_wqthread + 13

Thread 3:
0   libsystem_kernel.dylib        	0x00007fff971ffe6a __workq_kernreturn + 10
1   libsystem_pthread.dylib       	0x00007fff8ceb1f08 _pthread_wqthread + 330
2   libsystem_pthread.dylib       	0x00007fff8ceb4fb9 start_wqthread + 13

Thread 4:
0   libsystem_kernel.dylib        	0x00007fff971ffe6a __workq_kernreturn + 10
1   libsystem_pthread.dylib       	0x00007fff8ceb1f08 _pthread_wqthread + 330
2   libsystem_pthread.dylib       	0x00007fff8ceb4fb9 start_wqthread + 13

Thread 5:
0   libsystem_kernel.dylib        	0x00007fff971ff9aa __select + 10
1   Tcl_debug                     	0x0000000108449105 NotifierThreadProc + 949 (tclMacOSXNotify.c:1778)
2   libsystem_pthread.dylib       	0x00007fff8ceb0899 _pthread_body + 138
3   libsystem_pthread.dylib       	0x00007fff8ceb072a _pthread_start + 137
4   libsystem_pthread.dylib       	0x00007fff8ceb4fc9 thread_start + 13

Thread 0 crashed with X86 Thread State (64-bit):
  rax: 0x0000000000000000  rbx: 0x000000010799df58  rcx: 0x000000010799da48  rdx: 0x0000000000000000
  rdi: 0x0000000000016cbe  rsi: 0x000000000000000b  rbp: 0x000000010799dab0  rsp: 0x000000010799da48
   r8: 0x0000000000000012   r9: 0x00007fff5882f700  r10: 0x0000000000000002  r11: 0x0000000000000202
  r12: 0x0000000061657674  r13: 0x00007fff7cc1cbb8  r14: 0x0000000000000003  r15: 0x000000006f617070
  rip: 0x00007fff971ff292  rfl: 0x0000000000000202  cr2: 0x000000010d05a258
Logical CPU:     0
Error Code:      0x02000025
Trap Number:     133

Some of the outputs from the printf's get lost in the crash.  Here's another run where the call to Tcl_GetCommandInfo in OappHandler has been commented out so that there is no crash.  Here you can see that OappHandler was called with the (now stale) address of the initial interp that was saved and returned in the oapp callback:

$ ./bin/python3.5 -m test -w -uall -j2 -v test_ttk_guionly
[1/1] test_ttk_guionly
 *** in TkpInit ***, interp=0x7feffa119010 
 *** in TkpInit initialization ***, interp=0x7feffa119010 
 *** in TkMacOSXInitAppleEvents ***, interp=0x7feffa119010 
 *** in TkpInit ***, interp=0x7feff9a02a10 
test_horizontal_range (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ...  
 *** in OappHandler ***, interp=0x7feffa119010 
test_initialization (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... ok
test_initialization_no_master (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ...  
 *** in TkpInit ***, interp=0x7feffa277410 
test_resize (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... ok
test_variable_change (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... ok
test_widget_destroy (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ...  
 *** in TkpInit ***, interp=0x7feff9b67c10 
test_initialization (tkinter.test.test_ttk.test_extensions.OptionMenuTest) ... ok
test_menu (tkinter.test.test_ttk.test_extensions.OptionMenuTest) ... ok
test_widget_destroy (tkinter.test.test_ttk.test_extensions.OptionMenuTest) ...
 *** in TkpInit ***, interp=0x7feff9b67c10 

We certainly can workaround the crash in Python by special-casing the Tk-related tests.  But, in any case, it's not nice for Tk to segfault.  The Tk Apple event handlers should not be so trusting as to assume that the callbacks still contain a valid interpreter instance pointer and/or TkMacOSXInitAppleEvents should possibly be called for each interpreter initialization.
User Comments: marc_culler added on 2017-11-29 14:10:39:
Using built against Tk 8.6, and adding the line:
  fprintf(stderr, "TkpInit: initialized = %d, interp = %p\n", initialized, interp);
at the beginning of TkpInit in the current core-8-6-branch leaf, here is the
output from the python test with -j3:

$ python3.6 -m test -w -uall -j3 test_tk
Run tests in parallel using 3 child processes
0:00:03 load avg: 1.40 [1/1] test_tk passed
TkpInit: initialized = 0, interp = 0x10189be10
TkpInit: initialized = 1, interp = 0x10189be10
TkpInit: initialized = 1, interp = 0x10194ca10
TkpInit: initialized = 1, interp = 0x10194ca10

[ ... identical lines deleted ...]

TkpInit: initialized = 1, interp = 0x10194ca10
1 test OK.

Total duration: 4 sec
Tests result: SUCCESS

marc_culler added on 2017-11-28 15:06:02:
Check-ins [d71d6703] and [ebd79f48] provide rewritten TkpInit functions for
core-8-6-branch and trunk.  I tried running the multithreaded python tests with

python3.5 -m test -w -uall -j3 test_tk
python3.6 -m test -w -uall -j3 test_tk

and also with -j10.  I did not encounter any crashes, but Python3.6 reported a
load average of 1.58 which I think indicates that the tests are actually
running in multiple threads.  So I think this problem may now be fixed.

One thing worth mentioning is that the detailed analysis provided in this ticket
in incorrect about one point, at least for 8.6 and 8.7.  The pointer to the
interpreter is not being stored in a callback.  It is a property of the
NSApplication object, which is created in TkpInit by calling
[NSApplication sharedApplication].  That method creates exactly one NSApplication
object -- subsequent calls to the method return the same object -- and it
stores a pointer to the object in the global variable NSApp.  Allowing
TkpInit to do initialization a second time with a second interpreter would
change the value of the _eventInterp property, but it would also call the
[NSApp finishLaunching] method a second time.  I have no idea what damage that
would cause.

There are several places in the code which check whether the _eventInterp
property is NULL, such as:

if (_eventInterp && Tcl_FindCommand(_eventInterp, cmd, NULL, 0)) { ...

But it seems worth investigating what happens to the _eventInterp property when
an interpreter is destroyed.

ned.deily added on 2014-10-31 00:32:30:
There are really two separate issues here.  I agree with being wary about calling tkMacOSXInit more than once.  I might get around to trying it.  But the more important part is to prevent the segfault and presumably that could be done without calling tkMacOSXInit more than once.  The trick is to figure out how to check whether the interp value received in the callback is still valid.

kevin_walzer added on 2014-10-30 23:54:49:
If you did want to try an alternative, however, my approach would be to  to remove the if (!initialized) call right after that comment, let the initialization happen each time, and run your tests again. I am not clear on how to implement a pure Tcl/Tk test case that will replicate the setup you have in place from Python. Let me know what happens, if you give it a try.

kevin_walzer added on 2014-10-30 23:48:52:
I certainly agree that it's not good for Tk to segfault, but I note this comment at the fromt of the TkpInit function in tkMacOSXInit.c:

     * Since it is possible for TkInit to be called multiple times and we
     * don't want to do the following initialization multiple times we protect
     * against doing it more than once.

In other words, as I understand it, the behavior you observe is there by design. I'm not quite certain of why this is the case, but since it seems to be at the root of an awful lot in Tk/Mac, I'd be exceptionally wary of changing it. I have no idea of the side effects or consequence of such a change.