Ticket UUID: | c18c36f80c67c88be099d38a24fc01448eda3a03 | |||
Title: | Cocoa Tk can segfault when embedded Tcl interpreters are created and destroyed in the same app bundle | |||
Type: | Bug | Version: | 8.5.16 | |
Submitter: | ned.deily | Created on: | 2014-10-30 22:29:45 | |
Subsystem: | 66. Aqua Window Operations | Assigned To: | kevin_walzer | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Open | Last Modified: | 2017-11-29 14:10:39 | |
Resolution: | None | Closed By: | nobody | |
Closed on: | ||||
Description: |
Due to recent changes in Python's test suite, a formerly working scenario can now result in Tk segfaults on OS X using current Cocoa Tk's. Most users of Tk would probably not run into this case but it does, I believe, point out a bug in Cocoa Tk, one that results in a crash. The scenario is complicated and my analysis may not be totally correct but I think the basic problem is clear. The Python test suite runner has a -j n option to allow multiple tests to be run simultaneously. If -j is not specified, the test runner runs each test sequentially in one process. If -j n is specified, where n > 0, the test runner creates n threads and, in each thread, repeatedly creates a subprocess to run one test until all tests have been executed. Previously it was the case that the Python tests that used Tk (i.e. tests of the Python tkinter module) were pretty much all run with the same Tk root, created by the first test to run. Recently, code was added to the tests to ensure that the root was destroyed after each test case and a new root created to avoid possible side effects among tests. With that code in place, the test will often, but not always, cause Tk to segfault when run with -j. To investigagte, I built debug versions of a Tcl and Tk 8.5.17 release candidate and, after examining the resulting crash reports, added some printf's (so the source line numbers may be off a little bit in the traceback). I believe the problem here is with the callbacks that Tk sets up to handle the "high-level" Apple Events in tkMacOSXHLEvents.c. Here's the output from a test run: $ ./bin/python3.5 -m test -w -uall -j3 test_tk [1/1] test_tk *** in TkpInit ***, interp=0x7f8cf4904610 *** in TkpInit initialization ***, interp=0x7f8cf4904610 *** in TkMacOSXInitAppleEvents ***, interp=0x7f8cf4904610 *** in TkpInit ***, interp=0x7f8cf4ac0210 Fatal Python error: Segmentation fault You can see TkpInit (in tkMacOSXInit.c) is called the first time with a pointer to one interpreter. It then does its first time initialization (protected by 'if (!initialized)') including calling TkMacOSXInitAppleEvents to setup the callbacks for the various Apple Event handlers. In particular, Tk stores the address of the interp instance it is running under into the Apple event callbacks. It also happens that this first Tcl interpreter instance isn't asked to do much so it doesn't get as far as having Tk actually display anything which I think is significant later. So far, no problem. Then, for the next text case, Python cleans up the existing root and indirectly causes a new Tcl interpreter instance to be created: you can see that TkpInit is called again but with a different interp address. But because "initialized" is still set in tkMacOSXInit, TkMacOSXInitAppleEvents is not called a second time. This second interpreter instance is asked to actually display stuff which, I believe, results in the process finally being promoted to a gui app and results in an oapp ("open application") Apple event being received by Tk. In the crash report below, you can see that happening. Unfortunately, the oapp callback handler established in the first call to TkpInit (and TkMacOSXInitAppleEvents) contains the address of the first interpreter instance, now no longer valid. But OappHandler blindly passes it along to Tcl_GetCommandInfo and neither check that the interp pointer is still valid. So Tcl_GetCommandInfo dereferences a no-longer valid pointer; this is where the random behavior occurs. If you're lucky, nothing bad happens (which seems to often (always?) be the case when running in 32-bit mode); in 64-bit mode, it will often crash as in this case in frame 3. Process: pytest_10.9 [93374] Path: /py/*/pytest_10.9.framework/Versions/3.5/Resources/Python.app/Contents/MacOS/pytest_10.9 Identifier: org.python.pytest_10.9 Version: 3.5.0a0 (3.5.0a0) Code Type: X86-64 (Native) Parent Process: pytest_10.9 [93373] Responsible: Terminal [12437] User ID: 501 Date/Time: 2014-10-30 13:05:51.987 -0700 OS Version: Mac OS X 10.9.5 (13F34) Report Version: 11 Anonymous UUID: 7FDFA581-A708-3453-3FE0-0D3FEA7F6707 Crashed Thread: 0 Dispatch queue: com.apple.main-thread Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: EXC_I386_GPFLT Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libsystem_kernel.dylib 0x00007fff971ff292 __kill + 10 1 pytest_10.9 0x00000001076ab870 faulthandler_fatal_error + 432 (faulthandler.c:312) 2 libsystem_platform.dylib 0x00007fff97f335aa _sigtramp + 26 3 ??? 0xcbcbcbcbcbcbcbcb 0 + 14685055086129564619 4 Tcl_debug 0x0000000108314623 Tcl_GetCommandInfo + 51 (tclBasic.c:2566) 5 Tk_debug 0x000000010862c4a9 OappHandler + 121 (tkMacOSXHLEvents.c:229) 6 com.apple.AE 0x00007fff96494e1f aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned int, unsigned char*) + 381 7 com.apple.AE 0x00007fff96494c32 dispatchEventAndSendReply(AEDesc const*, AEDesc*) + 31 8 com.apple.AE 0x00007fff96494b36 aeProcessAppleEvent + 315 9 com.apple.HIToolbox 0x00007fff9133f161 AEProcessAppleEvent + 56 10 com.apple.AppKit 0x00007fff93b1a0b6 _DPSNextEvent + 1026 11 com.apple.AppKit 0x00007fff93b1989b -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 122 12 Tk_debug 0x000000010863cd52 -[TKApplication(TKNotify) nextEventMatchingMask:untilDate:inMode:dequeue:] + 338 (tkMacOSXNotify.c:56) 13 Tk_debug 0x000000010863d421 TkMacOSXEventsCheckProc + 401 (tkMacOSXNotify.c:271) 14 Tcl_debug 0x00000001083e7f66 Tcl_DoOneEvent + 550 (tclNotify.c:968) 15 Tk_debug 0x00000001084f318f Tk_UpdateObjCmd + 271 (tkCmds.c:1104) 16 Tcl_debug 0x0000000108315df7 TclEvalObjvInternal + 2423 (tclBasic.c:3727) 17 Tcl_debug 0x0000000108316fa5 Tcl_EvalObjv + 117 (tclBasic.c:3925) 18 _tkinter.so 0x00000001082e9dbb Tkapp_Call + 827 (_tkinter.c:1278) 19 pytest_10.9 0x00000001074499dd PyCFunction_Call + 221 (methodobject.c:94) 20 pytest_10.9 0x00000001075926a6 call_function + 1974 (ceval.c:4269) 21 pytest_10.9 0x000000010758ab2f PyEval_EvalFrameEx + 78335 (ceval.c:2860) [...] 132 pytest_10.9 0x000000010761baaa RunModule + 1114 (main.c:209) 133 pytest_10.9 0x000000010761af1f Py_Main + 4335 (main.c:710) 134 org.python.pytest_10.9 0x000000010739de64 0x10739d000 + 3684 135 libdyld.dylib 0x00007fff8ac7a5fd start + 1 Thread 1:: Dispatch queue: com.apple.libdispatch-manager 0 libsystem_kernel.dylib 0x00007fff97200662 kevent64 + 10 1 libdispatch.dylib 0x00007fff928c6421 _dispatch_mgr_invoke + 239 2 libdispatch.dylib 0x00007fff928c6136 _dispatch_mgr_thread + 52 Thread 2: 0 libsystem_kernel.dylib 0x00007fff971ffe6a __workq_kernreturn + 10 1 libsystem_pthread.dylib 0x00007fff8ceb1f08 _pthread_wqthread + 330 2 libsystem_pthread.dylib 0x00007fff8ceb4fb9 start_wqthread + 13 Thread 3: 0 libsystem_kernel.dylib 0x00007fff971ffe6a __workq_kernreturn + 10 1 libsystem_pthread.dylib 0x00007fff8ceb1f08 _pthread_wqthread + 330 2 libsystem_pthread.dylib 0x00007fff8ceb4fb9 start_wqthread + 13 Thread 4: 0 libsystem_kernel.dylib 0x00007fff971ffe6a __workq_kernreturn + 10 1 libsystem_pthread.dylib 0x00007fff8ceb1f08 _pthread_wqthread + 330 2 libsystem_pthread.dylib 0x00007fff8ceb4fb9 start_wqthread + 13 Thread 5: 0 libsystem_kernel.dylib 0x00007fff971ff9aa __select + 10 1 Tcl_debug 0x0000000108449105 NotifierThreadProc + 949 (tclMacOSXNotify.c:1778) 2 libsystem_pthread.dylib 0x00007fff8ceb0899 _pthread_body + 138 3 libsystem_pthread.dylib 0x00007fff8ceb072a _pthread_start + 137 4 libsystem_pthread.dylib 0x00007fff8ceb4fc9 thread_start + 13 Thread 0 crashed with X86 Thread State (64-bit): rax: 0x0000000000000000 rbx: 0x000000010799df58 rcx: 0x000000010799da48 rdx: 0x0000000000000000 rdi: 0x0000000000016cbe rsi: 0x000000000000000b rbp: 0x000000010799dab0 rsp: 0x000000010799da48 r8: 0x0000000000000012 r9: 0x00007fff5882f700 r10: 0x0000000000000002 r11: 0x0000000000000202 r12: 0x0000000061657674 r13: 0x00007fff7cc1cbb8 r14: 0x0000000000000003 r15: 0x000000006f617070 rip: 0x00007fff971ff292 rfl: 0x0000000000000202 cr2: 0x000000010d05a258 Logical CPU: 0 Error Code: 0x02000025 Trap Number: 133 Some of the outputs from the printf's get lost in the crash. Here's another run where the call to Tcl_GetCommandInfo in OappHandler has been commented out so that there is no crash. Here you can see that OappHandler was called with the (now stale) address of the initial interp that was saved and returned in the oapp callback: $ ./bin/python3.5 -m test -w -uall -j2 -v test_ttk_guionly [...] [1/1] test_ttk_guionly *** in TkpInit ***, interp=0x7feffa119010 *** in TkpInit initialization ***, interp=0x7feffa119010 *** in TkMacOSXInitAppleEvents ***, interp=0x7feffa119010 *** in TkpInit ***, interp=0x7feff9a02a10 test_horizontal_range (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... *** in OappHandler ***, interp=0x7feffa119010 ok test_initialization (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... ok test_initialization_no_master (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... *** in TkpInit ***, interp=0x7feffa277410 ok test_resize (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... ok test_variable_change (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... ok test_widget_destroy (tkinter.test.test_ttk.test_extensions.LabeledScaleTest) ... *** in TkpInit ***, interp=0x7feff9b67c10 ok test_initialization (tkinter.test.test_ttk.test_extensions.OptionMenuTest) ... ok test_menu (tkinter.test.test_ttk.test_extensions.OptionMenuTest) ... ok test_widget_destroy (tkinter.test.test_ttk.test_extensions.OptionMenuTest) ... *** in TkpInit ***, interp=0x7feff9b67c10 We certainly can workaround the crash in Python by special-casing the Tk-related tests. But, in any case, it's not nice for Tk to segfault. The Tk Apple event handlers should not be so trusting as to assume that the callbacks still contain a valid interpreter instance pointer and/or TkMacOSXInitAppleEvents should possibly be called for each interpreter initialization. | |||
User Comments: |
marc_culler added on 2017-11-29 14:10:39:
Using _tkinter.cpython-36m-darwin.so built against Tk 8.6, and adding the line: fprintf(stderr, "TkpInit: initialized = %d, interp = %p\n", initialized, interp); at the beginning of TkpInit in the current core-8-6-branch leaf, here is the output from the python test with -j3: $ python3.6 -m test -w -uall -j3 test_tk Run tests in parallel using 3 child processes 0:00:03 load avg: 1.40 [1/1] test_tk passed TkpInit: initialized = 0, interp = 0x10189be10 TkpInit: initialized = 1, interp = 0x10189be10 TkpInit: initialized = 1, interp = 0x10194ca10 TkpInit: initialized = 1, interp = 0x10194ca10 [ ... identical lines deleted ...] TkpInit: initialized = 1, interp = 0x10194ca10 1 test OK. Total duration: 4 sec Tests result: SUCCESS marc_culler added on 2017-11-28 15:06:02: Check-ins [d71d6703] and [ebd79f48] provide rewritten TkpInit functions for core-8-6-branch and trunk. I tried running the multithreaded python tests with python3.5 -m test -w -uall -j3 test_tk python3.6 -m test -w -uall -j3 test_tk and also with -j10. I did not encounter any crashes, but Python3.6 reported a load average of 1.58 which I think indicates that the tests are actually running in multiple threads. So I think this problem may now be fixed. One thing worth mentioning is that the detailed analysis provided in this ticket in incorrect about one point, at least for 8.6 and 8.7. The pointer to the interpreter is not being stored in a callback. It is a property of the NSApplication object, which is created in TkpInit by calling [NSApplication sharedApplication]. That method creates exactly one NSApplication object -- subsequent calls to the method return the same object -- and it stores a pointer to the object in the global variable NSApp. Allowing TkpInit to do initialization a second time with a second interpreter would change the value of the _eventInterp property, but it would also call the [NSApp finishLaunching] method a second time. I have no idea what damage that would cause. There are several places in the code which check whether the _eventInterp property is NULL, such as: if (_eventInterp && Tcl_FindCommand(_eventInterp, cmd, NULL, 0)) { ... But it seems worth investigating what happens to the _eventInterp property when an interpreter is destroyed. ned.deily added on 2014-10-31 00:32:30: There are really two separate issues here. I agree with being wary about calling tkMacOSXInit more than once. I might get around to trying it. But the more important part is to prevent the segfault and presumably that could be done without calling tkMacOSXInit more than once. The trick is to figure out how to check whether the interp value received in the callback is still valid. kevin_walzer added on 2014-10-30 23:54:49: If you did want to try an alternative, however, my approach would be to to remove the if (!initialized) call right after that comment, let the initialization happen each time, and run your tests again. I am not clear on how to implement a pure Tcl/Tk test case that will replicate the setup you have in place from Python. Let me know what happens, if you give it a try. kevin_walzer added on 2014-10-30 23:48:52: I certainly agree that it's not good for Tk to segfault, but I note this comment at the fromt of the TkpInit function in tkMacOSXInit.c: /* * Since it is possible for TkInit to be called multiple times and we * don't want to do the following initialization multiple times we protect * against doing it more than once. */ In other words, as I understand it, the behavior you observe is there by design. I'm not quite certain of why this is the case, but since it seems to be at the root of an awful lot in Tk/Mac, I'd be exceptionally wary of changing it. I have no idea of the side effects or consequence of such a change. |