Ticket UUID: | ef5d3e29a429a48f2903e0661085b42475a58969 | |||
Title: | Crash in Tk when repeatedly showing file dialog using Python's tkinter on macOS | |||
Type: | Bug | Version: | 8.6.12 | |
Submitter: | ronaldoussoren | Created on: | 2022-05-20 17:16:53 | |
Subsystem: | 37. [tk_get*File] | Assigned To: | marc_culler | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Closed | Last Modified: | 2023-08-30 17:33:48 | |
Resolution: | Accepted | Closed By: | kevin_walzer | |
Closed on: | 2023-08-30 17:33:48 | |||
Description: |
The CPython tracker has a report about a crash on macOS when repeatedly showing a save dialog: https://github.com/python/cpython/issues/92603 I've been able to reproduce this report using the macOS installer for python 3.10.4 on python.org. The Python script that causes the crash: ``` import tkinter as tk from tkinter import filedialog class Application(tk.Frame): def __init__(self, root=None): super().__init__(root) self.root = root filename = filedialog.askopenfilename(title='open') def run(title): root = tk.Tk() root.title(title) root.geometry(str(100) + "x" + str(100)) app = Application(root=root) app.mainloop() def main(): run("1") run("2") run("3") run("4") run("5") run("6") if __name__ == '__main__': main() ``` This results in a segmentation fault in code called from Tk: ``` Crashed Thread: 0 Dispatch queue: com.apple.main-thread Exception Type: EXC_BAD_ACCESS (SIGSEGV) Exception Codes: KERN_INVALID_ADDRESS at 0x000000000000001c Exception Codes: 0x0000000000000001, 0x000000000000001c Exception Note: EXC_CORPSE_NOTIFY Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11 Terminating Process: exc handler [11454] VM Region Info: 0x1c is not in any region. Bytes before following region: 4408430564 REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL UNUSED SPACE AT START ---> __TEXT 106c35000-106c39000 [ 16K] r-x/r-x SM=COW .../MacOS/Python Thread 0 Crashed:: Dispatch queue: com.apple.main-thread 0 libobjc.A.dylib 0x7ff81ddfc536 object_isClass + 24 1 Foundation 0x7ff81ee41e9a KVO_IS_RETAINING_ALL_OBSERVERS_OF_THIS_OBJECT_IF_IT_CRASHES_AN_OBSERVER_WAS_OVERRELEASED_OR_SMASHED + 50 2 Foundation 0x7ff81ee41c24 -[NSObject(NSKeyValueObservingPrivate) _changeValueForKeys:count:maybeOldValuesDict:maybeNewValuesDict:usingBlock:] + 316 3 Foundation 0x7ff81ee50b88 -[NSObject(NSKeyValueObservingPrivate) _notifyObserversOfChangeFromValuesForKeys:toValuesForKeys:] + 980 4 CoreFoundation 0x7ff81dfee801 -[CFPrefsSource forEachObserver:] + 280 5 CoreFoundation 0x7ff81e02a3f3 -[CFPrefsSource _notifyObserversOfChangeFromValuesForKeys:toValuesForKeys:] + 83 6 CoreFoundation 0x7ff81e02a29c ___CFPrefsDeliverPendingKVONotificationsGuts_block_invoke + 374 7 CoreFoundation 0x7ff81e02a11e __CFDictionaryApplyFunction_block_invoke + 22 8 CoreFoundation 0x7ff81dff2ee3 CFBasicHashApply + 115 9 CoreFoundation 0x7ff81dfe5c0b CFDictionaryApplyFunction + 131 10 CoreFoundation 0x7ff81e02a093 _CFPrefsDeliverPendingKVONotificationsGuts + 262 11 CoreFoundation 0x7ff81dfd69a6 -[_CFXPreferences _deliverPendingKVONotifications] + 90 12 CoreFoundation 0x7ff81dfd721d __108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke + 402 13 CoreFoundation 0x7ff81e131f4b -[_CFXPreferences withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 374 14 CoreFoundation 0x7ff81e02e561 -[_CFXPreferences setValue:forKey:appIdentifier:container:configurationURL:] + 96 15 CoreFoundation 0x7ff81e02e4c8 _CFPreferencesSetAppValueWithContainerAndConfiguration + 113 16 Foundation 0x7ff81ee8c920 -[NSUserDefaults(NSUserDefaults) setObject:forKey:] + 73 17 AppKit 0x7ff820a9fc63 -[NSWindow _persistFrame] + 106 18 AppKit 0x7ff820a9f9f4 -[NSWindow _reallySetFrame:] + 683 19 AppKit 0x7ff820a9f315 -[NSWindow _oldPlaceWindow:fromServer:] + 213 20 AppKit 0x7ff820a9d97e -[NSWindow _setFrameCommon:display:fromServer:] + 1278 21 ViewBridge 0x7ff8255c0169 -[NSRemoteView _serviceRequestsResizeInProgress:] + 1507 22 ViewBridge 0x7ff8255eeb01 withImplicitAnimation + 101 23 ViewBridge 0x7ff8255c0431 -[NSRemoteView _serviceRequestsResize:completion:] + 147 24 ViewBridge 0x7ff8255c1183 -[NSRemoteView _serviceRequestsFrame:serviceWindowBackgroundColor:safeFrame:animate:transaction:completion:] + 904 25 ViewBridge 0x7ff8255bb421 -[NSRemoteView _finishAdvanceToConfigPhaseWithContextID:andOffset:] + 1017 26 ViewBridge 0x7ff8255cdce2 __38-[NSRemoteView _advanceToConfigPhase:]_block_invoke_2.2566 + 167 27 CoreFoundation 0x7ff81e00c2bc __invoking___ + 140 28 CoreFoundation 0x7ff81e00c163 -[NSInvocation invoke] + 305 29 ViewBridge 0x7ff825567e98 __deferNSXPCInvocationOntoMainThread_block_invoke + 124 30 ViewBridge 0x7ff82555b2c4 __wrapBlockWithVoucher_block_invoke + 37 31 ViewBridge 0x7ff82555af4f __deferBlockOntoMainThread_block_invoke_2 + 274 32 CoreFoundation 0x7ff81e028681 __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 12 33 CoreFoundation 0x7ff81e02852c __CFRunLoopDoBlocks + 445 34 CoreFoundation 0x7ff81e02789d __CFRunLoopRun + 2609 35 CoreFoundation 0x7ff81e0267ac CFRunLoopRunSpecific + 562 36 ViewBridge 0x7ff82555a434 __invokeRunLoopInModeForDuration_block_invoke_2 + 25 37 ViewBridge 0x7ff82555a2fa invokeRunLoopInModeForDuration + 179 38 ViewBridge 0x7ff82555a23c __29-[NSCFRunLoopSemaphore wait:]_block_invoke_3 + 82 39 ViewBridge 0x7ff82555a1e8 __CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ + 7 40 ViewBridge 0x7ff82555a1b3 __29-[NSCFRunLoopSemaphore wait:]_block_invoke_2 + 124 41 ViewBridge 0x7ff82555a0a9 __29-[NSCFRunLoopSemaphore wait:]_block_invoke + 163 42 ViewBridge 0x7ff825559ce8 +[NSCFRunLoopSemaphore _observe:whilePerforming:] + 259 43 ViewBridge 0x7ff8255598ae -[NSCFRunLoopSemaphore wait:] + 178 44 AppKit 0x7ff8214548c5 -[NSSavePanel _initBridgeAndStuff] + 379 45 AppKit 0x7ff82145553b -[NSSavePanel initWithContentRect:styleMask:backing:defer:] + 140 46 AppKit 0x7ff82145f16f -[NSOpenPanel initWithContentRect:styleMask:backing:defer:] + 150 47 AppKit 0x7ff820d467e8 -[NSPanel init] + 69 48 AppKit 0x7ff82145548c -[NSSavePanel init] + 197 49 AppKit 0x7ff82111dab2 +[NSSavePanel(Instantiation) _crunchyRawUnbonedPanel] + 58 50 libtk8.6.dylib 0x1071932bf Tk_GetOpenFileObjCmd + 63 51 libtcl8.6.dylib 0x1073bec71 TclNRRunCallbacks + 79 52 _tkinter.cpython-310-darwin.so 0x1070c16c5 Tkapp_Call + 389 53 Python 0x107779bea cfunction_call + 90 54 Python 0x107713acd _PyObject_Call + 109 55 Python 0x10784d5b7 _PyEval_EvalFrameDefault + 35431 56 Python 0x1078431df _PyEval_Vector + 383 57 Python 0x10785401f call_function + 175 58 Python 0x10784d22a _PyEval_EvalFrameDefault + 34522 59 Python 0x1078431df _PyEval_Vector + 383 60 Python 0x10785401f call_function + 175 ``` I see a similar crash report in issue 6c88c5270a8f8a07a0a22efbdbcba5c60dd00129 in the Tk tracker, not sure if that's related or just accidental. | |||
User Comments: |
kevin_walzer added on 2023-08-30 17:33:15:
Crash no longer evident with Christopher's patch. Committed to trunk and core-8-6-branch. Thanks as always for your good work, Christopher. The TCT does not do hot fix releases except in the rarest of cases; this would not rise to that level. Anyone who cares to rebuild Tk to incorporate this update can do so from either the fossil repo or the Github mirror. nab added on 2023-08-30 16:39:12: Hi, I do not personally use Python but when Christopher is writing that his patch deserve an hotfix release, it would be nice if someone could take care of it... This is also another vote four decoupling Tcl && Tk best regards, nicolas chrstphrchvz added on 2023-08-12 07:58:58: Can the patch for this issue please be reviewed soon? Tkinter users in particular continue to report this issue, including most recently to the tcl-mac mailing list. Personally, I would consider this issue critical enough to warrant an off-cycle hotfix release. chrstphrchvz added on 2023-06-28 17:58:10: I have revised the patch to also undo [e3384e50f56f], since the KVO crash in [6c88c5270a] has the same cause as the one in this ticket, and so I find no evidence of a macOS Mojave KVO bug. marc_culler (claiming to be Marc Culler) added on 2023-06-11 11:50:40: Wow. Thank you for tracking that down Christopher! chrstphrchvz added on 2023-06-11 11:16:18: Yet another way to trigger this crash is to run tclsh, do package require Tk; destroy ., change the appearance in System Preferences, and then do update. chrstphrchvz added on 2023-06-11 11:14:04: This crash occurs because Tk Aqua calls addObserver:… on a TKContentView* in viewDidChangeEffectiveAppearance:, but does not call removeObserver:… anywhere, leaving the possibility that whenever something touches AppleHighlightColor, retain will get called on the TKContentView* after it is deallocated. The “Discussion” section of the documentation for addObserver:forKeyPath:options:context: says: ”Neither the object receiving this message, nor observer, are retained. An object that calls this method must also eventually call either the removeObserver:forKeyPath: or removeObserver:forKeyPath:context: method to unregister the observer when participating in KVO.” And for removeObserver:forKeyPath:: “Be sure to invoke this method … before any object specified in addObserver:forKeyPath:options:context: is deallocated.” While thinking of where removeObserver:… should be called from, it also occurs to me that the way addObserver:… is currently called does not completely fix [315104a5c1]. In viewDidChangeEffectiveAppearance, defaultColor is static, and so only the first TKContentView* for which this method is called over the life of the process (i.e. not just a specific Tcl interpreter) will observe changes to the highlight color and correctly generate an <AppearanceChanged> event for the corresponding toplevel. Currently I would suggest moving addObserver: usage to where a TKContentView* is created, and putting removeObserver: just before where the TKWindow* for a TKContentView* is released; see attached patch. marc_culler added on 2023-04-06 03:30:18: Indeed, the bugfix branch leaks memory. a TKContentView is never freed. Back to the drawing board ... marc_culler (claiming to be Marc Culler) added on 2023-04-05 12:22:10: Well, in that case we should be releasing the contentView when we are finished using it, not when it is first created. The fact that we are not doing that would suggest that the bugfix branch might have a memory leak because we never release a ContentView. That should not be too hard to check. chrstphrchvz added on 2023-04-05 04:37:01: I think the setContentView: documentation is referring to the presumably common pattern where the receiver takes ownership using retain and the sender gives up ownership using release (and when the receiver is done with the object, it will also do release). The contentView property is declared as strong (in NSWindow.h as of the macOS 10.10 SDK), so setContentView: will implicitly retain. Another example of this would be -[NSImage addRepresentation:] (as used in CreateNSImageFromPixmap()), whose documentation similarly says “Any representation added by this method is retained by the receiver”. Looking briefly for example code using setContentView:, the Google books previews I find seem to call autorelease on any newly created (NSView *). But GDK does have a couple of instances where setContentView: is followed by manually releasing: https://gitlab.gnome.org/GNOME/gtk/-/blob/4362f7e6e253/gdk/macos/GdkMacosWindow.c#L287, https://gitlab.gnome.org/GNOME/gtk/-/blob/4362f7e6e253/gdk/macos/gdkmacosglcontext.c#L147. marc_culler (claiming to be Marc Culler) added on 2023-04-05 01:27:24: I guess when Apple says that the NSWindow "owns" the contentView they mean that it takes responsibility for releasing the contentView. I checked that the python script in the report no longer segfaults if the Tk library is replaced by the one built with the bugfix branch. chrstphrchvz added on 2023-04-05 00:01:51:
There was a wrong call to [contentView release in tkMacOSXWm.c which has been there since 2009. I noticed that call as well, but after checking https://developer.apple.com/documentation/appkit/nswindow/1419160-contentview?language=objc which says “The window retains the new content view and owns it thereafter”, I believed that doing [contentView release] was correct. Although I can see how having an extra retain on the TKContentView somewhere could be a workaround to avoid this crash. marc_culler added on 2023-04-04 23:28:00: I think this is now fixed in bug-ef5d3e29a4-redux. There was a wrong call to [contentView release in tkMacOSXWm.c which has been there since 2009. chrstphrchvz added on 2023-04-04 16:30:53:
As I understand it, a Tk application is supposed to exit as soon as there are no windows left. So I think that the expected behavior with your script should be that the application terminates before the second interpreter is created. Is that correct? I would not think so, at least for programs that use Tcl C APIs to create/destroy multiple interpreters (even if using multiple interpreters is discouraged for other reasons, such as being unnecessarily used when one interpreter with multiple toplevels will suffice, or to avoid common issues or mistakes when using multiple interpreters). The documentation for destroy says: If a window “.” is deleted then all windows will be destroyed and the application will (normally) exit.…but I wonder if it is only describing what it considers typical usage, and that it only considers typical usage to be a wish script rather than tclsh or C API usage. chrstphrchvz added on 2023-04-04 15:45:40: I am using macOS 12.6.4 on Intel. I do see the __CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ symbol in the stack trace for this crash but not always. Looking for it on Google, I find it also appearing in spindumps (from app hangs/freezes), so I wonder if the name has to do with performance bugs incorrectly reported to ViewBridge rather than whatever is calling this wait: method. But also noticing that from there it enters a nested runloop I would think means it is letting the thread stay busy with other stuff while it waits for a semaphore, rather than intentionally doing some specific step that it knows can crash. I have yet to find a debugging technique to easily find what is ending up with a stale reference to the released (TKContentView *). marc_culler (claiming to be Marc Culler) added on 2023-04-04 15:39:32: As I understand it, a Tk application is supposed to exit as soon as there are no windows left. So I think that the expected behavior with your script should be that the application terminates before the second interpreter is created. Is that correct? marc_culler (claiming to be Marc Culler) added on 2023-04-04 15:27:03: One more thing -- when I run the script with wish neither window gets destroyed after the file dialog is cancelled. The windows appear to function normally - they can be resized or closed. The Tk application does not terminate until both are closed. Moreover if I append the line: i2 eval after 100 {pack [label .lb -text "Hello #2"] -padx 20 -pady 20} at the end of the script then the label appears in tk#2 when I cancel the file dialog. So window "tk #2" does not appear to be a zombie. Adding a similar line for the root window of i1 results in an error: Error in startup script: can't invoke "label" command: application has been destroyed. That would seem to be appropriate, except for the fact that the root window is still on the screen. So I think that the root window probably is a zombie. Of course, all of the fancy stuff with autorelease pools was added when trying to eliminate zombies. At that point, no NSWindow was ever being destroyed. marc_culler (claiming to be Marc Culler) added on 2023-04-04 14:39:16: Update: when I run your script using tclsh, I do see the crash on Ventura. And I do not see the filedialog appear when I do that. But if I run the script with wish the filedialog appears and there is no crash. marc_culler (claiming to be Marc Culler) added on 2023-04-04 13:22:19: Christopher, which version of macOS are you using? The stack trace from the ticket includes a call to the function: __CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ which I have also seen in other reports of crashes in the filedialog. The name of that function speaks for itself. And, as far as I know it only exists in Monterey. When I tried running your code on Ventura I did not see a crash. chrstphrchvz added on 2023-04-04 00:04:07: Unfortunately, after further testing, I have managed to trigger this crash with 8.6.13 and core-8-6-branch. Running with NSZombieEnabled=YES reveals the already-released object that an observer is trying to retain: I observe that the affected (TKContentView *) gets released during [NSApp _resetAutoreleasePool] in TkMacOSXEventsSetupProc(); prior to [d638e7ca3616] it would be released while in TkWmDeadWindow(), so hopefully this explains why that change helped prevent this crash but did not fix the underlying cause.2023-04-03 15:29:57.681 wish8.6[88385:1992733] *** -[TKContentView retain]: message sent to deallocated instance 0x61600006a580 zsh: illegal hardware instruction My guess for why Tkinter examples tend to trigger this crash could have to do with destroying the root window of one interpreter but then continuing to use Tk in other interpreters. I do not know why using multiple interpreters seems important for this crash, though; I just have not successfully triggered it with multiple toplevels in a single interpreter. This Tcl script seems to reliably trigger the crash: interp create -- i1 i1 eval package require Tk i1 eval update i1 eval destroy . i1 eval update interp create -- i2 i2 eval package require Tk i1 eval update i2 eval tk_getOpenFile marc_culler (claiming to be Marc Culler) added on 2023-04-03 01:05:50: Of course. chrstphrchvz added on 2023-04-03 00:22:11: I’ve verified that applying the change from [d638e7ca3616] prevents this crash. Is it okay to close this as fixed (by 8.6.13) rather than works-for-me? marc_culler (claiming to be Marc Culler) added on 2022-05-21 23:08:10: It turns out that nothing actually needs to be done here. These crashes do not occur with the current Tk tip. I did the following: * Build and install the current core-8-6-branch of Tk (which is 8.6.12). * sudo cp /Library/Frameworks/Tk.framework/Versions/8.6/Tk /Library/Frameworks/Python.framework/Versions/3.10/lib/libtk8.6.dylib * edit /Library/Frameworks/Python.framework/Versions/3.10/lib/tk8.6/tk.tcl to change the package require line to: package require -exact Tk 8.6.12 Then when I ran the test script it did not crash. I was able to reproduce the crash using the Tk library that ships with Python 3.10, by the way. kevin_walzer added on 2022-05-21 14:18:20: My testing of this with both the Python script and an equivalent Tcl script showed a reference to autorelease pools being corrupted; I tried removing the call to _resetAutoreleasePool in tkMacOSXNotify.c. That did appear to eliminate the crashes, but caused instability elsewhere (a couple of of tests in the test suite failed where they did not before), so I am not going to pursue that approach. I agree with Marc that this seems to be a edge case. marc_culler (claiming to be Marc Culler) added on 2022-05-21 04:48:37: Maybe a workaround would be to refuse to open a file dialog while another file dialog is running. What is the use case for opening 6 file dialogs in rapid succession? marc_culler (claiming to be Marc Culler) added on 2022-05-21 04:42:06: Yes, this has been seen before. As far as I can tell it is an Apple bug. (Yes, they do exist.) It would appear that the engineering team in charge of ViewBridge was forced into doing something that they knew would lead to a crash. How else can you explain the existence of a method named: __CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ ? There is no Tk code involved in running a file dialog on macOS. In fact, the file dialog is managed by a separate process. This was a change made in macOS 10.15. Apple says: "In macOS 10.15, the system always displays the Save dialog in a separate process, regardless of whether the app is sandboxed. When the user saves the document, macOS adds the saved file to the app's sandbox (if necessary) so that the app can write to the file. Prior to macOS 10.15, the system used a separate process only for sandboxed apps." If we want Tk to present a native file dialog I don't think there is any alternative to instantiating an NSPanel object. As far as I can tell the crash occurs within the first function called in Tk_GetOpenFileObjCmd, at the line: openpanel = [NSOpenPanel openPanel]; which presumably creates an observer and launches the process which runs the dialog. I don't know what we could possibly do there, other than instantiate the NSPanel. |