Tk Source Code

View Ticket
Login
Ticket UUID: ef5d3e29a429a48f2903e0661085b42475a58969
Title: Crash in Tk when repeatedly showing file dialog using Python's tkinter on macOS
Type: Bug Version: 8.6.12
Submitter: ronaldoussoren Created on: 2022-05-20 17:16:53
Subsystem: 37. [tk_get*File] Assigned To: marc_culler
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2023-08-30 17:33:48
Resolution: Accepted Closed By: kevin_walzer
    Closed on: 2023-08-30 17:33:48
Description:
The CPython tracker has a report about a crash on macOS when repeatedly showing a save dialog: https://github.com/python/cpython/issues/92603

I've been able to reproduce this report using the macOS installer for python 3.10.4 on python.org. The Python script that causes the crash:

```
import tkinter as tk
from tkinter import filedialog

class Application(tk.Frame):
    def __init__(self, root=None):

        super().__init__(root) 
        self.root = root
        filename = filedialog.askopenfilename(title='open')


def run(title):
    root = tk.Tk()
    root.title(title)
    root.geometry(str(100) + "x" + str(100))
    app = Application(root=root)

    app.mainloop()


def main():
    run("1")
    run("2")
    run("3")
    run("4")
    run("5")
    run("6")

if __name__ == '__main__':
    main()
```

This results in a segmentation fault in code called from Tk:

```
Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       KERN_INVALID_ADDRESS at 0x000000000000001c
Exception Codes:       0x0000000000000001, 0x000000000000001c
Exception Note:        EXC_CORPSE_NOTIFY

Termination Reason:    Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process:   exc handler [11454]

VM Region Info: 0x1c is not in any region.  Bytes before following region: 4408430564
      REGION TYPE                    START - END         [ VSIZE] PRT/MAX SHRMOD  REGION DETAIL
      UNUSED SPACE AT START
--->  
      __TEXT                      106c35000-106c39000    [   16K] r-x/r-x SM=COW  .../MacOS/Python

Thread 0 Crashed::  Dispatch queue: com.apple.main-thread
0   libobjc.A.dylib               	    0x7ff81ddfc536 object_isClass + 24
1   Foundation                    	    0x7ff81ee41e9a KVO_IS_RETAINING_ALL_OBSERVERS_OF_THIS_OBJECT_IF_IT_CRASHES_AN_OBSERVER_WAS_OVERRELEASED_OR_SMASHED + 50
2   Foundation                    	    0x7ff81ee41c24 -[NSObject(NSKeyValueObservingPrivate) _changeValueForKeys:count:maybeOldValuesDict:maybeNewValuesDict:usingBlock:] + 316
3   Foundation                    	    0x7ff81ee50b88 -[NSObject(NSKeyValueObservingPrivate) _notifyObserversOfChangeFromValuesForKeys:toValuesForKeys:] + 980
4   CoreFoundation                	    0x7ff81dfee801 -[CFPrefsSource forEachObserver:] + 280
5   CoreFoundation                	    0x7ff81e02a3f3 -[CFPrefsSource _notifyObserversOfChangeFromValuesForKeys:toValuesForKeys:] + 83
6   CoreFoundation                	    0x7ff81e02a29c ___CFPrefsDeliverPendingKVONotificationsGuts_block_invoke + 374
7   CoreFoundation                	    0x7ff81e02a11e __CFDictionaryApplyFunction_block_invoke + 22
8   CoreFoundation                	    0x7ff81dff2ee3 CFBasicHashApply + 115
9   CoreFoundation                	    0x7ff81dfe5c0b CFDictionaryApplyFunction + 131
10  CoreFoundation                	    0x7ff81e02a093 _CFPrefsDeliverPendingKVONotificationsGuts + 262
11  CoreFoundation                	    0x7ff81dfd69a6 -[_CFXPreferences _deliverPendingKVONotifications] + 90
12  CoreFoundation                	    0x7ff81dfd721d __108-[_CFXPreferences(SearchListAdditions) withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke + 402
13  CoreFoundation                	    0x7ff81e131f4b -[_CFXPreferences withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 374
14  CoreFoundation                	    0x7ff81e02e561 -[_CFXPreferences setValue:forKey:appIdentifier:container:configurationURL:] + 96
15  CoreFoundation                	    0x7ff81e02e4c8 _CFPreferencesSetAppValueWithContainerAndConfiguration + 113
16  Foundation                    	    0x7ff81ee8c920 -[NSUserDefaults(NSUserDefaults) setObject:forKey:] + 73
17  AppKit                        	    0x7ff820a9fc63 -[NSWindow _persistFrame] + 106
18  AppKit                        	    0x7ff820a9f9f4 -[NSWindow _reallySetFrame:] + 683
19  AppKit                        	    0x7ff820a9f315 -[NSWindow _oldPlaceWindow:fromServer:] + 213
20  AppKit                        	    0x7ff820a9d97e -[NSWindow _setFrameCommon:display:fromServer:] + 1278
21  ViewBridge                    	    0x7ff8255c0169 -[NSRemoteView _serviceRequestsResizeInProgress:] + 1507
22  ViewBridge                    	    0x7ff8255eeb01 withImplicitAnimation + 101
23  ViewBridge                    	    0x7ff8255c0431 -[NSRemoteView _serviceRequestsResize:completion:] + 147
24  ViewBridge                    	    0x7ff8255c1183 -[NSRemoteView _serviceRequestsFrame:serviceWindowBackgroundColor:safeFrame:animate:transaction:completion:] + 904
25  ViewBridge                    	    0x7ff8255bb421 -[NSRemoteView _finishAdvanceToConfigPhaseWithContextID:andOffset:] + 1017
26  ViewBridge                    	    0x7ff8255cdce2 __38-[NSRemoteView _advanceToConfigPhase:]_block_invoke_2.2566 + 167
27  CoreFoundation                	    0x7ff81e00c2bc __invoking___ + 140
28  CoreFoundation                	    0x7ff81e00c163 -[NSInvocation invoke] + 305
29  ViewBridge                    	    0x7ff825567e98 __deferNSXPCInvocationOntoMainThread_block_invoke + 124
30  ViewBridge                    	    0x7ff82555b2c4 __wrapBlockWithVoucher_block_invoke + 37
31  ViewBridge                    	    0x7ff82555af4f __deferBlockOntoMainThread_block_invoke_2 + 274
32  CoreFoundation                	    0x7ff81e028681 __CFRUNLOOP_IS_CALLING_OUT_TO_A_BLOCK__ + 12
33  CoreFoundation                	    0x7ff81e02852c __CFRunLoopDoBlocks + 445
34  CoreFoundation                	    0x7ff81e02789d __CFRunLoopRun + 2609
35  CoreFoundation                	    0x7ff81e0267ac CFRunLoopRunSpecific + 562
36  ViewBridge                    	    0x7ff82555a434 __invokeRunLoopInModeForDuration_block_invoke_2 + 25
37  ViewBridge                    	    0x7ff82555a2fa invokeRunLoopInModeForDuration + 179
38  ViewBridge                    	    0x7ff82555a23c __29-[NSCFRunLoopSemaphore wait:]_block_invoke_3 + 82
39  ViewBridge                    	    0x7ff82555a1e8 __CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ + 7
40  ViewBridge                    	    0x7ff82555a1b3 __29-[NSCFRunLoopSemaphore wait:]_block_invoke_2 + 124
41  ViewBridge                    	    0x7ff82555a0a9 __29-[NSCFRunLoopSemaphore wait:]_block_invoke + 163
42  ViewBridge                    	    0x7ff825559ce8 +[NSCFRunLoopSemaphore _observe:whilePerforming:] + 259
43  ViewBridge                    	    0x7ff8255598ae -[NSCFRunLoopSemaphore wait:] + 178
44  AppKit                        	    0x7ff8214548c5 -[NSSavePanel _initBridgeAndStuff] + 379
45  AppKit                        	    0x7ff82145553b -[NSSavePanel initWithContentRect:styleMask:backing:defer:] + 140
46  AppKit                        	    0x7ff82145f16f -[NSOpenPanel initWithContentRect:styleMask:backing:defer:] + 150
47  AppKit                        	    0x7ff820d467e8 -[NSPanel init] + 69
48  AppKit                        	    0x7ff82145548c -[NSSavePanel init] + 197
49  AppKit                        	    0x7ff82111dab2 +[NSSavePanel(Instantiation) _crunchyRawUnbonedPanel] + 58
50  libtk8.6.dylib                	       0x1071932bf Tk_GetOpenFileObjCmd + 63
51  libtcl8.6.dylib               	       0x1073bec71 TclNRRunCallbacks + 79
52  _tkinter.cpython-310-darwin.so	       0x1070c16c5 Tkapp_Call + 389
53  Python                        	       0x107779bea cfunction_call + 90
54  Python                        	       0x107713acd _PyObject_Call + 109
55  Python                        	       0x10784d5b7 _PyEval_EvalFrameDefault + 35431
56  Python                        	       0x1078431df _PyEval_Vector + 383
57  Python                        	       0x10785401f call_function + 175
58  Python                        	       0x10784d22a _PyEval_EvalFrameDefault + 34522
59  Python                        	       0x1078431df _PyEval_Vector + 383
60  Python                        	       0x10785401f call_function + 175
```


I see a similar crash report in issue 6c88c5270a8f8a07a0a22efbdbcba5c60dd00129 in the Tk tracker, not sure if that's related or just accidental.
User Comments: kevin_walzer added on 2023-08-30 17:33:15:
Crash no longer evident with Christopher's patch. Committed to trunk and core-8-6-branch. Thanks as always for your good work, Christopher. 

The TCT does not do hot fix releases except in the rarest of cases; this would not rise to that level. Anyone who cares to rebuild Tk to incorporate this update can do so from either the fossil repo or the Github mirror.

nab added on 2023-08-30 16:39:12:
Hi,
I do not personally use Python but when Christopher is writing that his patch deserve an hotfix release, it would be nice if someone could take care of it...

This is also another vote four decoupling Tcl && Tk

best regards,
nicolas

chrstphrchvz added on 2023-08-12 07:58:58:

Can the patch for this issue please be reviewed soon? Tkinter users in particular continue to report this issue, including most recently to the tcl-mac mailing list. Personally, I would consider this issue critical enough to warrant an off-cycle hotfix release.


chrstphrchvz added on 2023-06-28 17:58:10:

I have revised the patch to also undo [e3384e50f56f], since the KVO crash in [6c88c5270a] has the same cause as the one in this ticket, and so I find no evidence of a macOS Mojave KVO bug.


marc_culler (claiming to be Marc Culler) added on 2023-06-11 11:50:40:
Wow.  Thank you for tracking that down Christopher!

chrstphrchvz added on 2023-06-11 11:16:18:

Yet another way to trigger this crash is to run tclsh, do package require Tk; destroy ., change the appearance in System Preferences, and then do update.


chrstphrchvz added on 2023-06-11 11:14:04:

This crash occurs because Tk Aqua calls addObserver:… on a TKContentView* in viewDidChangeEffectiveAppearance:, but does not call removeObserver:… anywhere, leaving the possibility that whenever something touches AppleHighlightColor, retain will get called on the TKContentView* after it is deallocated.

The “Discussion” section of the documentation for addObserver:forKeyPath:options:context: says:

”Neither the object receiving this message, nor observer, are retained. An object that calls this method must also eventually call either the removeObserver:forKeyPath: or removeObserver:forKeyPath:context: method to unregister the observer when participating in KVO.”

And for removeObserver:forKeyPath::

“Be sure to invoke this method … before any object specified in addObserver:forKeyPath:options:context: is deallocated.”

While thinking of where removeObserver:… should be called from, it also occurs to me that the way addObserver:… is currently called does not completely fix [315104a5c1]. In viewDidChangeEffectiveAppearance, defaultColor is static, and so only the first TKContentView* for which this method is called over the life of the process (i.e. not just a specific Tcl interpreter) will observe changes to the highlight color and correctly generate an <AppearanceChanged> event for the corresponding toplevel.

Currently I would suggest moving addObserver: usage to where a TKContentView* is created, and putting removeObserver: just before where the TKWindow* for a TKContentView* is released; see attached patch.


marc_culler added on 2023-04-06 03:30:18:
Indeed, the bugfix branch leaks memory.  a TKContentView is never freed.
Back to the drawing board ...

marc_culler (claiming to be Marc Culler) added on 2023-04-05 12:22:10:
Well, in that case we should be releasing the contentView when we are
finished using it, not when it is first created.  The fact that we are
not doing that would suggest that the bugfix branch might have a memory
leak because we never release a ContentView. That should not be too
hard to check.

chrstphrchvz added on 2023-04-05 04:37:01:

I think the setContentView: documentation is referring to the presumably common pattern where the receiver takes ownership using retain and the sender gives up ownership using release (and when the receiver is done with the object, it will also do release). The contentView property is declared as strong (in NSWindow.h as of the macOS 10.10 SDK), so setContentView: will implicitly retain. Another example of this would be -[NSImage addRepresentation:] (as used in CreateNSImageFromPixmap()), whose documentation similarly says “Any representation added by this method is retained by the receiver”.

Looking briefly for example code using setContentView:, the Google books previews I find seem to call autorelease on any newly created (NSView *). But GDK does have a couple of instances where setContentView: is followed by manually releasing: https://gitlab.gnome.org/GNOME/gtk/-/blob/4362f7e6e253/gdk/macos/GdkMacosWindow.c#L287, https://gitlab.gnome.org/GNOME/gtk/-/blob/4362f7e6e253/gdk/macos/gdkmacosglcontext.c#L147.


marc_culler (claiming to be Marc Culler) added on 2023-04-05 01:27:24:
I guess when Apple says that the NSWindow "owns" the contentView they mean
that it takes responsibility for releasing the contentView.

I checked that the python script in the report no longer segfaults if
the Tk library is replaced by the one built with the bugfix branch.

chrstphrchvz added on 2023-04-05 00:01:51:

There was a wrong call to [contentView release in tkMacOSXWm.c which has been there since 2009.

I noticed that call as well, but after checking https://developer.apple.com/documentation/appkit/nswindow/1419160-contentview?language=objc which says “The window retains the new content view and owns it thereafter”, I believed that doing [contentView release] was correct. Although I can see how having an extra retain on the TKContentView somewhere could be a workaround to avoid this crash.


marc_culler added on 2023-04-04 23:28:00:
I think this is now fixed in bug-ef5d3e29a4-redux.  There was a wrong
call to [contentView release in tkMacOSXWm.c which has been there since
2009.

chrstphrchvz added on 2023-04-04 16:30:53:

As I understand it, a Tk application is supposed to exit as soon as there are no windows left. So I think that the expected behavior with your script should be that the application terminates before the second interpreter is created. Is that correct?

I would not think so, at least for programs that use Tcl C APIs to create/destroy multiple interpreters (even if using multiple interpreters is discouraged for other reasons, such as being unnecessarily used when one interpreter with multiple toplevels will suffice, or to avoid common issues or mistakes when using multiple interpreters). The documentation for destroy says:

If a window “.” is deleted then all windows will be destroyed and the application will (normally) exit.
…but I wonder if it is only describing what it considers typical usage, and that it only considers typical usage to be a wish script rather than tclsh or C API usage.


chrstphrchvz added on 2023-04-04 15:45:40:

I am using macOS 12.6.4 on Intel.

I do see the __CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ symbol in the stack trace for this crash but not always. Looking for it on Google, I find it also appearing in spindumps (from app hangs/freezes), so I wonder if the name has to do with performance bugs incorrectly reported to ViewBridge rather than whatever is calling this wait: method. But also noticing that from there it enters a nested runloop I would think means it is letting the thread stay busy with other stuff while it waits for a semaphore, rather than intentionally doing some specific step that it knows can crash.

I have yet to find a debugging technique to easily find what is ending up with a stale reference to the released (TKContentView *).


marc_culler (claiming to be Marc Culler) added on 2023-04-04 15:39:32:
As I understand it, a Tk application is supposed to exit as soon as there
are no windows left.  So I think that the expected behavior with your
script should be that the application terminates before the second
interpreter is created.  Is that correct?

marc_culler (claiming to be Marc Culler) added on 2023-04-04 15:27:03:
One more thing -- when I run the script with wish neither window
gets destroyed after the file dialog is cancelled.  The windows
appear to function normally - they can be resized or closed.  The
Tk application does not terminate until both are closed.

Moreover if I append the line:

  i2 eval after 100 {pack [label .lb -text "Hello #2"] -padx 20 -pady 20} 

at the end of the script then the label appears in tk#2 when I cancel
the file dialog. So window "tk #2" does not appear to be a zombie.

Adding a similar line for the root window of i1 results in an error:

Error in startup script: can't invoke "label" command: application has been destroyed.

That would seem to be appropriate, except for the fact that the root
window is still on the screen.  So I think that the root window probably
is a zombie.

Of course, all of the fancy stuff with autorelease pools was added when
trying to eliminate zombies.  At that point, no NSWindow was ever being
destroyed.

marc_culler (claiming to be Marc Culler) added on 2023-04-04 14:39:16:
Update: when I run your script using tclsh, I do see the crash on
Ventura.  And I do not see the filedialog appear when I do that.
But if I run the script with wish the filedialog appears and there is
no crash.

marc_culler (claiming to be Marc Culler) added on 2023-04-04 13:22:19:
Christopher, which version of macOS are you using?  The stack trace from the
ticket includes a call to the function:
__CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__
which I have also seen in other reports of crashes in the filedialog.

The name of that function speaks for itself.  And, as far as I know it
only exists in Monterey.

When I tried running your code on Ventura I did not see a crash.

chrstphrchvz added on 2023-04-04 00:04:07:

Unfortunately, after further testing, I have managed to trigger this crash with 8.6.13 and core-8-6-branch.

Running with NSZombieEnabled=YES reveals the already-released object that an observer is trying to retain:

2023-04-03 15:29:57.681 wish8.6[88385:1992733] *** -[TKContentView retain]: message sent to deallocated instance 0x61600006a580
zsh: illegal hardware instruction
I observe that the affected (TKContentView *) gets released during [NSApp _resetAutoreleasePool] in TkMacOSXEventsSetupProc(); prior to [d638e7ca3616] it would be released while in TkWmDeadWindow(), so hopefully this explains why that change helped prevent this crash but did not fix the underlying cause.

My guess for why Tkinter examples tend to trigger this crash could have to do with destroying the root window of one interpreter but then continuing to use Tk in other interpreters. I do not know why using multiple interpreters seems important for this crash, though; I just have not successfully triggered it with multiple toplevels in a single interpreter. This Tcl script seems to reliably trigger the crash:

interp create -- i1
i1 eval package require Tk
i1 eval update
i1 eval destroy .
i1 eval update
interp create -- i2
i2 eval package require Tk
i1 eval update
i2 eval tk_getOpenFile


marc_culler (claiming to be Marc Culler) added on 2023-04-03 01:05:50:
Of course.

chrstphrchvz added on 2023-04-03 00:22:11:

I’ve verified that applying the change from [d638e7ca3616] prevents this crash. Is it okay to close this as fixed (by 8.6.13) rather than works-for-me?


marc_culler (claiming to be Marc Culler) added on 2022-05-21 23:08:10:
It turns out that nothing actually needs to be done here.  These
crashes do not occur with the current Tk tip.

I did the following:

* Build and install the current core-8-6-branch of Tk (which is 8.6.12).

* sudo cp /Library/Frameworks/Tk.framework/Versions/8.6/Tk /Library/Frameworks/Python.framework/Versions/3.10/lib/libtk8.6.dylib

* edit /Library/Frameworks/Python.framework/Versions/3.10/lib/tk8.6/tk.tcl
to change the package require line to:
package require -exact Tk 8.6.12

Then when I ran the test script it did not crash.

I was able to reproduce the crash using the Tk library that ships with
Python 3.10, by the way.

kevin_walzer added on 2022-05-21 14:18:20:
My testing of this with both the Python script and an equivalent Tcl script showed a reference to autorelease pools being corrupted; I tried removing the call to _resetAutoreleasePool in tkMacOSXNotify.c. That did appear to eliminate the crashes, but caused instability elsewhere (a couple of of tests in the test suite failed where they did not before), so I am not going to pursue that approach. 

I agree with Marc that this seems to be a edge case.

marc_culler (claiming to be Marc Culler) added on 2022-05-21 04:48:37:
Maybe a workaround would be to refuse to open a file dialog while another
file dialog is running.

What is the use case for opening 6 file dialogs in rapid succession?

marc_culler (claiming to be Marc Culler) added on 2022-05-21 04:42:06:
Yes, this has been seen before. As far as I can tell it is an Apple bug.
(Yes, they do exist.)

It would appear that the engineering team in charge of ViewBridge
was forced into doing something that they knew would lead to a crash.
How else can you explain the existence of a method named:

__CONSIDER_WHO_REQUESTED_THIS_WAIT_BEFORE_SENDING_BUG_TO_VIEWBRIDGE__ ?

There is no Tk code involved in running a file dialog on macOS.  In fact,
the file dialog is managed by a separate process.  This was a change made
in macOS 10.15.  Apple says:

"In macOS 10.15, the system always displays the Save dialog in a separate
process, regardless of whether the app is sandboxed. When the user saves the
document, macOS adds the saved file to the app's sandbox (if necessary) so
that the app can write to the file. Prior to macOS 10.15, the system used a
separate process only for sandboxed apps."

If we want Tk to present a native file dialog I don't think there is any
alternative to instantiating an NSPanel object.  As far as I can tell the
crash occurs within the first function called in Tk_GetOpenFileObjCmd, at
the line:
   openpanel =  [NSOpenPanel openPanel];
which presumably creates an observer and launches the process which runs
the dialog.

I don't know what we could possibly do there, other than instantiate the
NSPanel.

Attachments: