Ticket UUID: | 8ce672d1a16826e9cdb001322a8b04e0083ebe7d | |||
Title: | Segfault with Tk >= 8.6.14 when using menu(button) with the -font option in a certain way | |||
Type: | Bug | Version: | 8.6.14 | |
Submitter: | robgcs | Created on: | 2024-11-22 16:43:01 | |
Subsystem: | 75. wish | Assigned To: | fvogel | |
Priority: | 5 Medium | Severity: | Critical | |
Status: | Closed | Last Modified: | 2024-12-04 09:18:29 | |
Resolution: | Fixed | Closed By: | oehhar | |
Closed on: | 2024-12-04 09:18:29 | |||
Description: |
Hello, everybody After upgrading to Tk 8.6.14, we noticed that our application crashes at a certain point with a memory access error (segfault). The error does not occur with Tk <= 8.6.13. We were able to create a small sample script that can be used to reproduce the error (see below). Kind regards, Robert #!/usr/bin/env wish # If you click on the menu button and select the entry, the program crashes # with Tk >= 8.6.14. # # Tested under # openSUSE Leap 15.4, Tcl/Tk 8.6.12 (works) # openSUSE Leap 15.6, Tcl/Tk 8.6.12 (works) # openSUSE Leap 15.6, Tcl/Tk 8.6.13 (works) # openSUSE Leap 15.6, Tcl/Tk 8.6.14 (crashes) # openSUSE Leap 15.6, Tcl/Tk 8.6.15 (crashes) # openSUSE Tumbleweed, Tcl/Tk 8.6.15 (crashes) # Ubuntu 20.04, Tcl/Tk 8.6.10 (works) proc createBut {w font} { destroy $w frame $w menubutton $w.mb -text $font -font $font -menu $w.mb.m menu $w.mb.m -tearoff 0 set font [expr {$font eq "Courier" ? "Times" : "Courier"}] # Does not crash if you use Tk <= 8.6.13 or remove the -font option below $w.mb.m add command \ -label $font \ -font $font \ -command [list createBut $w $font] pack $w.mb pack $w } createBut .but "Courier" | |||
User Comments: |
oehhar added on 2024-12-04 09:18:29:
Great work, guys! Would it be worthwile to mention in the changes, that the issue was introduced in Tk 8.6.14? For me, it is always sensitive to answer the question "am I affected"? Thanks, Harald fvogel added on 2024-12-03 03:57:46: Hmmm... OK, even if the test suite looks fine I can't actually demonstrate that commandPtr is always valid when it is used by Tcl_EvalObjEx with mePtr being Tcl_Release()'d a few lines before that call. This makes your patch superior indeed. I'm merging the bugfix branch with your patch and am closing this ticket. Thanks! chw added on 2024-12-02 22:08:45: Are you sure that commandPtr is always every time valid when the containing menu entry is Tcl_Release()'d beforehand? fvogel added on 2024-12-02 21:14:31: Well, I had a try at instrumenting the calls to Tcl_Preserve()/Tcl_Release() related to the menu command. Here is an output on Windows, starting at the button click on the menu: <set up of the menu and entry; up to this point, the calls to Tcl_Release() match the preivous Tcl_Preserve() calls> WM_COMMAND, Tcl_Preserve menu 0000018141CEFB00 TkInvokeMenu, Tcl_Preserve entry 0000018141D45160 TkInvokeMenu, Tcl_EvalObjEx TkDestroyMenu, Tcl_Preserve menu 0000018141CEFB00 TkDestroyMenu, Tcl_Release menu 0000018141CEFB00 MenuWidgObjCmd, Tcl_Preserve menu 0000018141CF1100 MenuWidgObjCmd, MENU_ADD MenuWidgObjCmd, Tcl_Release menu 0000018141CF1100 (done) TkInvokeMenu, Tcl_EvalObjEx returned TkInvokeMenu, Tcl_Release entry 0000018141D45160 <crash!!!> So on clicking the menu entry D45160, TkWinHandleMenuEvent() is called with WM_COMMAND. menuPtr CEFB00 is Tcl_Preserve()'d. The menu invoke command starts running and preserves the menu entry D45160. The -command script attached to the entry gets run (Tcl_EvalObjEx()). This bound script destroys the CEFB00 menu (we have matching Preserve/Release calls of CEFB00 here). At this point the tkwin attached to the menu gets Tcl_EventuallyFree()'d. The bound script creates a new menu CF1100 and adds a menu entry to it. The bound script returns (Tcl_EvalObjEx() returned). And now the menu entry D45160 gets Tcl_Release()'d. But this crashes because tkwin is NULL. Conclusion #1: I don't see any missing or duplicate calls to Tcl_Preserve()/Tcl_Release() here. Only a regular call to Tcl_Release() that crashes because in the release process something (the tkwin) unexpectedly disappeared from under its feet. Conclusion #2: While chw's patch works, another possible patch is simply: Index: generic/tkMenu.c ================================================================== --- generic/tkMenu.c +++ generic/tkMenu.c @@ -1048,10 +1048,11 @@ TCL_GLOBAL_ONLY|TCL_LEAVE_ERR_MSG) == NULL) { result = TCL_ERROR; } Tcl_DecrRefCount(valuePtr); } + Tcl_Release(mePtr); /* * We check numEntries in addition to whether the menu entry has a command * because that goes to zero if the menu gets deleted (e.g., during * command evaluation). @@ -1063,11 +1064,10 @@ Tcl_IncrRefCount(commandPtr); result = Tcl_EvalObjEx(interp, commandPtr, TCL_EVAL_GLOBAL); Tcl_DecrRefCount(commandPtr); } - Tcl_Release(mePtr); done: return result; } I prefer this patch over chw's proposal [006481ff]. Opinions? fvogel added on 2024-12-02 06:59:13: This sounds kind of a quest. Can we agree on: - merge branch bug-8ce672d1a1 now, since chw's patch there gets included in 8.6.16. Here a real bug was demonstrated and fixed. - keep this ticket open (or alternatively open a dedicated new ticket) and let the quest happen in a later timeframe (can anyone demonstrate a remaining bug?) griffin added on 2024-11-28 00:35:37: If anyone has access to Undodb, this bug would be a great candidate for diagnosing the issue using this debugger. Short of that, I would instrument the preserve and release calls to log info, then filter out calls outside the area of interest. Walk backwards through the remainder to identify the failing sequence. My guess is that the preserve release calls are not balanced through every possible sequence. In other words, there’s a missing or extra call somewhere. chw added on 2024-11-28 00:35:14: Good question. In case of Menu and MenuEntry we have a clear hierarchy. Possibly this could be solved by a specialized preserve/release for MenuEntry which first pins its Menu on preserve and later unpins its Menu on release. I have left out the insert and delete operations here. These might be more complicated. Or in general OO like, i.e. the preserve and release operations are methods of the respective objects (function pointers). But this does not help for circular references at all. fvogel added on 2024-11-27 23:15:37: How in practice should we think about circular dangling of references regarding the Tcl_Preserve()/Tcl_Release() scheme? chw added on 2024-11-27 22:58:18: I see. But am uncertain since it does not totally account for the environment. Think of X11's asynchronicity, the load of the X server, network delays, and of course the phase of the moon (I'm kidding somewhat, OK?). The "update" command terminates when it does not see another event, right? And that makes it sensitive to the above mentioned problems, excluding the astronomic one. fvogel added on 2024-11-27 22:23:15: 20000 times: I have just used a for loop, with update in it. fvogel added on 2024-11-27 22:21:52: OK, how would you do that? chw added on 2024-11-27 22:21:24: BTW: Francois, how did you click 20000 times in reasonable time? (Hopefully with reasonable jitter?) Should this trick be adopted in the test infrastructure? chw added on 2024-11-27 22:12:01: Yes, thanks. Let us think about circular dangling of references regarding the Tcl_Preserve()/Tcl_Release() scheme nevertheless. It could be a sad sound of blues at other places, too. fvogel added on 2024-11-27 21:57:15: I tried my patch below (2024-11-26 21:15:59) and clicked the button 20000 times. I couldn't make the memory footprint of the tclsh process raise. Nevertheless, I think your patch [006481ff] is correct and probably safer. I had a look at other places in tkMenu.c where the menu entry struct is Tcl_Preserve()'d, there are only 2 such places: - case MENU_ENTRYCGET in MenuWidgetObjCmd: I don't even seen why mePtr needs to be Tcl_Preserve()'d when retrieving the configuration options of an entry, but it shouldn't hurt. - case MENU_ENTRYCONFIGURE in MenuWidgetObjCmd: OK, I see. I'd say: let's merge your patch, which fixes the present ticket (and test suite is happy), and let's not change other calls to Tcl_Preserve() on the menu entry since they are not demonstrably broken. Agreed? chw added on 2024-11-27 07:54:25: I'm unsure, see the strange comment in DestroyMenuInstance()'s loop over the MenuEntries, where Tcl_EventuallyFree() on them is called. More and more I believe we have a pretty circular logic nut to crack. And Tcl_Preserve() and Tcl_Release() aren't made for circles. fvogel added on 2024-11-27 06:53:03: I had thoughts at this, indeed. When menuPtr->tkwin is NULL I believe the menu resources are gone (freed) already. I thought that menu entries of that menu would be gone too then, isn't it the case? chw added on 2024-11-27 06:47:16: Francois, your patch looks very nice and clean. However, I wonder if it leaks resources of the MenuEntry if Menu's tkwin is gone, since the config options are not free'd in this case. fvogel added on 2024-11-26 21:15:59: Instead of your proposed patch, the following one also fixes the crash: Index: generic/tkMenu.c ================================================================== --- generic/tkMenu.c +++ generic/tkMenu.c @@ -1455,11 +1455,13 @@ TCL_GLOBAL_ONLY|TCL_TRACE_WRITES|TCL_TRACE_UNSETS, MenuVarProc, mePtr); } TkpDestroyMenuEntry(mePtr); TkMenuEntryFreeDrawOptions(mePtr); - Tk_FreeConfigOptions((char *) mePtr, mePtr->optionTable, menuPtr->tkwin); + if (menuPtr->tkwin) { + Tk_FreeConfigOptions((char*)mePtr, mePtr->optionTable, menuPtr->tkwin); + } ckfree(mePtr); } /* *--------------------------------------------------------------------------- chw added on 2024-11-26 08:05:00: Francois, this makes perfect sense. Now the fine question is how the MenuEntry memory should be managed in general. The commit [78ce1d6658] tried to ensure consistency since at various places the MenuEntry is Tcl_Preserve()'d. And thus should be disposed of by Tcl_EventuallyFree() always. As can be seen in my patch I tried to circumvent the need to Tcl_Preserve() entirely by picking all required Tcl_Objs out of the MenuEntry and manage them separately. fvogel added on 2024-11-26 07:18:27: My previous fossil bisection was on Windows since I could reproduce there. Fossil bisecting on Linux points the finger to commit [78ce1d6658], that intended to fix ticket [499c0467b3]. Do you think that commit was correct? chw added on 2024-11-24 19:22:49: But [fd7b2987] concerns Win32 only and Robert observes crashes on Linuxen. fvogel added on 2024-11-24 19:10:19: Fossil bisection points to [fd7b2987] as the first commit triggering the crash in core-8-6-branch. This commit was made to fix [2d3a81c0]. This needs further thoughts I guess. chw added on 2024-11-22 22:00:00: Because I quickly cobbled it together. There are other places where the menu entry struct is Tcl_Preserve()'d. Might need to be reviewed, too. But I'm uncertain. fvogel added on 2024-11-22 21:38:18: Committed in branch bug-8ce672d1a1, thanks. Why makes this only a POC in your view? chw added on 2024-11-22 20:18:34: Francois, please have a look into my attached POC fix. fvogel added on 2024-11-22 18:15:32: Quick stack trace obtained with Tcl/Tk current main branches: tcl9tk90.dll!Tk_GetFontFromObj(Tk_Window_ * tkwin, Tcl_Obj * objPtr) Ligne 1301 C tcl9tk90.dll!Tk_FreeFontFromObj(Tk_Window_ * tkwin, Tcl_Obj * objPtr) Ligne 1508 C tcl9tk90.dll!FreeResources(TkOption * optionPtr, Tcl_Obj * objPtr, void * internalPtr, Tk_Window_ * tkwin) Ligne 1873 C tcl9tk90.dll!Tk_FreeConfigOptions(void * recordPtr, Tk_OptionTable_ * optionTable, Tk_Window_ * tkwin) Ligne 1802 C tcl9tk90.dll!DestroyMenuEntry(void * memPtr) Ligne 1508 C tcl90.dll!Tcl_Release(void * clientData) Ligne 229 C tcl9tk90.dll!TkInvokeMenu(Tcl_Interp * interp, TkMenu * menuPtr, __int64 index) Ligne 1094 C tcl9tk90.dll!TkWinHandleMenuEvent(HWND__ * * dummy1139, unsigned int * pMessage, unsigned __int64 * pwParam, __int64 * plParam, __int64 * plResult) Ligne 1242 C tcl9tk90.dll!TkWinMenuProc(HWND__ * hwnd, unsigned int message, unsigned __int64 wParam, __int64 lParam) Ligne 1011 C [Code externe] tcl90.dll!TclpWaitForEvent(const Tcl_Time * timePtr) Ligne 550 C tcl90.dll!Tcl_WaitForEvent(const Tcl_Time * timePtr) Ligne 1351 C tcl90.dll!Tcl_DoOneEvent(int flags) Ligne 986 C tcl9tk90.dll!Tk_MainLoop() Ligne 2127 C tcl90.dll!Tcl_MainExW(__int64 argc, wchar_t * * argv, int(*)(Tcl_Interp *) appInitProc, Tcl_Interp * interp) Ligne 563 C tclsh90.exe!wmain(int argc, wchar_t * * argv) Ligne 144 C [Code externe] Crashes in Tk_GetFontFromObj() because tkwin is NULL. |
Attachments:
- tkMenu.patch [download] added by chw on 2024-11-22 20:18:54. [details]