Tk Source Code

View Ticket
Login
Ticket UUID: 8ce672d1a16826e9cdb001322a8b04e0083ebe7d
Title: Segfault with Tk >= 8.6.14 when using menu(button) with the -font option in a certain way
Type: Bug Version: 8.6.14
Submitter: robgcs Created on: 2024-11-22 16:43:01
Subsystem: 75. wish Assigned To: fvogel
Priority: 5 Medium Severity: Critical
Status: Closed Last Modified: 2024-12-04 09:18:29
Resolution: Fixed Closed By: oehhar
    Closed on: 2024-12-04 09:18:29
Description:
Hello, everybody

After upgrading to Tk 8.6.14, we noticed that our application crashes at a certain point with a memory access error (segfault). The error does not occur with Tk <= 8.6.13. We were able to create a small sample script that can be used to reproduce the error (see below).

Kind regards,

Robert


#!/usr/bin/env wish

# If you click on the menu button and select the entry, the program crashes
# with Tk >= 8.6.14.
#
# Tested under
#  openSUSE Leap 15.4, Tcl/Tk 8.6.12 (works)
#  openSUSE Leap 15.6, Tcl/Tk 8.6.12 (works)
#  openSUSE Leap 15.6, Tcl/Tk 8.6.13 (works)
#  openSUSE Leap 15.6, Tcl/Tk 8.6.14 (crashes)
#  openSUSE Leap 15.6, Tcl/Tk 8.6.15 (crashes)
#  openSUSE Tumbleweed, Tcl/Tk 8.6.15 (crashes)
#  Ubuntu 20.04, Tcl/Tk 8.6.10 (works)

proc createBut {w font} {
   destroy $w
   frame $w
   
   menubutton $w.mb -text $font -font $font -menu $w.mb.m
   menu $w.mb.m -tearoff 0
   set font [expr {$font eq "Courier" ? "Times" : "Courier"}]
   
   # Does not crash if you use Tk <= 8.6.13 or remove the -font option below
   $w.mb.m add command \
      -label $font \
      -font $font \
      -command [list createBut $w $font]
   
   pack $w.mb
   pack $w
}

createBut .but "Courier"
User Comments: oehhar added on 2024-12-04 09:18:29:

Great work, guys! Would it be worthwile to mention in the changes, that the issue was introduced in Tk 8.6.14? For me, it is always sensitive to answer the question "am I affected"?

Thanks, Harald


fvogel added on 2024-12-03 03:57:46:
Hmmm... OK, even if the test suite looks fine I can't actually demonstrate that commandPtr is always valid when it is used by Tcl_EvalObjEx with mePtr being Tcl_Release()'d a few lines before that call. This makes your patch superior indeed. I'm merging the bugfix branch with your patch and am closing this ticket. Thanks!

chw added on 2024-12-02 22:08:45:
Are you sure that commandPtr is always every time valid
when the containing menu entry is Tcl_Release()'d beforehand?

fvogel added on 2024-12-02 21:14:31:

Well, I had a try at instrumenting the calls to Tcl_Preserve()/Tcl_Release() related to the menu command. Here is an output on Windows, starting at the button click on the menu:

<set up of the menu and entry; up to this point, the calls to Tcl_Release() match the preivous Tcl_Preserve() calls>
WM_COMMAND,     Tcl_Preserve menu  0000018141CEFB00
TkInvokeMenu,   Tcl_Preserve entry 0000018141D45160
TkInvokeMenu,   Tcl_EvalObjEx
TkDestroyMenu,  Tcl_Preserve menu  0000018141CEFB00
TkDestroyMenu,  Tcl_Release  menu  0000018141CEFB00
MenuWidgObjCmd, Tcl_Preserve menu  0000018141CF1100
MenuWidgObjCmd, MENU_ADD
MenuWidgObjCmd, Tcl_Release  menu  0000018141CF1100 (done)
TkInvokeMenu,   Tcl_EvalObjEx returned
TkInvokeMenu,   Tcl_Release  entry 0000018141D45160
<crash!!!>

So on clicking the menu entry D45160, TkWinHandleMenuEvent() is called with WM_COMMAND. menuPtr CEFB00 is Tcl_Preserve()'d.

The menu invoke command starts running and preserves the menu entry D45160.

The -command script attached to the entry gets run (Tcl_EvalObjEx()).

This bound script destroys the CEFB00 menu (we have matching Preserve/Release calls of CEFB00 here). At this point the tkwin attached to the menu gets Tcl_EventuallyFree()'d.

The bound script creates a new menu CF1100 and adds a menu entry to it.

The bound script returns (Tcl_EvalObjEx() returned).

And now the menu entry D45160 gets Tcl_Release()'d. But this crashes because tkwin is NULL.

Conclusion #1: I don't see any missing or duplicate calls to Tcl_Preserve()/Tcl_Release() here. Only a regular call to Tcl_Release() that crashes because in the release process something (the tkwin) unexpectedly disappeared from under its feet.

Conclusion #2: While chw's patch works, another possible patch is simply:

Index: generic/tkMenu.c
==================================================================
--- generic/tkMenu.c
+++ generic/tkMenu.c
@@ -1048,10 +1048,11 @@
 		TCL_GLOBAL_ONLY|TCL_LEAVE_ERR_MSG) == NULL) {
 	    result = TCL_ERROR;
 	}
 	Tcl_DecrRefCount(valuePtr);
     }
+    Tcl_Release(mePtr);
 
     /*
      * We check numEntries in addition to whether the menu entry has a command
      * because that goes to zero if the menu gets deleted (e.g., during
      * command evaluation).
@@ -1063,11 +1064,10 @@
 
 	Tcl_IncrRefCount(commandPtr);
 	result = Tcl_EvalObjEx(interp, commandPtr, TCL_EVAL_GLOBAL);
 	Tcl_DecrRefCount(commandPtr);
     }
-    Tcl_Release(mePtr);
 
   done:
     return result;
 }

I prefer this patch over chw's proposal [006481ff]. Opinions?


fvogel added on 2024-12-02 06:59:13:

This sounds kind of a quest. Can we agree on:

- merge branch bug-8ce672d1a1 now, since chw's patch there gets included in 8.6.16. Here a real bug was demonstrated and fixed.
- keep this ticket open (or alternatively open a dedicated new ticket) and let the quest happen in a later timeframe (can anyone demonstrate a remaining bug?)


griffin added on 2024-11-28 00:35:37:

If anyone has access to Undodb, this bug would be a great candidate for diagnosing the issue using this debugger.

Short of that, I would instrument the preserve and release calls to log info, then filter out calls outside the area of interest. Walk backwards through the remainder to identify the failing sequence.

My guess is that the preserve release calls are not balanced through every possible sequence. In other words, there’s a missing or extra call somewhere.


chw added on 2024-11-28 00:35:14:
Good question. In case of Menu and MenuEntry
we have a clear hierarchy. Possibly this could be
solved by a specialized preserve/release for
MenuEntry which first pins its Menu on preserve
and later unpins its Menu on release. I have
left out the insert and delete operations here.
These might be more complicated.
Or in general OO like, i.e. the preserve and
release operations are methods of the respective
objects (function pointers).

But this does not help for circular references
at all.

fvogel added on 2024-11-27 23:15:37:
How in practice should we think about circular dangling of
references regarding the Tcl_Preserve()/Tcl_Release()
scheme?

chw added on 2024-11-27 22:58:18:
I see. But am uncertain since it does not totally
account for the environment. Think of X11's asynchronicity,
the load of the X server, network delays, and of course
the phase of the moon (I'm kidding somewhat, OK?).
The "update" command terminates when it does not see
another event, right? And that makes it sensitive to the
above mentioned problems, excluding the astronomic one.

fvogel added on 2024-11-27 22:23:15:
20000 times: I have just used a for loop, with update in it.

fvogel added on 2024-11-27 22:21:52:
OK, how would you do that?

chw added on 2024-11-27 22:21:24:
BTW: Francois, how did you click 20000 times in reasonable time?
(Hopefully with reasonable jitter?) Should this trick be adopted
in the test infrastructure?

chw added on 2024-11-27 22:12:01:
Yes, thanks. Let us think about circular dangling of
references regarding the Tcl_Preserve()/Tcl_Release()
scheme nevertheless. It could be a sad sound of blues
at other places, too.

fvogel added on 2024-11-27 21:57:15:

I tried my patch below (2024-11-26 21:15:59) and clicked the button 20000 times. I couldn't make the memory footprint of the tclsh process raise.

Nevertheless, I think your patch [006481ff] is correct and probably safer.

I had a look at other places in tkMenu.c where the menu entry struct is Tcl_Preserve()'d, there are only 2 such places:

- case MENU_ENTRYCGET in MenuWidgetObjCmd: I don't even seen why mePtr needs to be Tcl_Preserve()'d when retrieving the configuration options of an entry, but it shouldn't hurt.
- case MENU_ENTRYCONFIGURE in MenuWidgetObjCmd: OK, I see.

I'd say: let's merge your patch, which fixes the present ticket (and test suite is happy), and let's not change other calls to Tcl_Preserve() on the menu entry since they are not demonstrably broken.

Agreed?


chw added on 2024-11-27 07:54:25:
I'm unsure, see the strange comment in DestroyMenuInstance()'s loop over the
MenuEntries, where Tcl_EventuallyFree() on them is called.

More and more I believe we have a pretty circular logic nut to crack. And
Tcl_Preserve() and Tcl_Release() aren't made for circles.

fvogel added on 2024-11-27 06:53:03:
I had thoughts at this, indeed. When menuPtr->tkwin is NULL I believe the menu resources are gone (freed) already. I thought that menu entries of that menu would be gone too then, isn't it the case?

chw added on 2024-11-27 06:47:16:
Francois, your patch looks very nice and clean.

However, I wonder if it leaks resources of the MenuEntry
if Menu's tkwin is gone, since the config options are
not free'd in this case.

fvogel added on 2024-11-26 21:15:59:

Instead of your proposed patch, the following one also fixes the crash:

Index: generic/tkMenu.c
==================================================================
--- generic/tkMenu.c
+++ generic/tkMenu.c
@@ -1455,11 +1455,13 @@
 		TCL_GLOBAL_ONLY|TCL_TRACE_WRITES|TCL_TRACE_UNSETS,
 		MenuVarProc, mePtr);
     }
     TkpDestroyMenuEntry(mePtr);
     TkMenuEntryFreeDrawOptions(mePtr);
-    Tk_FreeConfigOptions((char *) mePtr, mePtr->optionTable, menuPtr->tkwin);
+    if (menuPtr->tkwin) {
+	Tk_FreeConfigOptions((char*)mePtr, mePtr->optionTable, menuPtr->tkwin);
+    }
     ckfree(mePtr);
 }
 
 /*
  *---------------------------------------------------------------------------


chw added on 2024-11-26 08:05:00:
Francois, this makes perfect sense. Now the fine question is how
the MenuEntry memory should be managed in general. The commit
[78ce1d6658] tried to ensure consistency since at various places
the MenuEntry is Tcl_Preserve()'d. And thus should be disposed of
by Tcl_EventuallyFree() always. As can be seen in my patch I tried
to circumvent the need to Tcl_Preserve() entirely by picking all
required Tcl_Objs out of the MenuEntry and manage them separately.

fvogel added on 2024-11-26 07:18:27:

My previous fossil bisection was on Windows since I could reproduce there.

Fossil bisecting on Linux points the finger to commit [78ce1d6658], that intended to fix ticket [499c0467b3]. Do you think that commit was correct?


chw added on 2024-11-24 19:22:49:
But [fd7b2987] concerns Win32 only and Robert observes crashes on Linuxen.

fvogel added on 2024-11-24 19:10:19:

Fossil bisection points to [fd7b2987] as the first commit triggering the crash in core-8-6-branch. This commit was made to fix [2d3a81c0].

This needs further thoughts I guess.


chw added on 2024-11-22 22:00:00:
Because I quickly cobbled it together. There are other places
where the menu entry struct is Tcl_Preserve()'d. Might need to
be reviewed, too. But I'm uncertain.

fvogel added on 2024-11-22 21:38:18:

Committed in branch bug-8ce672d1a1, thanks. Why makes this only a POC in your view?


chw added on 2024-11-22 20:18:34:
Francois, please have a look into my attached POC fix.

fvogel added on 2024-11-22 18:15:32:

Quick stack trace obtained with Tcl/Tk current main branches:

 	tcl9tk90.dll!Tk_GetFontFromObj(Tk_Window_ * tkwin, Tcl_Obj * objPtr) Ligne 1301	C
 	tcl9tk90.dll!Tk_FreeFontFromObj(Tk_Window_ * tkwin, Tcl_Obj * objPtr) Ligne 1508	C
 	tcl9tk90.dll!FreeResources(TkOption * optionPtr, Tcl_Obj * objPtr, void * internalPtr, Tk_Window_ * tkwin) Ligne 1873	C
 	tcl9tk90.dll!Tk_FreeConfigOptions(void * recordPtr, Tk_OptionTable_ * optionTable, Tk_Window_ * tkwin) Ligne 1802	C
	tcl9tk90.dll!DestroyMenuEntry(void * memPtr) Ligne 1508	C
 	tcl90.dll!Tcl_Release(void * clientData) Ligne 229	C
 	tcl9tk90.dll!TkInvokeMenu(Tcl_Interp * interp, TkMenu * menuPtr, __int64 index) Ligne 1094	C
 	tcl9tk90.dll!TkWinHandleMenuEvent(HWND__ * * dummy1139, unsigned int * pMessage, unsigned __int64 * pwParam, __int64 * plParam, __int64 * plResult) Ligne 1242	C
 	tcl9tk90.dll!TkWinMenuProc(HWND__ * hwnd, unsigned int message, unsigned __int64 wParam, __int64 lParam) Ligne 1011	C
 	[Code externe]	
 	tcl90.dll!TclpWaitForEvent(const Tcl_Time * timePtr) Ligne 550	C
 	tcl90.dll!Tcl_WaitForEvent(const Tcl_Time * timePtr) Ligne 1351	C
 	tcl90.dll!Tcl_DoOneEvent(int flags) Ligne 986	C
 	tcl9tk90.dll!Tk_MainLoop() Ligne 2127	C
 	tcl90.dll!Tcl_MainExW(__int64 argc, wchar_t * * argv, int(*)(Tcl_Interp *) appInitProc, Tcl_Interp * interp) Ligne 563	C
 	tclsh90.exe!wmain(int argc, wchar_t * * argv) Ligne 144	C
 	[Code externe]	

Crashes in Tk_GetFontFromObj() because tkwin is NULL.


Attachments: