Ticket UUID: | f52986c6989b8a43a412efb66374b3940e366bac | |||
Title: | SIGABRT from Tk_DeleteErrorHandler() | |||
Type: | Bug | Version: | 9.0.0 | |
Submitter: | sbron | Created on: | 2024-10-02 09:30:22 | |
Subsystem: | 46. Unix Fonts | Assigned To: | fvogel | |
Priority: | 5 Medium | Severity: | Severe | |
Status: | Closed | Last Modified: | 2024-10-11 11:52:13 | |
Resolution: | Fixed | Closed By: | jan.nijtmans | |
Closed on: | 2024-10-11 11:52:13 | |||
Description: |
Running TkChat with Tcl/Tk 9.0.0 sometimes aborts when loading many lines of history:
Looking at Tk_DeleteErrorHandler(), it seems that the error comes from some kind of garbage collection and may not be related to Tk_MeasureChars() at all. Both the core file and the binary are quite big so I'm not attaching them to the ticket. But I'll be happy to fire up gdb and dump any information that may be helpful to investigate this problem. | |||
User Comments: |
jan.nijtmans added on 2024-10-11 11:52:13:
Like [335c92cf9d27ab86|this]? chw added on 2024-10-11 11:03:06: Would you mind to re-add the comment of the originally innocent looking lines again? Although adding no real beef, it provides some guidance for the casual reader at least. jan.nijtmans added on 2024-10-11 10:11:51: Fixed [7a599109d4fc8ea5|here], and also in 8.6/8.7 Hopefully the abort doesn't show up any more, otherwise feel free to re-open this ticket! sbron added on 2024-10-11 07:12:23: I haven't experienced any aborts of TkChat anymore after I added the patch to Tk_DeleteErrorHandler(). In my test I inserted the zeroing out of errorPtr->errorProc before the block of code that first increments and then checks dispPtr->deleteCount, rather than in between. But that shouldn't make any difference to the functionality, possibly only to the ability of the compiler to optimize the code. jan.nijtmans added on 2024-10-10 22:00:33: Patch looks good to me (FWIW) fvogel added on 2024-10-07 20:51:06: Committed in branch bug-f52986c698. sbron> I still experienced a crash of TkChat after making only the change to sbron> Tk_MeasureChars().This makes sense. It is consistent with: fvogel> I fail in identifying why this change would prevent Xsync() from fvogel> SIGABRTing during Tk_DeleteErrorHandler().The second patch looks more like a promising candidate to me. sbron added on 2024-10-07 08:48:20: I still experienced a crash of TkChat after making only the change to Tk_MeasureChars(). I have now also added the Tk_DeleteErrorHandler() change. A couple of restarts of TkChat worked fine. But, as the problem only happens every now and then, I'm unable to conclude if that fixes the problem. I will keep monitoring in the next days. chw added on 2024-10-06 20:21:48: Would you like to consider some innocent looking lines in Tk_DeleteErrorHandler()? https://www.androwish.org/home/file?ci=tip&name=jni/sdl2tk/generic/tkError.c&ln=169-173 They will prevent possible stack bombs. fvogel added on 2024-10-06 19:22:14: Thank you for this proposal Christian! I see the reason for the change you are proposing: don't wait for the next iteration of the while (numBytes > 0) loop to set extents.xOff to zero in case XftTextExtents32() triggered an error. This change looks OK to me since the right value of extents.xOff will then be used afterwards. However, I fail in identifying why this change would prevent Xsync() from SIGABRTing during Tk_DeleteErrorHandler(). Could you please explain? Anyway, despite many many tries, I couldn't check whether this change fixes the reported issue or not. I have lots of difficulties in reproducing the problem (without your fix). Without a reproducible scheme, I'm afraid we cannot demonstrate that the fix works. chw added on 2024-10-05 16:53:39: Would you like to test this inconspicuous logic change in Tk_MeasureChars()? Old: if (!errorFlag) { LOCK; XftTextExtents32(fontPtr->display, ftFont, &c, 1, &extents); UNLOCK; } else { extents.xOff = 0; errorFlag = 0; } New: if (!errorFlag) { LOCK; XftTextExtents32(fontPtr->display, ftFont, &c, 1, &extents); UNLOCK; } if (errorFlag) { extents.xOff = 0; errorFlag = 0; } fvogel added on 2024-10-05 13:14:10: The XSync() line triggering the SIGABRT has been introduced in [3233521135] (side note: this means the issue should be reproducible with core-8-6-branch as well, but I didn't check in practice). @Christian Werner, any thoughts perhaps? sbron added on 2024-10-05 13:06:25: It probably still has the same root cause. If I read the code correctly, Tk_DeleteErrorHandler() collects 10 handlers before it cleans them all up in one go. So the actual problem may have happened earlier than the current call to Tk_MeasureChars(). Maybe doing the cleanup every time provides a better indication of where the problem really happens. fvogel added on 2024-10-05 11:54:24: Not the same sack trace, but still from Tk_MeasureChars. fvogel added on 2024-10-05 11:52:55: By increasing the number of days log (change -13 to, say, -23 in tkchat.tcl:485), I could reproduce an abort few times. As mentioned in the original report, this is not reproducible every time. It is even not so eaesy to trigger, one needs several (identical) attempts. Smells a lot like a race condition. Running in gdb I could get another SIGABRT error, but not the same as the one originally reported. See attached. fvogel added on 2024-10-05 09:03:02: Looking at the code I'm wondering if this could be linked to [1185640fff]. sbron added on 2024-10-05 08:24:45: It doesn't abort for me today either. I guess it may depend on some content in the history. I'll let you know when it happens again. fvogel added on 2024-10-05 08:13:04: Thank you for the recipe. I could follow it and create a wrapped tkchat executable on Linux Debian 11. However I'm afraid I cannot reproduce the reported problem. I ran this executable several times, each time loading as much history as available, but I couldn't get any abort. sbron added on 2024-10-03 07:47:26: These are the commands I used to build a wrapped executable for tkchat, starting from an empty directory:
sbron added on 2024-10-02 20:38:44: Technically, yes. But I suspect that X11 aborts because it is getting some invalid data from Tk. TkChat is part of tclapps. I just basically ripped out all the starkit stuff from tkchat.vfs/main.tcl. I then zipped it up and added it to a static wish 9.0. I will provide a more detailed procedure tomorrow, if still needed. fvogel added on 2024-10-02 20:04:43: Isn't this an abort in the X11 guts (not Tk)? Can you provide a recipe about how to "run TkChat with Tcl/Tk 9.0.0"? That way I could try to reproduce. |