Tk Source Code

View Ticket
Login
Ticket UUID: 22349fc78a7354435595369924179e8fabafb484
Title: Incorrect crossing events upon destruction of the pointer window under MS Windows, remaining issues on MacOS
Type: Bug Version: 8.6.13
Submitter: erikleunissen Created on: 2024-02-18 13:40:55
Subsystem: 69. Events Assigned To: marc_culler
Priority: 5 Medium Severity: Important
Status: Open Last Modified: 2024-05-06 08:02:36
Resolution: None Closed By: nobody
    Closed on:
Description:
Despite the efforts in tickets [9e1312f32c] and [b1d115fa60], bugs regarding
crossing events continue to show up under MS Windows when the pointer window
is being destroyed. Please see the tests described in section B. in order
to reproduce the bugs.

These additional bugs were the incentive to revisit the older tickets, and dig
further in order to understand what's really going on. This revealed that the
fix, developed in tickets [9e1312f32c] and [b1d115fa60], corrects the specific
misbehaviour as reported those tickets, but that it doesn't address the true
cause of the issue(s).

This post explains the true cause of the issue, provides tests and a fix
that addresses the true cause of the issue.


A. Explanation of the issue
===========================
The windowing system provided by MS Windows doesn't generate crossing events,
as opposed to X11. To compensate for that, Tk employs a self-built mechanism
that generates crossing events where needed (in tkPointer.c and
win/tkWinPointer.c). However, when generating crossing events, that mechanism
doesn't correctly consider the case where the pointer window is destroyed.
It handles that case as a regular mouse pointer transition between window A
and window B, where it ends up using something inappropriate for window A
instead of the pointer window (in the process of being destroyed). This
shortcoming leads to missing crossing events for intermediate windows in
the mouse pointer transition, and to crossing events with an incorrect
detail field. For the relevant code section, please see:

    [https://core.tcl-lang.org/tk/file?udc=1&ln=on&ci=a44363e604dadc30&name=generic%2FtkPointer.c&ln=168,189]

Tickets [9e1312f32c] and [b1d115fa60] have addressed particular expressions
of this shortcoming. The misbehaviours reported in those tickets were
difficult to see through because the shortcoming explained above interacts
with another, separate issue reported in [47d4f29159]. Their interplay
resulted in the observed misbehaviour. The details of that interplay are
further explained in a separate follow-up post to ticket [9e1312f32c].


B. Tests
========
The attached test script "crossing_events_upon_pw_destroy.test" holds ten
test cases that exercise the generation of crossing events, with specific
focus on the detail field of the crossing event.
Five of them (event-9.12, event-9.13, event-9.14, event-9.18 and event-9.19)
produce incorrect results under MS Windows, while producing correct results
under Linux:

- these tests PASS under Linux/X11 (10 different window managers, including
  no window manager at all, i.e. just the raw X server).
- these tests FAIL under MS Windows, when run without the patch in the current
  ticket (using Tk8.6.13 or Tk8.7a5).
- these tests PASS under MS Windows if the attached patch is applied.

The other tests in the test script are used to ensure correct functioning
of the attached patch but don't exercise current bugs.

Notes
-----
- the ten tests intend to cover all relevant transitions in the window
  hierarchy.
- tests event-9.11 and event-9.13 show overlap with existing tests
  (event-9, event-9.1 and event-9.2), except that they additionally test
  for a correct detail field. Whether the new tests can be appended or should
  replace existing tests, needs consideration.
  Note aside: fixing the issue in ticket [47d4f29159] invalidates test
  event-9.
  Therefore, the patch in that ticket removes it.
- tests event-9.11, event-9.16 and event-9.17 are affected by the issue
  described in [47d4f29159]. Therefore, the expected result of these tests has
  been made to depend on whether Tk invokes binding scripts for crossing
  events with detail NotifyInferior or not (i.e. whether the patch in
  [47d4f29159] was applied).
- tests don't wait for fixed amounts of time (e.g. for the windowing system
  to complete communications with Tk). Instead, they wait for specific events
  before proceeding (e.g. the servicing of windowing system notifications).
  This evades the problem of "how much waiting time is enough on an unknown
  end user's platform?", and it speeds up test execution considerably. On
  the other hand, it also requires more attention to details (see for example
  the explanation to test event-9.15). It's also a style that differs from
  the current style of waiting in the test script event.test.

For all these reasons, the tests in the test script have not been included
in the patch below. Tests will need to be handled separately, depending on
review and possibly consultation.


C. Patch
========
The attached patch "crossing_events_upon_pw_destroy.patch" fixes the issue by
addressing its true cause. The patch is based on Tk release 8.6.13, so it
replaces the shortcut fix developed with ticket [9e1312f32c]. Note aside:
the patch doesn't affect the change made by ticket [b1d115fa60] because
that change doesn't interfere with the true cause of the current issue,
and is an improvement in itself.


D. Conclusion
=============
- This post describes the cause of the reported issue. It's also the true
  cause for the misbehaviour reported with tickets [9e1312f32c] and
  [b1d115fa60].
- The attached patch holds a fix that addresses the cause of the issue.

Erik Leunissen.
User Comments: oehhar added on 2024-05-06 08:02:36:

As the title states "MS Windows", it might be a good idea to change it to make clear, that there are remaining issues for MacOS.


fvogel added on 2024-05-05 22:03:52:
Assigning to Marc Culler regarding the remaining issues specific to macOS (several private emails exchanged).

erikleunissen added on 2024-04-14 17:12:19:
Thanks for having checked this once more.

I'm very relieved!

It resolves my fundamental cognitive dissonance about the hang. (As a collateral effect, it speeds up this instance of waiting considerably).

fvogel added on 2024-04-14 15:21:14:
In [6bdef79b62] I have restored the original code. Well, almost. The root issue with using "." in the test suite is that this window is already visible, thus [tkwait visibility .] would wait for it forever in the test suite. Now that we're using another window we can call [tkwait visibility $w] (with $w ne ".") instead of calling _pause. This change is not paramount to me, but why not.

fvogel added on 2024-04-14 15:14:15:

I cannot reproduce the hang anymore either at [116624c2].

At [b1256d26] (with the _pause 200 --> tkwait visibility change on top of it) however it hangs, indeed because [tkwait visibility .] waits forever since the . window is already visible.


erikleunissen added on 2024-04-11 10:13:14:
Not that it changes anything about my point of view, but the proposed multi-platform alternative can be simpler:

    if {! [winfo ismapped $w]} {
        tkwait visibilty $w
        update
    }

erikleunissen added on 2024-04-11 10:03:43:
OK. Thanks for explaining these remaining matters.

With respect to a hang in:
   
        tkwait visibility $w
        update; # service remaining screen drawing events (e.g. <Expose>)

I cannot reproduce the hang, not with my original test script (of course),
and not with the current test script as integrated in the Tk test suite,
where I replaced the line:

    _pause 200; # service remaining screen drawing events (e.g. <Expose>)

This is all under MS Windows 7. I would be very much surprised if this varies
with the version of MS Windows.

For my personal interest, I would like to sort this out, but I can't
because I have no other version of MS Windows to execute the test file with.

In the interest of Tk: I leave it up to you whether you find worth while
to look into this further and reproduce the hang once more. I also don't
want to disturb the current state of affairs where the tests have already
been tuned to pass with Github, and I'm OK with leaving the matter as it is.
It's just that I find it dissatisfying not to understand why I can't
reproduce the hang.

A hang probably means that the <Visibility> event has already been serviced,
and that the window $w is already mapped before the call to "tkwait visibility".
Using that, I can think of a multi-platform alternative:

	# service any remaining screen drawing events
	if {[winfo ismapped $w]} {
		_pause 200; 
	} else {
        tkwait visibility $w
        update
	}

However, I find this alternative very ugly/cumbersome. I first would like to
know whether the hang depends on the version of MS Windows, before pursuing
this multi-platform alternative. For this reason also, I'm OK to just let
things as they are, unless you are motivated to pursue this.

fvogel added on 2024-04-10 19:25:27:

Your original code (tkwait visibility . ; update) lets tests such as event-9.11 hang on Windows (in the test suite I mean, perhaps not in your test script I don't remember). From this observation I just aligned your test code with how other tests in event.test are doing and have replaced your visibility statements by _pause 200 as elsewhere in the test file.

This indeed does not guarantee that the window is visible, although 200 ms is enough to let it work in practice (no unstable tests results seen so far). If you can propose something better (and multi-platform), just say so and I'll happily commit your proposal.

The choice of the value 200 ms was (short answer) a copy/paste. (Longer answer) It was governed by which value in other tests was large enough to produce the required results and small enough to not slow the tests down more than needed. No consideration regarding MOUSE_TIMER_INTERVAL was made. I have just tried on Windows with _pause 500 everywhere instead of _pause 200 and the tests do pass identically (as I would have expected, it MUST be so otherwise the test does not demonstrate anything. The value 200 ms was not chosen to prevent further calls to Tk_PointerEvent()).


erikleunissen added on 2024-04-10 14:41:11:
Since the tests for this ticket were integrated into the Tk test suite,
I didn't have a close look to them. (That's merely because my drive to do so
is inversely proportional to my confidence that you did this accurately,
François.) Well, I did that now, and everything looks fine to me, except for the
following two aspects where I have a small doubt/question:

A. Regarding this line:

        [https://core.tcl-lang.org/tk/file?ci=e3db006d1160463e&name=tests/event.test&ln=913]
    
    In my original test script the code reads:
    
        tkwait visibility .
        update; # service remaining screen drawing events (e.g. <Expose>)

    Now it is:

        _pause 200; # service remaining screen drawing events (e.g. <Expose>)

    a. Was the removal of the call to "tkwait visibility" on purpose? It makes that
       we can't be sure that the window is visible, e.g. if the end user's system
       is very busy processing other stuff during the 200 ms that the Tk process
       is waiting). This makes me uncomfortable.
    b. If the removal of the call to "tkwait visibility" was on purpose, then
       the comment is somewhat misleading, especially the word "remaining".
       What's happening during the [_pause 200] is much more than just the
       servicing of "remaining" drawing events. A better comment would be:
   
          _pause 200; # let the drawing of the window to the screen occur to its completion

       (or something equivalent).

B. Regarding the 200 ms that _pause is waiting:
   It didn't go unnoticed that 200 ms is less than the current value for MOUSE_TIMER_INTERVAL
   (250 ms). The choice makes that there is no time for the sequence MouseTimerProc() -> Tk_PointerEvent()
   to be run twice. (This is on MS Windows, I don't know if this also holds for macOS/aqua).
   
   Was this consideration indeed the reason for choosing 200 ms? If so, wouldn't it wise to
   add this as an explanation somewhere?

fvogel added on 2024-04-07 10:11:51:

I'm dropping my findings regarding the macOS aqua case here so that they do not get lost.

The test case I consider is event-9.11.

  1. When event-9.11 destroys .one.f1.f2, XdestroyWindow() is called for .one.f1.f2, which calls TkPointerDeadWindow() for it. Windows and macOS aqua behave identically here.
  1. In TkPointerDeadWindow(), the condition if (winPtr == tsdPtr->lastWinPtr) is true on Windows but not on macOS aqua. This means that tsdPtr->lastWinPtr is not correct on macOS aqua. Indeed, tsdPtr->lastWinPtr is supposed to refer to the window that was most recently containing the pointer, which should be .one.f1.f2. So macOS aqua does not maintain tsdPtr->lastWinPtr correctly.
  1. lastWinPtr is only set in GenerateEnterLeave(), which is only called from Tk_UpdatePointer(). So this confirms that there must be a missing call to Tk_UpdatePointer().
  1. Instrumentation of event-9.11 tells that the two platforms behave differently during the _pause 200 in the setup phase. Windows calls Tk_UpdatePointer() and then GenerateEnterLeave() on .one.f1.f2 while macOS aqua does not.
  1. Instrumenting tkProcessMouseEvent() (#define TK_MAC_DEBUG_EVENTS 1) shows that on macOS aqua there are four events serviced during that _pause 200. First two NSMouseExited events, and then two NSMouseEntered events. All four cases call return theEvent ; tkProcessMouseEvent() does not call Tk_UpdatePointer() later in its code. Important enough is the fact the window getting these four events is .one while it should be .one.f1.f2 (as on Windows).
  1. Commit [b3c05ed6] is a proof of concept letting event-9.11 pass on macOS aqua. This shows there are missing NSevents (or missing processing of these events by the macOS aqua code).


fvogel added on 2024-03-23 17:00:42:

OK, committed. Thanks.

Yes, I also had thought about leaving the macOS situation as it is and constrain the tests. "notAqua" would be the right constraint. The tests pass on the mac with XQuartz (which is expected). I'm not yet fully ready to give up yet.


erikleunissen added on 2024-03-23 15:42:25:
The change in commit [60fbc1fa]] wasn't good enough after all. This is what's needed, based on [b0172da5] the current tip of the bug-fix branch (sorry for the piecewise tactics):

--- event.test.1        2024-03-23 16:30:51.000000000 +0100
+++ event.test.2        2024-03-23 16:34:10.693688779 +0100
@@ -880,13 +880,15 @@
     set _windowEvent($counter) 1
     set savedBinding [bind $w $event]
     bind $w $event [list +waitForWindowEvent.signal $counter]
-    after $timeout [list set _windowEvent($counter) -1]
+    set afterID [after $timeout [list set _windowEvent($counter) -1]]
     vwait _windowEvent($counter)
     set late [expr {$_windowEvent($counter) == -1}]
     bind $w $event $savedBinding
     unset _windowEvent($counter)
     if {$late} {
        puts "waiting for $event event on $w timed out (> $timeout ms)"
+    } else {
+       after cancel $afterID
     }
 }
 proc waitForWindowEvent.signal {counter} {

erikleunissen added on 2024-03-23 15:24:23:
If macOS/aqua is broken in this area, and the resources fail to fix it,
then what about constraining the tests that don't pass on macOS/aqua,
and proceed with the improvements so that MS Windows can profit from them?

The tcltest docs mention a constraint "macCrash". However, the word "crash"
is inappropriate. There is a "unixOrWin", which seems a bit skew to the
purpose, but might be sufficient. What's needed is a "failsOnMac" or "failsOnMacAqua" constraint, but that isn't provided by tcltest.

By the way: do the tests also fail with mac/XQuartz?

fvogel added on 2024-03-21 21:20:42:

After spending some time on macOS, I believe that the mechanism generating crossing events on Windows simply does not exist on the mac. IOW what Tk implements for Windows in tkPointer.c and win/tkWinPointer.c, expecially Tk_PointerEvent does not seem to exist for the mac. If so, fixing this does not seem to be within my reach.

Or I didn't find this mechanism.

Or it's perhaps not needed, but in this case how is the whole thing supposed to work...?


fvogel added on 2024-03-17 15:09:54:

Right, now done in [69c3dfd1].

About the segfault: follow-up to [fdc0ed342d] for a reproducible script, a short explanation of what's happening, and the fix.

What remains to do for the present ticket is to understand why almost all new tests fail on macOS. I'll have to spend some time on this...


erikleunissen added on 2024-03-17 12:42:08:
correction: toplevel .row. -> toplevel .one

erikleunissen added on 2024-03-17 12:38:58:
By enlarging the windows in [8e55f77c], the mouse pointer destination under KDE/Plasma is right inside the wm border for toplevel .row. in test event-9.15.

This is uncomfortably close to incorrect. I suggest to change (in test event-9.15):

event generate .two <Motion> -warp 1 -x 250 -y 250

    to:

event generate .two <Motion> -warp 1 -x 275 -y 275

erikleunissen added on 2024-03-15 11:00:01:
The segfault occurs if one or both of the following tests are executed before test event-2.1(keypress):

event-1.1
event-1.2

Executing other tests in event.test doesn't seem to matter.

I think it's best that we do not let this bug clutter the present ticket.

erikleunissen added on 2024-03-15 10:25:48:
I've just removed all tests for the present ticket (event-9.11 through event-9.20), including the specific utility procs for these tests from event.test (copy of checkin [60fbc1fa4a]).

Result: the segfault prevails immediately after test listbox-30.1

erikleunissen added on 2024-03-15 09:10:20:
OK. Now reproduced with [60fbc1fa4a] and your invocation:

    xvfb-run --auto-servernum make test TESTFLAGS="-file \"event.test font.test listbox.test\" -verbose beps"

obtaining the same back trace. Here's the result of "bt full":

(gdb) bt full
#0  0x00007fc4e5891944 in FindDisplayFocusInfo (mainPtr=0x207373656c6e7520, dispPtr=0x646e697720656d6f)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkFocus.c:1018
        displayFocusPtr = 0x1faedb0
#1  0x00007fc4e5891328 in TkSetFocusWin (winPtr=0x287d9d0, force=1)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkFocus.c:654
        displayFocusPtr2 = 0x1e2f610
        focusPtr = 0x22becf0
        tlFocusPtr = 0x238a240
        displayFocusPtr = 0x1faedb0
        topLevelPtr = 0x1c14c80
        allMapped = 1
        serial = 0
#2  0x00007fc4e5890ad3 in Tk_FocusObjCmd (clientData=0x1c14c80, interp=0x1bd1990, objc=3, objv=0x1bd6c10)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkFocus.c:199
        focusOptions = {0x7fc4e59b3890 "-displayof", 0x7fc4e59b389b "-force", 0x7fc4e59b38a2 "-lastfor", 0x0}
        tkwin = 0x1c14c80
        winPtr = 0x1c14c80
        newPtr = 0x287d9d0
        topLevelPtr = 0x0
        tlFocusPtr = 0x1bd1990
        windowName = 0x1e2f610 ".l"
        index = 1
#3  0x00007fc4e5492bc6 in TclNRRunCallbacks ()
   from /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/BUILD/x86_64-linux/tcl-60fbc1fa/libtcl8.6.so
No symbol table info available.
#4  0x00007fc4e549501a in TclEvalEx ()
   from /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/BUILD/x86_64-linux/tcl-60fbc1fa/libtcl8.6.so
No symbol table info available.
#5  0x00007fc4e55641b8 in Tcl_FSEvalFileEx ()
   from /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/BUILD/x86_64-linux/tcl-60fbc1fa/libtcl8.6.so
No symbol table info available.
#6  0x00007fc4e58a57f5 in Tk_MainEx (argc=-1, argv=0x7fff66a148f8, appInitProc=0x400e5c <Tcl_AppInit>, interp=0x1bd1990)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkMain.c:328
        path = 0x1bfc180
        argvPtr = 0x1bfc1b0
        appName = 0x1bfc180
        encodingName = 0x0
        code = 0
        nullStdin = 0
        chan = 0x0
        is = {input = 0x0, tty = 1, command = {string = 0x2020200020200000 <error: Cannot access memory at address 0x2020200020200000>, 
            length = 536870912, spaceAvl = 538976288, 
            staticSpace = '\000' <repeats 88 times>, "(\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\060\035T\345\304\177\000\000\356\247_\345\304\177\000\000\225sY\345\304\177\000\000\356\247_\345\304\177\000\000\330\314I\345\304\177\000\000(\000\000\000\000\000\000\000\v\035T\345\304\177\000\000\000\000\000\000\000\000\000\000\006\000\000\000\000\000\000\000\002", '\000' <repeats 22 times>}, 
          line = {string = 0x0, length = 0, spaceAvl = 0, staticSpace = '\000' <repeats 199 times>}, gotPartial = 0, interp = 0x1bd1990}
#7  0x0000000000400e55 in main (argc=8, argv=0x7fff66a148b8)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../unix/tkAppInit.c:93


My suspicion focuses on these memory addresses at #0:

    (mainPtr=0x207373656c6e7520, dispPtr=0x646e697720656d6f)

erikleunissen added on 2024-03-15 08:29:54:
OK. Now, I believe that it's possible. Not caused by the C code that we touch in this ticket, but through the tests or by some very weird flaw otherwise. I'm going to try to reproduce.

fvogel added on 2024-03-15 01:57:07:

"I can't believe that the change introduced with the immediately successive commit [185c9e86] can make the difference between segfaulting and not segfaulting."

Me too. I didn't say exactly this but that it cannot be reproduced with later commits in the same branch. Granted this was not 100% accurate, so let's be more precise: The segfault can be reproduced with [60fbc1fa4a] but not with [cb6c8177]. Just try and see. Or look at the Github actions results.

Can you reproduce? Seeing is believing :-).


erikleunissen added on 2024-03-14 21:59:07:
Regarding:

"The segfault can be reproduced with [60fbc1fa4a] but not with later commits in the same branch."

This is very peculiar! I can't believe that the change introduced with the immediately successive commit [185c9e86] can make the difference between segfaulting and not segfaulting.

erikleunissen added on 2024-03-14 21:28:03:
Oh, I can't say anything for macOS, but the C code that this ticket changes is not run on Linux.

fvogel added on 2024-03-14 21:05:38:

The segfault can be reproduced with [60fbc1fa4a] but not with later commits in the same branch. Can you reproduce?

The link with the present ticket must be that something is newly exercised on Linux by the added test cases, in other words the new test cases event-9.* reveal a latent bug on Linux.

Regarding what you just wrote: "we're touching code that is only run on MS Windows": if this is really true it strikes me as a very likely cause for the new event-9.* tests failures on macOS aqua.


erikleunissen added on 2024-03-14 20:46:02:
Oh, I'm going way too fast:

The segfault occurs on Linux, but we're touching code that is only run on MS Windows. So, I don't see the relation between the segfault and this ticket.

erikleunissen added on 2024-03-14 20:39:11:
Regarding the segfault on Linux, that you reported on 2024-03-10 16:12:17 and the two preceding posts:

- did you proceed with it?
- do you want me to proceed with it?

erikleunissen added on 2024-03-14 20:34:14:
Regarding the assignment of the variable "crossing":

- I understand your stance very well.
- I won't pursue the matter, but thanks for your suggestion.

fvogel added on 2024-03-14 20:09:28:

Regarding your B. below: noted.

Regarding restrictWinPtr: Well, I didn't write that code :-) The ThreadSpecificData struct has comments reading:

    TkWindow *grabWinPtr;	/* Window that defines the top of the grab
[...]				 * tree in a global grab. */
    TkWindow *restrictWinPtr;	/* Window to which all mouse events will be
				 * reported. */
This is the only "documentation" I could find about restrictWinPtr. This (TkWindow *) pointer is used in tkPointer.c in many places, and I have unfortunately no time to analyze what exactly is done with it.

What is sure is that your proposed "crossed.patch" changes the behavior of GenerateEnterLeave() in the case ((winPtr != lastWinPtr) && (restrictWinPtr)). Whatever this case is doing, I won't apply your patch unless we can exhibit a test case behaving wrongly in this case and that your patch solves. If you wish to go on investigating this, I strongly suggest filing a dedicated ticket for this because otherwise I fear we get distracted from the primary purpose of the present ticket. Thanks!


erikleunissen added on 2024-03-14 19:20:09:
In my post on 2024-03-07 22:38:09 I wrote:

B. Maybe the line:
  
   [https://core.tcl-lang.org/tk/file?ci=3eb956ad6f1fbc5a&name=generic/tkPointer.c&ln=535&ln=535]

needs to be reverted to:

  tsdPtr->lastWinPtr = TkGetContainer(winPtr);

I've finished experimenting: how hard I try with various cases where the parent
of the pointer window isn't the containing window, I can't get any difference.
So, there is no reason to change that code line; the current variant consumes
the least cpu cycles.

erikleunissen added on 2024-03-14 13:54:59:
Well, first off:

I understand what a grab and a grab window are. But I am unsure about the
terminology "restrict window" (or "restrictWinPtr") in relation to grabs,
because in the latter context I'd expect the term "grab window" (or
"grabWinPtr").
I'd like to think that they are similar things, but I don't know the exact
difference. I believe that I've seen the term "restrictWinPtr" mostly in
the context of button widgets, but I may be mistaken.
I'd be very grateful if you explained the difference to me.

To answer your question:

When inspecting the code for GenerateEnterLeave(), I looked at this condition:

   if (winPtr != lastWinPtr) {
   
which for me implied "crossing a window boundary".

I did not consider the restrict window. That's simply a matter of oversight.
But my unfamiliarity with the term "restrict window" as explained above may
have confused me also.

If a restrict window is similar to or the same as a grab window, then I
understand that that makes a difference for the assignment to the variable
"crossed".

So, I'd be grateful if you corrected my proposal for this case, and make
it correspond to *your* understanding of the matter.

fvogel added on 2024-03-14 06:30:59:
About 2.: I think I understand your point. From reading the comment in tkPointer.c:268-272 one really is interested in when the mouse pointer has crossed window boundaries indeed. However it looks like your proposal "crossed.patch" would also change the behavior during grabs. If I read the code correctly, currently GenerateEnterLeave() returns crossed=0 in this case, with your patch it returns crossed=1. Is this what you really intend to do, given the two calls to GenerateEnterLeave() in Tk_UpdatePointer()? Why ?

fvogel added on 2024-03-13 21:28:53:

Regarding this "crossing.patch":

1. Well found. NotifyNormal was wrongly passed in a detail field while it is a value for the mode field. Replacing by NotifyAncestor is a correct detail field value (and this does not change the behavior because both NotifyNormal and NotifyAncestor values happen to be zero). Commit is [1dac28ab]. Note that there were two occurrences of this in the codebase.

2. I have to think about this before commenting.


erikleunissen added on 2024-03-13 17:57:47:
The attached patch "crossing.patch" corrects two further bugs in
GenerateEnterLeave()in tkPointer.c that are not related to the issue
of this ticket:

1. assignment of a symbolic constant "NotifyNormal", which is invalid for
   a variable denoting the detail field of a crossing event, in line 211.
   ("NotifyNormal" is a symbolic constant for the mode field of a crossing
   event.)
2. assignment of value 1 to the variable "crossed" throughout function
   GenerateEnterLeave(), which I currently find inconsistent with both
   the following explanantion in the code comments:
   - having crossed a window boundary
   - having generated crossing events
   This change makes the code consistent with "having crossed a window
   boundary". However, I'm not quite confident here, and would appreciate
   specific review.

This patch addresses "minor issue A." in my post on 2024-03-07 22:38:09.

fvogel added on 2024-03-10 18:51:08:

I tried to find what's wrong on macOS but I don't have a clue. Running the statements from the tests manually in a shell on a mac does not provide the expected result (therefore the problem cannont be solved by delays, updates and such things as we are used to on the mac).

In the analysis process I have however reopened [47d4f29159] since the patch provided in that ticket does no longer seem to be complete to me.


erikleunissen added on 2024-03-10 16:24:23:
Oompff. Surprise!

I'd like to debug or help debugging the segfault. But, can only start doing so around next wednesday or so.

On macOS, I can be of no help whatsoever since I don't have the hardware (or toolchain).

fvogel added on 2024-03-10 16:12:17:

Running under valgrind spits:

---- listbox-31.1 start
==71831== Invalid read of size 8
==71831==    at 0x489F675: FindDisplayFocusInfo (tkFocus.c:1018)
==71831==    by 0x489F054: TkSetFocusWin (tkFocus.c:654)
==71831==    by 0x489E7FD: Tk_FocusObjCmd (tkFocus.c:199)
==71831==    by 0x4A5A8F6: Dispatch (tclBasic.c:4503)
==71831==    by 0x4A5A983: TclNRRunCallbacks (tclBasic.c:4539)
==71831==    by 0x4A5A0F4: Tcl_EvalObjv (tclBasic.c:4262)
==71831==    by 0x4A5C8DD: TclEvalEx (tclBasic.c:5408)
==71831==    by 0x4B76ED9: Tcl_FSEvalFileEx (tclIOUtil.c:1824)
==71831==    by 0x48B37FD: Tk_MainEx (tkMain.c:328)
==71831==    by 0x10B282: main (tkAppInit.c:93)
==71831==  Address 0x3a207465732020b8 is not stack'd, malloc'd or (recently) free'd
==71831== 
==71831== 
==71831== Process terminating with default action of signal 11 (SIGSEGV)
==71831==  General Protection Fault
==71831==    at 0x489F675: FindDisplayFocusInfo (tkFocus.c:1018)
==71831==    by 0x489F054: TkSetFocusWin (tkFocus.c:654)
==71831==    by 0x489E7FD: Tk_FocusObjCmd (tkFocus.c:199)
==71831==    by 0x4A5A8F6: Dispatch (tclBasic.c:4503)
==71831==    by 0x4A5A983: TclNRRunCallbacks (tclBasic.c:4539)
==71831==    by 0x4A5A0F4: Tcl_EvalObjv (tclBasic.c:4262)
==71831==    by 0x4A5C8DD: TclEvalEx (tclBasic.c:5408)
==71831==    by 0x4B76ED9: Tcl_FSEvalFileEx (tclIOUtil.c:1824)
==71831==    by 0x48B37FD: Tk_MainEx (tkMain.c:328)
==71831==    by 0x10B282: main (tkAppInit.c:93)


fvogel added on 2024-03-10 15:49:56:

The segfault can be reproduced by running:

xvfb-run --auto-servernum make test TESTFLAGS="-file \"event.test font.test listbox.test\" -verbose beps"


fvogel added on 2024-03-10 10:33:59:
How interesting. The CI runners at Github reveal that:

a. The test suite segfaults on Linux during listbox-31.1 now, but only when xvfb is used! (I can reproduce this)

b. The test suite on macOS aqua has several failures in event-9.*: almost all new tests fail. The tests need to be tuned for this platform.

fvogel added on 2024-03-09 18:00:19:
I'm with you regarding having a separate test tools file but never did it. Yes we should. Regarding moving procs such as proc waitForWindowEvent: my loose code of conduct so far is to not move such procs if they are not used elsewhere AND have no chance of being used elsewhere in the future. Pretty subjective, ok.

I have removed event-9.1 and event-9.2, and also the useless code in proc waitForWindowEven.

File event.test now passes on Windows and Linux. From past experience with these kind of tests I'm sure many new tests will fail on the mac (at least with aqua). We'll see what the CI runners spit.

erikleunissen added on 2024-03-09 17:07:21:
Another matter regarding proc waitForEnterEvent:

The following line is useless and can be removed:

      after cancel $afterID

erikleunissen added on 2024-03-09 17:06:35:
Regarding the questions:

> 1. Should proc waitForWindowEvent be placed in constraints.tcl, and
>    should other tests in event.test (or perhaps wider than this) use it?

A more generic comment:

I believe that it would be good for Tk to have a separate, common file dedicated
to test utilities used by all tests in the Tk test suite. If there were a file
like that, say "test_utils.tcl", my preference would be to move
"waitForWindowEvent" there (and also controlPointerWarpTiming for example).

Currently, the file "constraints.tcl" is being used as such, but the file name
suggests another purpose than test utilities.

I find it difficult to foresee whether it's likely that waitForWindowEvent is
going to be used by other test files than event.test. I feel that I don't have
enough overview over the Tk test suite to judge that. If it's an important
criterium for deciding about the place of waitForWindowEvent, then I'd trust
your inclination.


> 2. Couldn't we delete event-9.1 and event-9.2 from the test suite
>    since event-9.13 does the job (and better), according to the comment
>    that comes before the latter?

Yes, this is what I suggested in the initial text of the present ticket. I just
reviewed event-9.1 and event-9.2 to see whether there is something tiny that
isn't covered by event-9.13, but I don't see it. So as far as I'm concerned
tests event-9.[12] can be removed.

fvogel added on 2024-03-09 10:51:29:

Current state [d569f3a7] of the bugfix branch now tests OK on Windows and Linux (at least).

I'm wondering about:

  1. Should proc waitForWindowEvent be placed in constraints.tcl, and should other tests in event.test (or perhaps wider than this) use it?
  1. Couldn't we delete event-9.1 and event-9.2 from the test suite since event-9.13 does the job (and better), according to the comment that comes before the latter?


erikleunissen added on 2024-03-09 10:20:39:
There was no special reason to use "." for the new tests for the present ticket. (I just was being too lazy to create another one.)

Any toplevel should be OK to perform the function that "." had in these tests. Since "." is somewhat different from other toplevels, I agree that it is wise/better to use an arbitrary other toplevel (such as .one) instead of the root window for the purpose of these tests.

fvogel added on 2024-03-09 09:55:02:

The test suite hangs during focusTcl-2.1.

I have made further commits to resolve this, the most important one being [b16759f7]. I think I didn't distort the nature/purpose of any test by no longer using the root (".") toplevel anymore, but please confirm.


fvogel added on 2024-03-08 21:45:07:

No, [tkWithNotifyInferior] worked OK from the beginning. And yes, it's obvious it's not needed anymore since [47d4f29159] is fixed.

The current state of branch bug-22349fc78a now passes on Windows. This is just a quick work that can be refined. Let's however see what Github Actions thinks of the current state.


erikleunissen added on 2024-03-08 21:36:47:
B.t.w. Note that I understand that there is no need for [tkWithNotifyInferior] when the tests are included in the development branch for the present ticket.

erikleunissen added on 2024-03-08 21:08:58:
It's remarkable that:
- the tests that fail all expect a result that includes "Notifyinferior"
- tests that don't expect it, don't fail.

This makes me wonder whether [tkWithNotifyInferior] does its work correctly on your system.

fvogel added on 2024-03-08 20:32:26:

On Windows, when the tests are integrated in the test suite it hangs on tkwait visibility . in proc init_pos. I'll dive into this.

On Windows, when run as is as from the [3eb956ad] state outside of the test suite, the test script does not hang but produces several errors. I'll try to fix this also.

%  source  {C:/Users/francois/Desktop/tests leunissen.tcl}


==== event-9.11 pw container = parent FAILED
==== Contents of test case:

        destroy .f1.f2
        update; # service crossing events
        set result

---- Result was:
|
---- Result should have been (exact matching):
|<Enter> NotifyInferior .f1|
==== event-9.11 FAILED

++++ event-9.12 PASSED
++++ event-9.13 PASSED
++++ event-9.14 PASSED
++++ event-9.15 PASSED


==== event-9.16 Successive destructions (pw + parent), single generation of crossing events FAILED
==== Contents of test case:

        destroy .f1
        update; # service crossing events
        set result

---- Result was:
|
---- Result should have been (exact matching):
|<Enter> NotifyInferior .|
==== event-9.16 FAILED



==== event-9.17 Successive destructions (pw + parent), separate crossing events FAILED
==== Contents of test case:

        destroy .f1.f2
        update; # service crossing events
        destroy .f1
        update; # service crossing events
        set result

---- Result was:
|
---- Result should have been (exact matching):
|<Enter> NotifyInferior .f1|<Enter> NotifyInferior .|
==== event-9.17 FAILED

++++ event-9.18 PASSED
++++ event-9.19 PASSED
++++ event-9.20 PASSED
tests leunissen.tcl:    Total   10      Passed  7       Skipped 0       Failed  3
%


erikleunissen added on 2024-03-08 19:35:51:
I'd like to help debug the hanging test by inspecting the result of your integration into the Tk test suite.

Being away from my development systems, I can't run any code. But at a conceptual level I'm wondering: does the hang also occur when you run the test script unaltered as a separate Tcl script, that is, not integrated into the Tk test suite?

fvogel added on 2024-03-08 04:55:27:
I have already tried to integrate your new tests as is (or almost) in the test suite. My first attempt showed a hang in the first new test. I believe it hangs in waitForWindowEvent but didn't analyze. I didn't try on platforms other than windows.

erikleunissen added on 2024-03-07 22:51:54:
I'm sorry,

The link under minor issue B. has the wrong line number. The corrected link is:

   [https://core.tcl-lang.org/tk/file?udc=1&ln=on&ci=3eb956ad6f1fbc5a&name=generic%2FtkPointer.c&ln=535]

erikleunissen added on 2024-03-07 22:38:09:
OK. Thanks François.

I think it's best that I await your suggestions regarding the integration of the tests after you've had opportunity to review them.

In the meantime I want to inform you that a few minor issues have come up (see below). I'm not at my development machine, and cannot work on these issues for several days. So if you planned to already do some work preparing the integration of the tests, don't let those minor issues postpone your effort. As far as I'm concerned, all of them can be handled as adjustments/refinements after an initial integration into the bug fix branch of the tests in their current state.

Erik.

-- additional minor issues --

A. I discovered a possible other bug in GenerateEnterLeave(). It's unrelated to the issue of the present ticket, but while we're making changes, maybe this bug can be piggybacked with the present ticket.


B. Maybe the line:
  
   [https://core.tcl-lang.org/tk/file?udc=1&ln=on&ci=3eb956ad6f1fbc5a&name=generic%2FtkPointer.c&ln=155]

needs to be reverted to:

  tsdPtr->lastWinPtr = TkGetContainer(winPtr);

The difference is inconsequential for any of the test cases in crossing_events_upon_pw_destroy.test, which is why I thought that it didn't matter. However, I think I'm on to a scenario where it does. If so, that would lead to an extra test case.


C. test event-9.20 in the test script crossing_events_upon_pw_destroy.test needs the same trick as event-9.15 to prevent carrying over over events to the next test (if any).

fvogel added on 2024-03-07 20:13:31:

[47d4f29159] is now closed, with your fix merged in core-8-6-branch and trunk.

I have brought branch bug-22349fc78a up-to-date. Now let's go for step 3. as I described below.


erikleunissen added on 2024-03-04 22:13:07:
Sorry, I meant:

or such tests have to be skipped in branch core-8.6

erikleunissen added on 2024-03-04 22:11:06:
or:

- such tests have to be skipped

erikleunissen added on 2024-03-04 22:01:52:
I have second thoughts about the procedure that you proposed:

If the fix for [47d4f29159] is going to be merged into the bug fix branch for this ticket, it means that the bug fix branch cannot be merged with Tk branch core-8.6. Am I right?

That would be strange, because the misbehaviour that the patch corrects has nothing to do with ticket [47d4f29159]. It's just that some of the observed misbehaviour involves an event with detail field NotifyInferior. This, in turn, leads to some tests involving an event with detail field NotifyInferior.

Any issue involving observed behaviour regarding events with detail field NotifyInferior cannot be tested with Tk versions in the 8.6 branch. That is because the entire area of crossing events and focus events has been compromised by the issue of ticket [47d4f29159] since Tk's beginning. It is unfortunate that this becomes clear only now, and it stresses the lack of attention for this area prior to ticket [47d4f29159] and dependent tickets. However, I don't see why that would be a good reason for blocking patches for issues that are not caused by the issue in ticket [47d4f29159], but whose behaviour does involve a detail field NotifyInferior.

What it means according to me is that either:

  - the fix for [47d4f29159] needs to be merged into core-8.6 branch after all

or:

  - tests regarding such issues need different expected results in trunk and core-8.6 branches

erikleunissen added on 2024-03-01 22:47:23:
Thanks for reviewing this. 

I follow/understand your suggestion for proceeding with this ticket.

fvogel added on 2024-02-29 21:58:25:

Thank you Erik for this once more thorough analysis. I think this looks totally correct.

I have committed your patch as [04732899]. Github Actions reveals that the test suite passes with this on all three platforms.

At this point I think we should proceed in three steps: 1. merge the fix for [47d4f29159] first, then 2. merge it into the bugfix branch for the present ticket, and finally 3. make use of your test file to enhance the test suite with your new event-* tests.


Attachments: