Tk Source Code

View Ticket
Login
Ticket UUID: 22349fc78a7354435595369924179e8fabafb484
Title: Incorrect crossing events upon destruction of the pointer window under MS Windows, remaining issues on MacOS
Type: Bug Version: 8.6.13
Submitter: erikleunissen Created on: 2024-02-18 13:40:55
Subsystem: 69. Events Assigned To: marc_culler
Priority: 5 Medium Severity: Important
Status: Closed Last Modified: 2024-10-12 20:47:27
Resolution: Fixed Closed By: erikleunissen
    Closed on: 2024-10-12 20:47:27
Description:
Despite the efforts in tickets [9e1312f32c] and [b1d115fa60], bugs regarding
crossing events continue to show up under MS Windows when the pointer window
is being destroyed. Please see the tests described in section B. in order
to reproduce the bugs.

These additional bugs were the incentive to revisit the older tickets, and dig
further in order to understand what's really going on. This revealed that the
fix, developed in tickets [9e1312f32c] and [b1d115fa60], corrects the specific
misbehaviour as reported those tickets, but that it doesn't address the true
cause of the issue(s).

This post explains the true cause of the issue, provides tests and a fix
that addresses the true cause of the issue.


A. Explanation of the issue
===========================
The windowing system provided by MS Windows doesn't generate crossing events,
as opposed to X11. To compensate for that, Tk employs a self-built mechanism
that generates crossing events where needed (in tkPointer.c and
win/tkWinPointer.c). However, when generating crossing events, that mechanism
doesn't correctly consider the case where the pointer window is destroyed.
It handles that case as a regular mouse pointer transition between window A
and window B, where it ends up using something inappropriate for window A
instead of the pointer window (in the process of being destroyed). This
shortcoming leads to missing crossing events for intermediate windows in
the mouse pointer transition, and to crossing events with an incorrect
detail field. For the relevant code section, please see:

    [https://core.tcl-lang.org/tk/file?udc=1&ln=on&ci=a44363e604dadc30&name=generic%2FtkPointer.c&ln=168,189]

Tickets [9e1312f32c] and [b1d115fa60] have addressed particular expressions
of this shortcoming. The misbehaviours reported in those tickets were
difficult to see through because the shortcoming explained above interacts
with another, separate issue reported in [47d4f29159]. Their interplay
resulted in the observed misbehaviour. The details of that interplay are
further explained in a separate follow-up post to ticket [9e1312f32c].


B. Tests
========
The attached test script "crossing_events_upon_pw_destroy.test" holds ten
test cases that exercise the generation of crossing events, with specific
focus on the detail field of the crossing event.
Five of them (event-9.12, event-9.13, event-9.14, event-9.18 and event-9.19)
produce incorrect results under MS Windows, while producing correct results
under Linux:

- these tests PASS under Linux/X11 (10 different window managers, including
  no window manager at all, i.e. just the raw X server).
- these tests FAIL under MS Windows, when run without the patch in the current
  ticket (using Tk8.6.13 or Tk8.7a5).
- these tests PASS under MS Windows if the attached patch is applied.

The other tests in the test script are used to ensure correct functioning
of the attached patch but don't exercise current bugs.

Notes
-----
- the ten tests intend to cover all relevant transitions in the window
  hierarchy.
- tests event-9.11 and event-9.13 show overlap with existing tests
  (event-9, event-9.1 and event-9.2), except that they additionally test
  for a correct detail field. Whether the new tests can be appended or should
  replace existing tests, needs consideration.
  Note aside: fixing the issue in ticket [47d4f29159] invalidates test
  event-9.
  Therefore, the patch in that ticket removes it.
- tests event-9.11, event-9.16 and event-9.17 are affected by the issue
  described in [47d4f29159]. Therefore, the expected result of these tests has
  been made to depend on whether Tk invokes binding scripts for crossing
  events with detail NotifyInferior or not (i.e. whether the patch in
  [47d4f29159] was applied).
- tests don't wait for fixed amounts of time (e.g. for the windowing system
  to complete communications with Tk). Instead, they wait for specific events
  before proceeding (e.g. the servicing of windowing system notifications).
  This evades the problem of "how much waiting time is enough on an unknown
  end user's platform?", and it speeds up test execution considerably. On
  the other hand, it also requires more attention to details (see for example
  the explanation to test event-9.15). It's also a style that differs from
  the current style of waiting in the test script event.test.

For all these reasons, the tests in the test script have not been included
in the patch below. Tests will need to be handled separately, depending on
review and possibly consultation.


C. Patch
========
The attached patch "crossing_events_upon_pw_destroy.patch" fixes the issue by
addressing its true cause. The patch is based on Tk release 8.6.13, so it
replaces the shortcut fix developed with ticket [9e1312f32c]. Note aside:
the patch doesn't affect the change made by ticket [b1d115fa60] because
that change doesn't interfere with the true cause of the current issue,
and is an improvement in itself.


D. Conclusion
=============
- This post describes the cause of the reported issue. It's also the true
  cause for the misbehaviour reported with tickets [9e1312f32c] and
  [b1d115fa60].
- The attached patch holds a fix that addresses the cause of the issue.

Erik Leunissen.
User Comments: erikleunissen added on 2024-10-12 20:47:27:
Ah, I understand.
Thanks.

marc_culler (claiming to be Marc Culler) added on 2024-10-12 20:15:31:
Tk 9.0 for macOS works very differently at the core level.  It no longer uses
the [NSView drawRect] method.  That means that drawing operations no longer
happen twice (because the first time failed, because the drawing was attempted
outside of the drawRect function.). In Tk 9.0, drawing is now done synchronously.

This matters for this ticket.  I don't think it would have been possible to
make the tests pass reliably for Tk 8.6.  At least, not by me.

In fact, there are still sporadic failures of those tests, for reasons which I
do not understand.  (event-9.11 and event-9.16 are the troublemakers.)

erikleunissen added on 2024-10-12 19:53:27:
Let me invert that question in order to be more to the point:

Why did branch core-8.6 not receive the fix ?

erikleunissen added on 2024-10-12 18:36:32:
Re-opening this ticket with the following question:

Why was the fix not merged into branch core-8.6 ?

marc_culler (claiming to be Marc Culler) added on 2024-07-15 22:44:49:
I agree that this can be closed now.  I will do it.

fvogel added on 2024-07-08 20:45:54:

Fix for this was merged by Marc in [1dfe4fe9].

I guess we can now close this ticket, along with branches bug-22349fc78a-v2 (wich was merged) and bug-22349fc78a (which is the now abandoned original attempt at fixing this) ?


erikleunissen added on 2024-06-08 12:22:04:
In case that it was not clear enough from my message of 2024-06-05 08:04:05 :

I cannot contribute further to this ticket for the foreseeable future.

I'm sorry.

Tk maintainers need to proceed without my involvement.

marc_culler (claiming to be Marc Culler) added on 2024-06-05 15:24:14:
Hi Erik,

The question that I would most like to know the answer to is whether this
branch works with Windetect and Wintrackgui.  If not, then I would say it
does not meet our needs.  Can you report on that?

erikleunissen added on 2024-06-05 08:04:05:
Hi Marc,

I've decided to leave this ticket as it is,
and let the maintainers decide how to go forward.

Best regards,
Erik Leunissen.
--

marc_culler (claiming to be Marc Culler) added on 2024-06-04 21:42:19:
Hi Erik,

Feel free to rearrange the code in any way that seems to you to be an
improvement.  As long as the tests pass I will be happy.  Of course
I would prefer to have the test code make more sense and be more
readable.  I am confident that the tests in their current state are
verifying that the correct events are being generated. But I agree
that the current code is not optimal. I don't think it is perfect.
I think it is good enough.

Please keep in mind, though, that the platforms on which the tests must pass
are the Windows, Ubuntu and macOS Github runner images.  Even though no
human beings use those platforms, they are the only ones that matter these
days.  I put in many hours of painful, frustrating time getting the tests
to work on the Github runners.  I am not interested in starting that
process over again.  But I would be very happy to let someone else do that.

erikleunissen added on 2024-06-04 20:29:22:
* Regarding:

> 1) The result would contain records of <Enter> or <Leave> events which had
> been generarted when unrelated windows (usually . or .t) were created or
> destroyed.  These extraneous records were in addition to the records which
> the test was expecting.

I understand that such leftovers from a previous test, or from an initial code
section of a test file, should not interfere with a subsequent test.
Of course!

What I don't understand is that you choose to deal with it in the body
section of a subsequent test (say event-9.x).

It ought be part of the cleanup of the previous test (or initial code section)
that is responsible for inducing these  events. If proper cleanup is not
carried out in a timely fashion, such left-overs can not only leak through
to the immediately successive test, but accumulate for as long as the event
loop doesn't get a chance to perform work, and the events may carry over
to tests at any stage further in the test file.

This specific test hygiene is especially important for a test file whose
explicit purpose is tot test the functionality of event handling, but are
probably of lesser importance for other Tk test areas.


* Regarding:

> 2) The expected events were being generated (that was verified by adding
> print statements at the time the event was generated) but they were not
> being processed until the test body had finished running.

Of course, I understand that the event loop needs opportunity to service the
events that are generated in the body section of the same test. Is that what
you mean? [A, see B below]

Providing such opportunity (without attempting to select certain events),
is what the calls to "waitForWindowEvent", in the test bodies just before
calling "set result", are for. If there are no other events queued, then
"processevents enter/leave" probably achieves the same effect as
"waitForWindowEvent".

For example in test event-9.19, I see this body section:

    bind all <Leave> {append result "<Leave> %d %W|"}
    bind all <Enter> {append result "<Enter> %d %W|"}
    destroy .three
    waitForWindowEvent .two.f1.f2 <Enter>
    update idletasks; #finish destroying .two
    if {[tk windowingsystem] ne "x11"} {processevents enter leave}
    set result

The call to [waitForWindowEvent .two.f1.f2 <Enter>] should suffice here.

Of the two subsequent command lines (with my extra annotation a and b):

    update idletasks; #finish destroying .two                              (a)
    if {[tk windowingsystem] ne "x11"} {processevents enter leave}         (b)

(a) the call to "update idletasks" will never do what the comment says because
    the command "destroy .two" sits in the subsequent cleanup section.
    Why is it really there?
(b) I'm unsure about what "processevents enter leave" achieves here, because of
    the preceding "waitForWindowEvent .two.f1.f2 <Enter>", which also captures
    any corresponding <Leave> events because these always precede the <Enter>
    events in the event queue.

[B, see A above] However, if you mean that [waitForWindowEvent .two.f1.f2 <Enter>]
doesn't suffice to set the result, then something is seriously wrong further under the
hood. But in that case I'm beaten, and I give up.


* Regarding:

> A "solution" which works on some platforms and fails on others is not a solution.

Sure, I have no problem accepting a solution that's different from mine. I still
am concerned about "processevents" as far as its purpose is to manipulate the order
by which window events are serviced from the event loop. If that's necessary, then
I suspect something wrong with Tk's C code. If its effect boils down to what
"waitForEvent" does, then I'm only concerned about what I wrote in response to your
point 1) above.

I'll leave it to you and Francois (or other TCT members / maintainers) to
evaluate the concerns I raised, and decide how to go forward.

marc_culler (claiming to be Marc Culler) added on 2024-06-04 14:37:54:
Without the processevents command tests were failing for two reasons,
in spite of the fact that the correct events were being generated.

1) The result would contain records of <Enter> or <Leave> events which had
been generarted when unrelated windows (usually . or .t) were created or
destroyed.  These extraneous records were in addition to the records which
the test was expecting.

2) The expected events were being generated (that was verified by adding
print statements at the time the event was generated) but they were not
being processed until the test body had finished running.

The purpose of the processevents command is to filter out this noise to
allow the test to verify that the expected events are being generated
when they should be generated.

There are many other special commands, some of which are platform 
specific, implemented in the files tkTest.c, tkWinTest.c and tkMacOSXTest.c.
This is an accepted practice for Tk.

I would note one other important thing.  Tk supoorts all three platforms:
Windows, linux and macOS.  A "solution" which works on some platforms and
fails on others is not a solution.

erikleunissen added on 2024-06-04 10:36:00:
I've been reading the code in branch bug-22349fc78a-v2 more closely. This leads
me to express two rather fundamental concerns regarding the tests event-9.*
in that branch, specifically those tests that use the proc "processevents".
The reasoning of my concerns is based on what I can oversee for win32
(not macOS/aqua), noting that Tk emulates the behaviour of x11 on other
platforms to resolve differences w.r.t. the generation of events.

A. Problem with the C code in branch bug-22349fc78a-v2

   1. Before the introduction of "processevents", the concerned tests in
      branch bug-22349fc78a-v2 needed that cleanup commands, which themselves
      are unrelated to the test result, be placed in the body section of a test
      (instead of the cleanup section where they belong), in order to obtain
      the expected test result.

   2. If cleanup commands, which themselves are unrelated to the test result,
      *need* to be placed in the body section of a test in order to obtain the
      expected test result, then that is an indication that something is wrong
      with Tk's C code in branch bug-22349fc78a-v2 (probably the code that
      generates/processes crossing events, or maybe other window events).

   3. The proc "processevents" actively manipulates the order by which selected
      window events are serviced from the event queue, and thereby allows one
      to revert to the regular situation where cleanup commands can be put in
      the cleanup section (where they belong). Admittedly, it's a clever trick.
      However, it doesn't change the underlying problem with the C code. The
      fact that something like "processevents" is needed to be able to revert
      to the natural/regular order and place of commands, witnesses the very
      same underlying problem as in 2. It just does so in another way, hiding
      the symptoms described in 1. and 2. above.

   The above reasoning leads me to the conclusion that branch bug-22349fc78a-v2
   still has a problem that should not be there. (Branch bug-22349fc78a doesn't
   have it on win32.)


B. Demonstrating power of the tests event-9.* in branch bug-22349fc78a-v2

   Do the tests, that either use "processevents" or put cleanup commands in the
   body section, still demonstrate the original issue? Using "processevents"
   or using "displaced commands" both confound the demonstration of the original
   issue. How strict are you about this Marc and Francois?

I'd appreciate thorough evaluation of my concerns/reasoning (by both Marc and
Francois and possibly others who chime in).

The problem/concern that I've pointed out at A. is of foremost importance to
me. That's why I'm posting this message already before having done any further
review that involves compiling and running code.

Best regards,
Erik Leunissen.

fvogel added on 2024-05-31 17:12:42:
> would it be possible to bring both branches up to date w.r.t. the same checkin of branch core-8.6

Sure, now done.

erikleunissen added on 2024-05-31 08:24:14:
B.t.w.

I see that branches bug-22349fc78a and bug-22349fc78a-v2 both have branch core-8.6 merged in, but at different commits.

For the sake of easy comparison of the two branches, would it be possible to bring both branches up to date w.r.t. the same checkin of branch core-8.6 so that a single diff of the tips of these branches isn't cluttered by unrelated changes?

erikleunissen added on 2024-05-31 08:02:15:
I intend to post my comment/review[*] by the end of next week. I can't do that earlier because I need opportunity to build/run on my development system, which I'm nowhere near to until coming monday.

    [*] regarding x11 and win32 only, not macOS/aqua

marc_culler (claiming to be Marc Culler) added on 2024-05-31 01:34:22:
I think that the bug-22349fc78a-v2 is now working on all platforms and
is ready for a final review.  I would welcome comments from anyone, but
espcially Erik and François.

I implemented a processevents command in tkTest.c as proposed below, and I
reworked the tests to use it.  The event-9* tests now look fairly natural.
With considerable effort I think I have finally dealt with all of the
collateral damage caused by making this improvement to Tk.  There are still
sporadic failures of some tests which have no obvious connection to <Enter>
or <Leave> events on the CI runners, but that is nothing new.  (The CI
runners seems more prone to such things than computers used by people.)

erikleunissen added on 2024-05-29 14:16:49:
Hi Marc,

Thanks for sharing your view about certain awkwardnees in Tk regarding the processing and handling of window events.

My position/view of the matters that you raise (the "indeterminacy" and its handling) are more nuanced and maybe different from yours. I agree that this ticket is not the right place/channel to start a long discussion about them. However, it  may be important that you know know of them, and therefore I communicated my experiences/view/position to you by private email.

marc_culler (claiming to be Marc Culler) added on 2024-05-26 13:04:18:
Hi Erik,

This is off topic.  And it is a big topic, so this is not the place for a long
discussion of it.  But I wanted to point out that indeterminacy, caused by
race conditions, is a core feature of Tk, whether intentional or not.   To
say that a language is deterministic means that the running the same
instructions always produces the same result.  Here is a simple example
of how that is not true in Tk.  Consider the following two Tk commands:

toplevel .t
tkwait visibility .t

If you run that as a script, a toplevel appears.  If you type those commands
in the interpreter, it hangs.  Those are different outcomes. In fact, there
is nothing in the language specification which guarantees that the hang won't
occur in a script as well.  The creation of the toplevel could, in principle,
finish before the tkwait command is executed, causing the script to hang.
Of course this is very unlikely.  However, when running the tests there are
many analogous situations where the race between when one command finishes
and a subsequent command is executed does not favor one horse over the other
by such a large margin.  And which one is favored can easily depend on the
platform on which the program is running.

erikleunissen added on 2024-05-25 20:09:19:
OK Marc,

Your idea/plan a sounds like a lot of work involving details/intricacies. I don't understand the problem well enough to be of help here.

Regarding the techniques to handle timing issues with crossing events that I used for the package windetect:

As I said previously, I'm away from my development system. But I ought be able to reach my backups in the cloud from where I am now. Alas, that appears not to work and I can't remedy it.

Therefore, presenting you these techniques to see whether they can be of use, also needs to wait for another week. (B.t.w. these are script-only techniques.)

marc_culler (claiming to be Marc Culler) added on 2024-05-25 14:05:32:
Hi Erik,

No, of course I am not happy about having to make random perturbations in the
test code in order to get the tests to pass.

But I am happy that it was possible to do that.  I think that means that the
code itself is very likely working correctly, and the remaining issues are
with the testing apparatus, which is clearly not very robust as evidenced
by the long tradition of adding other random changes to test code in order
to make tests pass.  (Usually adding delays or adding update, or adding
update idletasks, or changing update to update idletasks, or removing one
or the other, or moving them around.)

I do have a plan, which I am starting to work on.  I am going to try
adding a new Tcl command to tkTest.c which processes all events on the
queue of a specified type.  (The first version will just process all
<Enter> and <Leave> events.)  Then I will try reorganizing the event-9*
tests so that the body of the test looks like:

testprocessevents {<Enter> <Leave>}
bind .x.y <Enter> {do something to modify result}
bind .x.y <Leave> {do something to modify result}
do something here
testprocessevents {<Enter> <Leave>}
set result

erikleunissen added on 2024-05-25 11:54:02:
Thanks for your explanation Marc,

I intend to respond in more detail in a subsequent post. For now this short and
still crude, possibly preliminary response, so that you know the gist of what's
coming.

I believe that I recognize some of the awkward aspects[*] regarding window
events that you describe. I believe to recognize them from my testing of an
upcoming new release of the package windetect[**] of which I am the author.
This package is based on and deals explicitly with crossing events.

    [*] - the leaking of window events from one test into the next
        - the non-sequential / non-deterministic aspects, of which I'm not yet
          quite convinced (based on my experience with X11 and win32, aqua may
          be very different)
    [**] https://sourceforge.net/projects/tkwintrack/

I'd really like to know:
-  whether you are happy/satisfied with the solution to move cleanup statements
   into the body section;
- (if NO), whether you can think of an alternative approach that achieves the same
  effect.

In the meantime, I'm going to study your response more closely, and see whether I can
come up with some suggestions for an (alternative) approach to some of the awkward
aspects that you mentioned, based on what I did for the package windetect. But I need
some time to formulate that properly.

--

marc_culler (claiming to be Marc Culler) added on 2024-05-24 12:49:20:
Hi Erik,

You have put your finger on what made this so difficult.  My process went like
this:

First, I wrote code which should have sent the correct enter and leave events
on macOS.  (There were lots of omissions in the code, where TkUpdatePointer
was called if the pointer moved but not if the window moved or was destroyed
or a new window created.)  But when I finished that and ran the tests, they
mostly failed. (As an aside, I found that TkUpdatePointer needed to be
called from TkDestroyWindow, not XDestroyWindow, in order for the TkGetContainer
macro to work correctly -- by the time XDestroyWindow is called from
TkDestroyWindow the information used by TkGetContainer has been erased.
So the changes I made also affected when TkUpdatePointer is called on
Windows.)

Next, I added print statements, most usefully in TkInOutEvents, to print to
stderr a description of each enter or leave event which was being *queued*.
I also added print statements to the tests, sending a message to stderr at
key points during each test.  I then ran the tests while redirecting stderr
to a file.  That gave me a record of which events were being queued at which
points during each of the 9.* tests.  (I did this on both platforms.)

The record that was generates showed the correct events being queued at
the correct time.  The record also showed a huge amount of noise in the
form of enter and leave events begin generated by the test framework as
various windows, not necessarily used by a given test are created and
destroyed outside of the test itself.  When I compared the records with
the test results as reported by tktest, the results of the failing tests
sometimes showed that none of the expected events had been recorded,
in spite of the fact that the expected events had all been queued at the
expected time.  In other cases all of the expected events had been recorded
but so had other unrelated events having to do with the test setup or
with the cleanup from the previous event.

People would like to believe that what happens when running the tests
is a repeatable, deterministic, sequential process.  In fact, we know that
is not true.  Otherwise why would the tests be full of update calls and delays
for arbitrary amounts of time?  Even after accepting that some indeterministic
behavior exists, one would like believe that it only happens on other
platforms from the one they are working on. But it is a real thing on all
platforms.  In any case, I found that ensuring that the events get processed
at the correct moment was extremely difficult and really had to be done by
trial and error, even though I could document that the events were being
queued at the correct moment.

So when you say that it seems completely arbitrary to, for example, move a
block of code from the cleanup section to the body section, you are absolutely
correct.  But that change was made because it makes the events which are
being queued at the correct moment also get processed at the correct moment.
There is a lot of invisible activity generated by the testing framework
which occurs during the transition from one test section to another and
(apparently) that activity interferes with how events get processed during
a test, as compared with what you would expect.

I don't think this indeterminacy is something that exists on one platform and
not on others.  I think they all have it, but the details of how it behaves
differ.  Personally, I blame tkBind.c, not for any specific reason, but
because it is a black box containing a lot of code which no current Tk
developer understands.

That said, there is a fundamental difference between how Windows and macOS
use TkUpdatePointer, even though that function was apparently intended to
service all non-X11 platforms in the same way.  That fundamental difference
is that Windows calls TkUpdatePointer every 250 milliseconds on a timer,
while macOS calls it with an event-driven scheme -- it is called if the
pointer moves or (after my changes) if a window moves or is created or
destroyed.

Incidentally, I think the 250 millisecond interval of the Windows timer
is probably not unrelated to the 200 millisecond pause time which is
used throughout the tests.  I think that many of those pauses could
now be removed, but I did not experiment with that.

erikleunissen added on 2024-05-24 08:27:42:
Hi Marc,

Here is my partial review of the code in the branch bug-22349fc78a-v2. Partial,
because I'm (conceptually) unfamiliar with macOS, aqua and objective C; let
alone that (on the practical side) I don't have hardware available, nor a
toolchain for macOS. So, I can't comment on the C code.

Furthermore, I'm away from my development system for at least a week. So, in the
short term I can't do any testing or running code otherwise.

Therefore, I've been having a look only at the file event.test, that is, the
differences between branch bug-22349fc78a and branch bug-22349fc78a-v2.
Here are my findings:

A. changes to the handling of toplevel window .one in tests event-9.13, event-9.19 and event-9.20
   These changes remove unnecessary code or simplify code otherwise. Well found!

B. changes to tests event-9.13, event-9.14, event-9.18 and event-9.19
       (excluding handling of window .one in tests event-9.13 and event-9.19)
   These changes move statements from the cleanup section to the test body.
   While I was composing the tests, I took care to put only statements into the
   test body that affect the test result (given the purpose of the test). The
   rest goes into the setup or the cleanup section.
   The changes suggest that the statements, moved from the cleanup section into
   the test body, need to be exercised in order to obtain the test result.
   That's not the case on Linux/X11 and not on win32, and I cannot imagine that
   it *should* be the case on macOS/aqua. (Rather, if a move like this is needed
   to make the tests pass on macOS/aqua, I'm inclined to ask what's wrong with
   the Tk code for macOS/aqua.)
   So, in short, I don't understand this "move" (pun is accidental ;-) ).
   Would you be willing to explain why that is necessary for macOS/aqua?

C. change to test event-9.15
   The change inverts the order of these two lines:

         destroy .two
         _pause 200; # ensure servicing of all scheduled events (only <Expose> events expected)

   The purpose of this test is to ensure that destruction of window .two does
   not induce crossing events. To that end, there must be opportunity for
   servicing any crossing events, (incorrectly) induced by [destroy .two], which
   is what [_pause 200] provides. However, inverting the two commands removes
   that opportunity. By consequence, nothing that [destroy .two] induces can
   have consequences for the test variable "result". So, the test cannot fail
   (or it can only fail for reasons related to code executed before [destroy .two],
   which is not the purpose of the test). In short, the inversion is in disagreement
   with the purpose of the test.

Thanks,
Erik Leunissen.

marc_culler (claiming to be Marc Culler) added on 2024-05-24 00:49:21:
The branch bug-22349fc78a-v2 now has a complete solution to this problem
that works on all platforms (as far as I know from testing on my hardware -
the CI run is pending.)

I would welcome comments and testing especially from Erik and François, who
have had enough experience in these waters to know where the rocks are.

This turned out to be quite non-trivial and there are lots of details,
some of which I have discussed with François.  I would be happy to discuss\
them with anyone else who is interested.  Hopefully I have also explained
most of it in the code.

oehhar added on 2024-05-06 08:02:36:

As the title states "MS Windows", it might be a good idea to change it to make clear, that there are remaining issues for MacOS.


fvogel added on 2024-05-05 22:03:52:
Assigning to Marc Culler regarding the remaining issues specific to macOS (several private emails exchanged).

erikleunissen added on 2024-04-14 17:12:19:
Thanks for having checked this once more.

I'm very relieved!

It resolves my fundamental cognitive dissonance about the hang. (As a collateral effect, it speeds up this instance of waiting considerably).

fvogel added on 2024-04-14 15:21:14:
In [6bdef79b62] I have restored the original code. Well, almost. The root issue with using "." in the test suite is that this window is already visible, thus [tkwait visibility .] would wait for it forever in the test suite. Now that we're using another window we can call [tkwait visibility $w] (with $w ne ".") instead of calling _pause. This change is not paramount to me, but why not.

fvogel added on 2024-04-14 15:14:15:

I cannot reproduce the hang anymore either at [116624c2].

At [b1256d26] (with the _pause 200 --> tkwait visibility change on top of it) however it hangs, indeed because [tkwait visibility .] waits forever since the . window is already visible.


erikleunissen added on 2024-04-11 10:13:14:
Not that it changes anything about my point of view, but the proposed multi-platform alternative can be simpler:

    if {! [winfo ismapped $w]} {
        tkwait visibilty $w
        update
    }

erikleunissen added on 2024-04-11 10:03:43:
OK. Thanks for explaining these remaining matters.

With respect to a hang in:
   
        tkwait visibility $w
        update; # service remaining screen drawing events (e.g. <Expose>)

I cannot reproduce the hang, not with my original test script (of course),
and not with the current test script as integrated in the Tk test suite,
where I replaced the line:

    _pause 200; # service remaining screen drawing events (e.g. <Expose>)

This is all under MS Windows 7. I would be very much surprised if this varies
with the version of MS Windows.

For my personal interest, I would like to sort this out, but I can't
because I have no other version of MS Windows to execute the test file with.

In the interest of Tk: I leave it up to you whether you find worth while
to look into this further and reproduce the hang once more. I also don't
want to disturb the current state of affairs where the tests have already
been tuned to pass with Github, and I'm OK with leaving the matter as it is.
It's just that I find it dissatisfying not to understand why I can't
reproduce the hang.

A hang probably means that the <Visibility> event has already been serviced,
and that the window $w is already mapped before the call to "tkwait visibility".
Using that, I can think of a multi-platform alternative:

	# service any remaining screen drawing events
	if {[winfo ismapped $w]} {
		_pause 200; 
	} else {
        tkwait visibility $w
        update
	}

However, I find this alternative very ugly/cumbersome. I first would like to
know whether the hang depends on the version of MS Windows, before pursuing
this multi-platform alternative. For this reason also, I'm OK to just let
things as they are, unless you are motivated to pursue this.

fvogel added on 2024-04-10 19:25:27:

Your original code (tkwait visibility . ; update) lets tests such as event-9.11 hang on Windows (in the test suite I mean, perhaps not in your test script I don't remember). From this observation I just aligned your test code with how other tests in event.test are doing and have replaced your visibility statements by _pause 200 as elsewhere in the test file.

This indeed does not guarantee that the window is visible, although 200 ms is enough to let it work in practice (no unstable tests results seen so far). If you can propose something better (and multi-platform), just say so and I'll happily commit your proposal.

The choice of the value 200 ms was (short answer) a copy/paste. (Longer answer) It was governed by which value in other tests was large enough to produce the required results and small enough to not slow the tests down more than needed. No consideration regarding MOUSE_TIMER_INTERVAL was made. I have just tried on Windows with _pause 500 everywhere instead of _pause 200 and the tests do pass identically (as I would have expected, it MUST be so otherwise the test does not demonstrate anything. The value 200 ms was not chosen to prevent further calls to Tk_PointerEvent()).


erikleunissen added on 2024-04-10 14:41:11:
Since the tests for this ticket were integrated into the Tk test suite,
I didn't have a close look to them. (That's merely because my drive to do so
is inversely proportional to my confidence that you did this accurately,
François.) Well, I did that now, and everything looks fine to me, except for the
following two aspects where I have a small doubt/question:

A. Regarding this line:

        [https://core.tcl-lang.org/tk/file?ci=e3db006d1160463e&name=tests/event.test&ln=913]
    
    In my original test script the code reads:
    
        tkwait visibility .
        update; # service remaining screen drawing events (e.g. <Expose>)

    Now it is:

        _pause 200; # service remaining screen drawing events (e.g. <Expose>)

    a. Was the removal of the call to "tkwait visibility" on purpose? It makes that
       we can't be sure that the window is visible, e.g. if the end user's system
       is very busy processing other stuff during the 200 ms that the Tk process
       is waiting). This makes me uncomfortable.
    b. If the removal of the call to "tkwait visibility" was on purpose, then
       the comment is somewhat misleading, especially the word "remaining".
       What's happening during the [_pause 200] is much more than just the
       servicing of "remaining" drawing events. A better comment would be:
   
          _pause 200; # let the drawing of the window to the screen occur to its completion

       (or something equivalent).

B. Regarding the 200 ms that _pause is waiting:
   It didn't go unnoticed that 200 ms is less than the current value for MOUSE_TIMER_INTERVAL
   (250 ms). The choice makes that there is no time for the sequence MouseTimerProc() -> Tk_PointerEvent()
   to be run twice. (This is on MS Windows, I don't know if this also holds for macOS/aqua).
   
   Was this consideration indeed the reason for choosing 200 ms? If so, wouldn't it wise to
   add this as an explanation somewhere?

fvogel added on 2024-04-07 10:11:51:

I'm dropping my findings regarding the macOS aqua case here so that they do not get lost.

The test case I consider is event-9.11.

  1. When event-9.11 destroys .one.f1.f2, XdestroyWindow() is called for .one.f1.f2, which calls TkPointerDeadWindow() for it. Windows and macOS aqua behave identically here.
  1. In TkPointerDeadWindow(), the condition if (winPtr == tsdPtr->lastWinPtr) is true on Windows but not on macOS aqua. This means that tsdPtr->lastWinPtr is not correct on macOS aqua. Indeed, tsdPtr->lastWinPtr is supposed to refer to the window that was most recently containing the pointer, which should be .one.f1.f2. So macOS aqua does not maintain tsdPtr->lastWinPtr correctly.
  1. lastWinPtr is only set in GenerateEnterLeave(), which is only called from Tk_UpdatePointer(). So this confirms that there must be a missing call to Tk_UpdatePointer().
  1. Instrumentation of event-9.11 tells that the two platforms behave differently during the _pause 200 in the setup phase. Windows calls Tk_UpdatePointer() and then GenerateEnterLeave() on .one.f1.f2 while macOS aqua does not.
  1. Instrumenting tkProcessMouseEvent() (#define TK_MAC_DEBUG_EVENTS 1) shows that on macOS aqua there are four events serviced during that _pause 200. First two NSMouseExited events, and then two NSMouseEntered events. All four cases call return theEvent ; tkProcessMouseEvent() does not call Tk_UpdatePointer() later in its code. Important enough is the fact the window getting these four events is .one while it should be .one.f1.f2 (as on Windows).
  1. Commit [b3c05ed6] is a proof of concept letting event-9.11 pass on macOS aqua. This shows there are missing NSevents (or missing processing of these events by the macOS aqua code).


fvogel added on 2024-03-23 17:00:42:

OK, committed. Thanks.

Yes, I also had thought about leaving the macOS situation as it is and constrain the tests. "notAqua" would be the right constraint. The tests pass on the mac with XQuartz (which is expected). I'm not yet fully ready to give up yet.


erikleunissen added on 2024-03-23 15:42:25:
The change in commit [60fbc1fa]] wasn't good enough after all. This is what's needed, based on [b0172da5] the current tip of the bug-fix branch (sorry for the piecewise tactics):

--- event.test.1        2024-03-23 16:30:51.000000000 +0100
+++ event.test.2        2024-03-23 16:34:10.693688779 +0100
@@ -880,13 +880,15 @@
     set _windowEvent($counter) 1
     set savedBinding [bind $w $event]
     bind $w $event [list +waitForWindowEvent.signal $counter]
-    after $timeout [list set _windowEvent($counter) -1]
+    set afterID [after $timeout [list set _windowEvent($counter) -1]]
     vwait _windowEvent($counter)
     set late [expr {$_windowEvent($counter) == -1}]
     bind $w $event $savedBinding
     unset _windowEvent($counter)
     if {$late} {
        puts "waiting for $event event on $w timed out (> $timeout ms)"
+    } else {
+       after cancel $afterID
     }
 }
 proc waitForWindowEvent.signal {counter} {

erikleunissen added on 2024-03-23 15:24:23:
If macOS/aqua is broken in this area, and the resources fail to fix it,
then what about constraining the tests that don't pass on macOS/aqua,
and proceed with the improvements so that MS Windows can profit from them?

The tcltest docs mention a constraint "macCrash". However, the word "crash"
is inappropriate. There is a "unixOrWin", which seems a bit skew to the
purpose, but might be sufficient. What's needed is a "failsOnMac" or "failsOnMacAqua" constraint, but that isn't provided by tcltest.

By the way: do the tests also fail with mac/XQuartz?

fvogel added on 2024-03-21 21:20:42:

After spending some time on macOS, I believe that the mechanism generating crossing events on Windows simply does not exist on the mac. IOW what Tk implements for Windows in tkPointer.c and win/tkWinPointer.c, expecially Tk_PointerEvent does not seem to exist for the mac. If so, fixing this does not seem to be within my reach.

Or I didn't find this mechanism.

Or it's perhaps not needed, but in this case how is the whole thing supposed to work...?


fvogel added on 2024-03-17 15:09:54:

Right, now done in [69c3dfd1].

About the segfault: follow-up to [fdc0ed342d] for a reproducible script, a short explanation of what's happening, and the fix.

What remains to do for the present ticket is to understand why almost all new tests fail on macOS. I'll have to spend some time on this...


erikleunissen added on 2024-03-17 12:42:08:
correction: toplevel .row. -> toplevel .one

erikleunissen added on 2024-03-17 12:38:58:
By enlarging the windows in [8e55f77c], the mouse pointer destination under KDE/Plasma is right inside the wm border for toplevel .row. in test event-9.15.

This is uncomfortably close to incorrect. I suggest to change (in test event-9.15):

event generate .two <Motion> -warp 1 -x 250 -y 250

    to:

event generate .two <Motion> -warp 1 -x 275 -y 275

erikleunissen added on 2024-03-15 11:00:01:
The segfault occurs if one or both of the following tests are executed before test event-2.1(keypress):

event-1.1
event-1.2

Executing other tests in event.test doesn't seem to matter.

I think it's best that we do not let this bug clutter the present ticket.

erikleunissen added on 2024-03-15 10:25:48:
I've just removed all tests for the present ticket (event-9.11 through event-9.20), including the specific utility procs for these tests from event.test (copy of checkin [60fbc1fa4a]).

Result: the segfault prevails immediately after test listbox-30.1

erikleunissen added on 2024-03-15 09:10:20:
OK. Now reproduced with [60fbc1fa4a] and your invocation:

    xvfb-run --auto-servernum make test TESTFLAGS="-file \"event.test font.test listbox.test\" -verbose beps"

obtaining the same back trace. Here's the result of "bt full":

(gdb) bt full
#0  0x00007fc4e5891944 in FindDisplayFocusInfo (mainPtr=0x207373656c6e7520, dispPtr=0x646e697720656d6f)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkFocus.c:1018
        displayFocusPtr = 0x1faedb0
#1  0x00007fc4e5891328 in TkSetFocusWin (winPtr=0x287d9d0, force=1)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkFocus.c:654
        displayFocusPtr2 = 0x1e2f610
        focusPtr = 0x22becf0
        tlFocusPtr = 0x238a240
        displayFocusPtr = 0x1faedb0
        topLevelPtr = 0x1c14c80
        allMapped = 1
        serial = 0
#2  0x00007fc4e5890ad3 in Tk_FocusObjCmd (clientData=0x1c14c80, interp=0x1bd1990, objc=3, objv=0x1bd6c10)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkFocus.c:199
        focusOptions = {0x7fc4e59b3890 "-displayof", 0x7fc4e59b389b "-force", 0x7fc4e59b38a2 "-lastfor", 0x0}
        tkwin = 0x1c14c80
        winPtr = 0x1c14c80
        newPtr = 0x287d9d0
        topLevelPtr = 0x0
        tlFocusPtr = 0x1bd1990
        windowName = 0x1e2f610 ".l"
        index = 1
#3  0x00007fc4e5492bc6 in TclNRRunCallbacks ()
   from /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/BUILD/x86_64-linux/tcl-60fbc1fa/libtcl8.6.so
No symbol table info available.
#4  0x00007fc4e549501a in TclEvalEx ()
   from /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/BUILD/x86_64-linux/tcl-60fbc1fa/libtcl8.6.so
No symbol table info available.
#5  0x00007fc4e55641b8 in Tcl_FSEvalFileEx ()
   from /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/BUILD/x86_64-linux/tcl-60fbc1fa/libtcl8.6.so
No symbol table info available.
#6  0x00007fc4e58a57f5 in Tk_MainEx (argc=-1, argv=0x7fff66a148f8, appInitProc=0x400e5c <Tcl_AppInit>, interp=0x1bd1990)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../generic/tkMain.c:328
        path = 0x1bfc180
        argvPtr = 0x1bfc1b0
        appName = 0x1bfc180
        encodingName = 0x0
        code = 0
        nullStdin = 0
        chan = 0x0
        is = {input = 0x0, tty = 1, command = {string = 0x2020200020200000 <error: Cannot access memory at address 0x2020200020200000>, 
            length = 536870912, spaceAvl = 538976288, 
            staticSpace = '\000' <repeats 88 times>, "(\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\060\035T\345\304\177\000\000\356\247_\345\304\177\000\000\225sY\345\304\177\000\000\356\247_\345\304\177\000\000\330\314I\345\304\177\000\000(\000\000\000\000\000\000\000\v\035T\345\304\177\000\000\000\000\000\000\000\000\000\000\006\000\000\000\000\000\000\000\002", '\000' <repeats 22 times>}, 
          line = {string = 0x0, length = 0, spaceAvl = 0, staticSpace = '\000' <repeats 199 times>}, gotPartial = 0, interp = 0x1bd1990}
#7  0x0000000000400e55 in main (argc=8, argv=0x7fff66a148b8)
    at /home/erik/priv/Develop/Tickets/crossingEventsUponPwDestroy/SOURCES/tk-60fbc1fa/unix/../unix/tkAppInit.c:93


My suspicion focuses on these memory addresses at #0:

    (mainPtr=0x207373656c6e7520, dispPtr=0x646e697720656d6f)

erikleunissen added on 2024-03-15 08:29:54:
OK. Now, I believe that it's possible. Not caused by the C code that we touch in this ticket, but through the tests or by some very weird flaw otherwise. I'm going to try to reproduce.

fvogel added on 2024-03-15 01:57:07:

"I can't believe that the change introduced with the immediately successive commit [185c9e86] can make the difference between segfaulting and not segfaulting."

Me too. I didn't say exactly this but that it cannot be reproduced with later commits in the same branch. Granted this was not 100% accurate, so let's be more precise: The segfault can be reproduced with [60fbc1fa4a] but not with [cb6c8177]. Just try and see. Or look at the Github actions results.

Can you reproduce? Seeing is believing :-).


erikleunissen added on 2024-03-14 21:59:07:
Regarding:

"The segfault can be reproduced with [60fbc1fa4a] but not with later commits in the same branch."

This is very peculiar! I can't believe that the change introduced with the immediately successive commit [185c9e86] can make the difference between segfaulting and not segfaulting.

erikleunissen added on 2024-03-14 21:28:03:
Oh, I can't say anything for macOS, but the C code that this ticket changes is not run on Linux.

fvogel added on 2024-03-14 21:05:38:

The segfault can be reproduced with [60fbc1fa4a] but not with later commits in the same branch. Can you reproduce?

The link with the present ticket must be that something is newly exercised on Linux by the added test cases, in other words the new test cases event-9.* reveal a latent bug on Linux.

Regarding what you just wrote: "we're touching code that is only run on MS Windows": if this is really true it strikes me as a very likely cause for the new event-9.* tests failures on macOS aqua.


erikleunissen added on 2024-03-14 20:46:02:
Oh, I'm going way too fast:

The segfault occurs on Linux, but we're touching code that is only run on MS Windows. So, I don't see the relation between the segfault and this ticket.

erikleunissen added on 2024-03-14 20:39:11:
Regarding the segfault on Linux, that you reported on 2024-03-10 16:12:17 and the two preceding posts:

- did you proceed with it?
- do you want me to proceed with it?

erikleunissen added on 2024-03-14 20:34:14:
Regarding the assignment of the variable "crossing":

- I understand your stance very well.
- I won't pursue the matter, but thanks for your suggestion.

fvogel added on 2024-03-14 20:09:28:

Regarding your B. below: noted.

Regarding restrictWinPtr: Well, I didn't write that code :-) The ThreadSpecificData struct has comments reading:

    TkWindow *grabWinPtr;	/* Window that defines the top of the grab
[...]				 * tree in a global grab. */
    TkWindow *restrictWinPtr;	/* Window to which all mouse events will be
				 * reported. */
This is the only "documentation" I could find about restrictWinPtr. This (TkWindow *) pointer is used in tkPointer.c in many places, and I have unfortunately no time to analyze what exactly is done with it.

What is sure is that your proposed "crossed.patch" changes the behavior of GenerateEnterLeave() in the case ((winPtr != lastWinPtr) && (restrictWinPtr)). Whatever this case is doing, I won't apply your patch unless we can exhibit a test case behaving wrongly in this case and that your patch solves. If you wish to go on investigating this, I strongly suggest filing a dedicated ticket for this because otherwise I fear we get distracted from the primary purpose of the present ticket. Thanks!


erikleunissen added on 2024-03-14 19:20:09:
In my post on 2024-03-07 22:38:09 I wrote:

B. Maybe the line:
  
   [https://core.tcl-lang.org/tk/file?ci=3eb956ad6f1fbc5a&name=generic/tkPointer.c&ln=535&ln=535]

needs to be reverted to:

  tsdPtr->lastWinPtr = TkGetContainer(winPtr);

I've finished experimenting: how hard I try with various cases where the parent
of the pointer window isn't the containing window, I can't get any difference.
So, there is no reason to change that code line; the current variant consumes
the least cpu cycles.

erikleunissen added on 2024-03-14 13:54:59:
Well, first off:

I understand what a grab and a grab window are. But I am unsure about the
terminology "restrict window" (or "restrictWinPtr") in relation to grabs,
because in the latter context I'd expect the term "grab window" (or
"grabWinPtr").
I'd like to think that they are similar things, but I don't know the exact
difference. I believe that I've seen the term "restrictWinPtr" mostly in
the context of button widgets, but I may be mistaken.
I'd be very grateful if you explained the difference to me.

To answer your question:

When inspecting the code for GenerateEnterLeave(), I looked at this condition:

   if (winPtr != lastWinPtr) {
   
which for me implied "crossing a window boundary".

I did not consider the restrict window. That's simply a matter of oversight.
But my unfamiliarity with the term "restrict window" as explained above may
have confused me also.

If a restrict window is similar to or the same as a grab window, then I
understand that that makes a difference for the assignment to the variable
"crossed".

So, I'd be grateful if you corrected my proposal for this case, and make
it correspond to *your* understanding of the matter.

fvogel added on 2024-03-14 06:30:59:
About 2.: I think I understand your point. From reading the comment in tkPointer.c:268-272 one really is interested in when the mouse pointer has crossed window boundaries indeed. However it looks like your proposal "crossed.patch" would also change the behavior during grabs. If I read the code correctly, currently GenerateEnterLeave() returns crossed=0 in this case, with your patch it returns crossed=1. Is this what you really intend to do, given the two calls to GenerateEnterLeave() in Tk_UpdatePointer()? Why ?

fvogel added on 2024-03-13 21:28:53:

Regarding this "crossing.patch":

1. Well found. NotifyNormal was wrongly passed in a detail field while it is a value for the mode field. Replacing by NotifyAncestor is a correct detail field value (and this does not change the behavior because both NotifyNormal and NotifyAncestor values happen to be zero). Commit is [1dac28ab]. Note that there were two occurrences of this in the codebase.

2. I have to think about this before commenting.


erikleunissen added on 2024-03-13 17:57:47:
The attached patch "crossing.patch" corrects two further bugs in
GenerateEnterLeave()in tkPointer.c that are not related to the issue
of this ticket:

1. assignment of a symbolic constant "NotifyNormal", which is invalid for
   a variable denoting the detail field of a crossing event, in line 211.
   ("NotifyNormal" is a symbolic constant for the mode field of a crossing
   event.)
2. assignment of value 1 to the variable "crossed" throughout function
   GenerateEnterLeave(), which I currently find inconsistent with both
   the following explanantion in the code comments:
   - having crossed a window boundary
   - having generated crossing events
   This change makes the code consistent with "having crossed a window
   boundary". However, I'm not quite confident here, and would appreciate
   specific review.

This patch addresses "minor issue A." in my post on 2024-03-07 22:38:09.

fvogel added on 2024-03-10 18:51:08:

I tried to find what's wrong on macOS but I don't have a clue. Running the statements from the tests manually in a shell on a mac does not provide the expected result (therefore the problem cannont be solved by delays, updates and such things as we are used to on the mac).

In the analysis process I have however reopened [47d4f29159] since the patch provided in that ticket does no longer seem to be complete to me.


erikleunissen added on 2024-03-10 16:24:23:
Oompff. Surprise!

I'd like to debug or help debugging the segfault. But, can only start doing so around next wednesday or so.

On macOS, I can be of no help whatsoever since I don't have the hardware (or toolchain).

fvogel added on 2024-03-10 16:12:17:

Running under valgrind spits:

---- listbox-31.1 start
==71831== Invalid read of size 8
==71831==    at 0x489F675: FindDisplayFocusInfo (tkFocus.c:1018)
==71831==    by 0x489F054: TkSetFocusWin (tkFocus.c:654)
==71831==    by 0x489E7FD: Tk_FocusObjCmd (tkFocus.c:199)
==71831==    by 0x4A5A8F6: Dispatch (tclBasic.c:4503)
==71831==    by 0x4A5A983: TclNRRunCallbacks (tclBasic.c:4539)
==71831==    by 0x4A5A0F4: Tcl_EvalObjv (tclBasic.c:4262)
==71831==    by 0x4A5C8DD: TclEvalEx (tclBasic.c:5408)
==71831==    by 0x4B76ED9: Tcl_FSEvalFileEx (tclIOUtil.c:1824)
==71831==    by 0x48B37FD: Tk_MainEx (tkMain.c:328)
==71831==    by 0x10B282: main (tkAppInit.c:93)
==71831==  Address 0x3a207465732020b8 is not stack'd, malloc'd or (recently) free'd
==71831== 
==71831== 
==71831== Process terminating with default action of signal 11 (SIGSEGV)
==71831==  General Protection Fault
==71831==    at 0x489F675: FindDisplayFocusInfo (tkFocus.c:1018)
==71831==    by 0x489F054: TkSetFocusWin (tkFocus.c:654)
==71831==    by 0x489E7FD: Tk_FocusObjCmd (tkFocus.c:199)
==71831==    by 0x4A5A8F6: Dispatch (tclBasic.c:4503)
==71831==    by 0x4A5A983: TclNRRunCallbacks (tclBasic.c:4539)
==71831==    by 0x4A5A0F4: Tcl_EvalObjv (tclBasic.c:4262)
==71831==    by 0x4A5C8DD: TclEvalEx (tclBasic.c:5408)
==71831==    by 0x4B76ED9: Tcl_FSEvalFileEx (tclIOUtil.c:1824)
==71831==    by 0x48B37FD: Tk_MainEx (tkMain.c:328)
==71831==    by 0x10B282: main (tkAppInit.c:93)


fvogel added on 2024-03-10 15:49:56:

The segfault can be reproduced by running:

xvfb-run --auto-servernum make test TESTFLAGS="-file \"event.test font.test listbox.test\" -verbose beps"


fvogel added on 2024-03-10 10:33:59:
How interesting. The CI runners at Github reveal that:

a. The test suite segfaults on Linux during listbox-31.1 now, but only when xvfb is used! (I can reproduce this)

b. The test suite on macOS aqua has several failures in event-9.*: almost all new tests fail. The tests need to be tuned for this platform.

fvogel added on 2024-03-09 18:00:19:
I'm with you regarding having a separate test tools file but never did it. Yes we should. Regarding moving procs such as proc waitForWindowEvent: my loose code of conduct so far is to not move such procs if they are not used elsewhere AND have no chance of being used elsewhere in the future. Pretty subjective, ok.

I have removed event-9.1 and event-9.2, and also the useless code in proc waitForWindowEven.

File event.test now passes on Windows and Linux. From past experience with these kind of tests I'm sure many new tests will fail on the mac (at least with aqua). We'll see what the CI runners spit.

erikleunissen added on 2024-03-09 17:07:21:
Another matter regarding proc waitForEnterEvent:

The following line is useless and can be removed:

      after cancel $afterID

erikleunissen added on 2024-03-09 17:06:35:
Regarding the questions:

> 1. Should proc waitForWindowEvent be placed in constraints.tcl, and
>    should other tests in event.test (or perhaps wider than this) use it?

A more generic comment:

I believe that it would be good for Tk to have a separate, common file dedicated
to test utilities used by all tests in the Tk test suite. If there were a file
like that, say "test_utils.tcl", my preference would be to move
"waitForWindowEvent" there (and also controlPointerWarpTiming for example).

Currently, the file "constraints.tcl" is being used as such, but the file name
suggests another purpose than test utilities.

I find it difficult to foresee whether it's likely that waitForWindowEvent is
going to be used by other test files than event.test. I feel that I don't have
enough overview over the Tk test suite to judge that. If it's an important
criterium for deciding about the place of waitForWindowEvent, then I'd trust
your inclination.


> 2. Couldn't we delete event-9.1 and event-9.2 from the test suite
>    since event-9.13 does the job (and better), according to the comment
>    that comes before the latter?

Yes, this is what I suggested in the initial text of the present ticket. I just
reviewed event-9.1 and event-9.2 to see whether there is something tiny that
isn't covered by event-9.13, but I don't see it. So as far as I'm concerned
tests event-9.[12] can be removed.

fvogel added on 2024-03-09 10:51:29:

Current state [d569f3a7] of the bugfix branch now tests OK on Windows and Linux (at least).

I'm wondering about:

  1. Should proc waitForWindowEvent be placed in constraints.tcl, and should other tests in event.test (or perhaps wider than this) use it?
  1. Couldn't we delete event-9.1 and event-9.2 from the test suite since event-9.13 does the job (and better), according to the comment that comes before the latter?


erikleunissen added on 2024-03-09 10:20:39:
There was no special reason to use "." for the new tests for the present ticket. (I just was being too lazy to create another one.)

Any toplevel should be OK to perform the function that "." had in these tests. Since "." is somewhat different from other toplevels, I agree that it is wise/better to use an arbitrary other toplevel (such as .one) instead of the root window for the purpose of these tests.

fvogel added on 2024-03-09 09:55:02:

The test suite hangs during focusTcl-2.1.

I have made further commits to resolve this, the most important one being [b16759f7]. I think I didn't distort the nature/purpose of any test by no longer using the root (".") toplevel anymore, but please confirm.


fvogel added on 2024-03-08 21:45:07:

No, [tkWithNotifyInferior] worked OK from the beginning. And yes, it's obvious it's not needed anymore since [47d4f29159] is fixed.

The current state of branch bug-22349fc78a now passes on Windows. This is just a quick work that can be refined. Let's however see what Github Actions thinks of the current state.


erikleunissen added on 2024-03-08 21:36:47:
B.t.w. Note that I understand that there is no need for [tkWithNotifyInferior] when the tests are included in the development branch for the present ticket.

erikleunissen added on 2024-03-08 21:08:58:
It's remarkable that:
- the tests that fail all expect a result that includes "Notifyinferior"
- tests that don't expect it, don't fail.

This makes me wonder whether [tkWithNotifyInferior] does its work correctly on your system.

fvogel added on 2024-03-08 20:32:26:

On Windows, when the tests are integrated in the test suite it hangs on tkwait visibility . in proc init_pos. I'll dive into this.

On Windows, when run as is as from the [3eb956ad] state outside of the test suite, the test script does not hang but produces several errors. I'll try to fix this also.

%  source  {C:/Users/francois/Desktop/tests leunissen.tcl}


==== event-9.11 pw container = parent FAILED
==== Contents of test case:

        destroy .f1.f2
        update; # service crossing events
        set result

---- Result was:
|
---- Result should have been (exact matching):
|<Enter> NotifyInferior .f1|
==== event-9.11 FAILED

++++ event-9.12 PASSED
++++ event-9.13 PASSED
++++ event-9.14 PASSED
++++ event-9.15 PASSED


==== event-9.16 Successive destructions (pw + parent), single generation of crossing events FAILED
==== Contents of test case:

        destroy .f1
        update; # service crossing events
        set result

---- Result was:
|
---- Result should have been (exact matching):
|<Enter> NotifyInferior .|
==== event-9.16 FAILED



==== event-9.17 Successive destructions (pw + parent), separate crossing events FAILED
==== Contents of test case:

        destroy .f1.f2
        update; # service crossing events
        destroy .f1
        update; # service crossing events
        set result

---- Result was:
|
---- Result should have been (exact matching):
|<Enter> NotifyInferior .f1|<Enter> NotifyInferior .|
==== event-9.17 FAILED

++++ event-9.18 PASSED
++++ event-9.19 PASSED
++++ event-9.20 PASSED
tests leunissen.tcl:    Total   10      Passed  7       Skipped 0       Failed  3
%


erikleunissen added on 2024-03-08 19:35:51:
I'd like to help debug the hanging test by inspecting the result of your integration into the Tk test suite.

Being away from my development systems, I can't run any code. But at a conceptual level I'm wondering: does the hang also occur when you run the test script unaltered as a separate Tcl script, that is, not integrated into the Tk test suite?

fvogel added on 2024-03-08 04:55:27:
I have already tried to integrate your new tests as is (or almost) in the test suite. My first attempt showed a hang in the first new test. I believe it hangs in waitForWindowEvent but didn't analyze. I didn't try on platforms other than windows.

erikleunissen added on 2024-03-07 22:51:54:
I'm sorry,

The link under minor issue B. has the wrong line number. The corrected link is:

   [https://core.tcl-lang.org/tk/file?udc=1&ln=on&ci=3eb956ad6f1fbc5a&name=generic%2FtkPointer.c&ln=535]

erikleunissen added on 2024-03-07 22:38:09:
OK. Thanks François.

I think it's best that I await your suggestions regarding the integration of the tests after you've had opportunity to review them.

In the meantime I want to inform you that a few minor issues have come up (see below). I'm not at my development machine, and cannot work on these issues for several days. So if you planned to already do some work preparing the integration of the tests, don't let those minor issues postpone your effort. As far as I'm concerned, all of them can be handled as adjustments/refinements after an initial integration into the bug fix branch of the tests in their current state.

Erik.

-- additional minor issues --

A. I discovered a possible other bug in GenerateEnterLeave(). It's unrelated to the issue of the present ticket, but while we're making changes, maybe this bug can be piggybacked with the present ticket.


B. Maybe the line:
  
   [https://core.tcl-lang.org/tk/file?udc=1&ln=on&ci=3eb956ad6f1fbc5a&name=generic%2FtkPointer.c&ln=155]

needs to be reverted to:

  tsdPtr->lastWinPtr = TkGetContainer(winPtr);

The difference is inconsequential for any of the test cases in crossing_events_upon_pw_destroy.test, which is why I thought that it didn't matter. However, I think I'm on to a scenario where it does. If so, that would lead to an extra test case.


C. test event-9.20 in the test script crossing_events_upon_pw_destroy.test needs the same trick as event-9.15 to prevent carrying over over events to the next test (if any).

fvogel added on 2024-03-07 20:13:31:

[47d4f29159] is now closed, with your fix merged in core-8-6-branch and trunk.

I have brought branch bug-22349fc78a up-to-date. Now let's go for step 3. as I described below.


erikleunissen added on 2024-03-04 22:13:07:
Sorry, I meant:

or such tests have to be skipped in branch core-8.6

erikleunissen added on 2024-03-04 22:11:06:
or:

- such tests have to be skipped

erikleunissen added on 2024-03-04 22:01:52:
I have second thoughts about the procedure that you proposed:

If the fix for [47d4f29159] is going to be merged into the bug fix branch for this ticket, it means that the bug fix branch cannot be merged with Tk branch core-8.6. Am I right?

That would be strange, because the misbehaviour that the patch corrects has nothing to do with ticket [47d4f29159]. It's just that some of the observed misbehaviour involves an event with detail field NotifyInferior. This, in turn, leads to some tests involving an event with detail field NotifyInferior.

Any issue involving observed behaviour regarding events with detail field NotifyInferior cannot be tested with Tk versions in the 8.6 branch. That is because the entire area of crossing events and focus events has been compromised by the issue of ticket [47d4f29159] since Tk's beginning. It is unfortunate that this becomes clear only now, and it stresses the lack of attention for this area prior to ticket [47d4f29159] and dependent tickets. However, I don't see why that would be a good reason for blocking patches for issues that are not caused by the issue in ticket [47d4f29159], but whose behaviour does involve a detail field NotifyInferior.

What it means according to me is that either:

  - the fix for [47d4f29159] needs to be merged into core-8.6 branch after all

or:

  - tests regarding such issues need different expected results in trunk and core-8.6 branches

erikleunissen added on 2024-03-01 22:47:23:
Thanks for reviewing this. 

I follow/understand your suggestion for proceeding with this ticket.

fvogel added on 2024-02-29 21:58:25:

Thank you Erik for this once more thorough analysis. I think this looks totally correct.

I have committed your patch as [04732899]. Github Actions reveals that the test suite passes with this on all three platforms.

At this point I think we should proceed in three steps: 1. merge the fix for [47d4f29159] first, then 2. merge it into the bugfix branch for the present ticket, and finally 3. make use of your test file to enhance the test suite with your new event-* tests.


Attachments: