Tk Source Code

View Ticket
Login
Ticket UUID: 06d8246baf7d000971b73dc040a874dfa719fbe4
Title: Mac OS: scrollbar does not update properly when using mousewheel
Type: Bug Version: tip of 8.6 branch
Submitter: kevin_walzer Created on: 2020-02-24 03:22:18
Subsystem: 16. [scrollbar] Assigned To: marc_culler
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2020-08-18 15:01:40
Resolution: Fixed Closed By: marc_culler
    Closed on: 2020-08-18 15:01:40
Description:
The mousewheel bindings for the listbox widget do not seem to update the scrollbar smoothly anymore.  To test this, run the widget demo and choose the 50-state listbox, and then use the mousewheel to scroll the listbox. The listbox itself scrolls smoothly, but the scrollbar does not update except at major intervals (usually when the listbox is scrolled halfway down, and then completely to the bottom). I've observed this on macOS 10.15.3 with the tip of 8.6 branch. I'm not certain if it's an issue on other platforms. There has been a lot of work done on scrolling recently so it's possible this is a regression/new bug of some sort. Andy Colebourne first reported the bug to me, and I've been able to reproduce it at least on the Mac. Can anyone test this on X11 and Windows, as well?
User Comments: marc_culler (claiming to be Marc Culler) added on 2020-08-18 15:01:40:
This specific problem has been fixed in the current tip of core-8-6-branch
or the 8.6.11 release candidate.  So I am closing the ticket.

chrstphrchvz added on 2020-07-18 14:34:30:

A few more review/testing tickets: [95751c509a], [7ebdd17974], [278fdd11d9]


chrstphrchvz added on 2020-07-14 10:04:46:

Noting here that tickets [0e200f4575] and [94db8de0d8] have been opened regarding testing of idle_curiosity branch, and tcl/a58737f57b for accompanying Tcl macOS_hangs branch); and that issue [aedc9f6fa4]/patch [37a27ddb2d] appear to be related at least to the work done for this ticket's issue.


nab added on 2020-06-12 08:56:02:
Hi Marc,
does idle_curiosity branch leads to somewhere according to you?

++

nab added on 2020-06-08 15:59:46:
yup.. I have only one call to update in my code, which I've removed :)

marc_culler (claiming to be Marc Culler) added on 2020-06-08 15:55:35:
Thanks, Nicolas.

I am aware that the changes to the check proc broke the scheme for allowing
Tk to handle events while a menu is open.  You will notice that the
pendulum stops when a menu is open in the stylish_idle_curiosity branch.
I will probably have to redesign how that works.

By the way, the "official" advice about update is that if you find that you
need to call it then you are probably doing something wrong:

https://wiki.tcl-lang.org/page/update

Obviously calling it should not cause crazy behavior, though.

nab added on 2020-06-08 15:19:15:
using stylish_mac_styles branch (which deploy a thread for menu in order to let the event loop to perform while menu is posted).
I use to withdraw toplevel, build everything and deiconify.
if a call to create a toplevel is made using menu (either click on the menuBar or thanks to shortcut) and a pure call to update (not update idletasks) is made before the deiconify call, the toplevel never appears and cpu is going crazy (like in a do-while loop that can never go out)

about callgrind: I'll try

++

marc_culler (claiming to be Marc Culler) added on 2020-06-08 14:48:56:
@nab: If you can run a profiler like callgrind on your app I would be interested
in seeing the results.

nab added on 2020-06-08 14:39:08:
you're right, an update idletasks instead of the win_modifyDelay call do display all widgets of toplevels.

++

nab added on 2020-06-08 13:52:10:
for mac_style heavy load is 35-40%
for stylish_idle_curiosity heavy load is 65%

marc_culler (claiming to be Marc Culler) added on 2020-06-08 13:37:37:
I think they are being drawn *again*.  But the second time they are drawn the
foreground color has been set up correctly, because the switch to DarkAqua has
been completed, so they no longer have the same color as the background.

Do you mean that the CPU usage is 30% higher than the CPU usage for the
mac_styles branch or that the total CPU usage is reported as 30%?  What is the
total CPU usage in both cases?

nab added on 2020-06-08 12:58:28:
regarding the ttk:labels, they're drawn (they apppear) as soon as I move mouse over them.

regarding the 30%, it's compare to mac_styles branch. (same computer, same file loaded in my app, same operations) 

++

marc_culler (claiming to be Marc Culler) added on 2020-06-08 12:43:21:
Since you are changing the Appearance of your window to DarkAqua, it would make
sense that ttk::labels are affected differently from tk::labels.  The tk::labels
do exactly the same drawing whether they are in Dark Mode or not.  The
ttk::labels use "semantic colors" which have different RGB values in Light and
Dark Mode.  Most likely your labels were being drawn, but the letters were
being drawn in the same color as the background.

I wonder if the switch from LightAqua to DarkAqua is taking place inside
of drawRect.  That would surely cause something weird to happen.

I don't understand what you mean when you say that CPU usage is "+30%".  Are
you saying 30% more than something else?  What is the something else?  How am
I supposed to calibrate these numbers?  I think we are doing less drawing with
this branch, so I would expect that to reduce CPU usage.  However, I would
expect this approach to consistently do all drawing twice - once outside of
drawRect when the Tcl event loop is handling idle tasks and then once more
inside of drawRect when the special inner event loop is handling the expose
events that are generated in drawRect.  If the current 8.6 approach is
accidentally managing to be more efficient by doing some of its drawing
within drawRect the first time, due to the fact that drawing is done at
unpredictable times and sometimes not at all, then we may have to choose
between fast and unreliable or slow and reliable.  I am providing a way for
a display procedure to detect whether it is being called from drawRect,
so in principle it could do just a minimal amount of work when called the
first time and do the real drawing work when called from drawRect.  But that
would involve rewriting all of the display procedures.  It seems better to
have a framework in place before doing anything like that.  And it seems
like a lot of work with many opportunities for making mistakes.

nab added on 2020-06-08 08:48:45:
more experiments shows that following code which introduce 100ms delay after deiconify seems to give times to Tk to display all widgets of toplevels (including ttk::label ones)

proc nsWin_sleep {ms namespace} {
    namespace upvar $namespace nsWin_sleepV localnsWin_sleepV
    after $ms {set localnsWin_sleepV 1}
    vwait localnsWin_sleepV

}
proc win_modifyDelay {} {
    nsWin_sleep 100 myNamespace
}

proc win_deiconifySequence {win} {
    if {[tk windowingsystem] eq {aqua} && $::tcl_platform(osVersion) >= 18} {
        #following update idletasks must be in place before MacWindowStyle
        update idletasks
        ::tk::unsupported::MacWindowStyle appearance $win darkaqua
    }
    wm deiconify $win
    if {[tk windowingsystem] eq {aqua}} {
        win_modifyDelay
    }
}


++

nab added on 2020-06-08 08:39:25:
I'm seeing the light!! (I'm kidding :) )
this is very good.

toplevels are drawn fast, no issues with tablelist, this is also very good because it means that all idle stuff are honored (tablelist is the perfect example of idle drawings).

some stuff are still weird though:
if I grid/pack a label in toplevels, they're perfectly drawn. 
If I grid/pack a ttk::label, it's not drawn until I move mouse over the frame/ttk::frame that contains the ttk::label (no matter what is the -style applied to the ttk::label)

CPU is high (+30%) with my test file.

with the latest script (with <Return> binding to toggle red/yellow canvas) I saw only one time a canvas badly redrawn. It was not because of key repetition. It happens only one time over > 300 keyPress/Release <Return>, and once it happened it never happens again.

in my app I'm using tkdnd to provide drag'n drop capabilities.
I do see one issue there mainly because (I think) mouse B1 is pressed and in the same time B1-Motion is involved.
In my app, I do display some colours for canvas when the dn'd operation is occuring (mouse is over canvas items) and with this branch myCanvas itemconfigure myItem -fill thisColour does not do anything.

but the whole direction of this branch seems nice especially if now drawRect() is controlled.

++
Nicolas

marc_culler (claiming to be Marc Culler) added on 2020-06-07 22:37:19:
With the tip of the macOS_hangs branch of Tcl and the tip of either
idle_curiosity or stylish_idle_curiosity of Tk, I am no longer seeing
windows that are only partially rendered.  So I am thinking this is very
close to being finished.

Nicolas, what are you seeing with these versions?

marc_culler (claiming to be Marc Culler) added on 2020-06-06 22:46:43:
Yes, I am also seeing refresh issues where widgets do not get drawn until
there is a mouse event.  So this is not finished.  But I am out of ideas
at this point.  Maybe someone else will have one.

I do think that when the redraw fails, Tcl is blocking in Tcl_WaitForEvent.
I am not sure whether it has anything to do with Tk at all, actually.

nab added on 2020-06-06 22:07:17:
ok, I did compile the TCL branch + this Tk branch.
when widgets are not drawn, the background of the toplevel is visible. Before there where a black square where widgets should be drawn.
with the new tcl branch + this Tk branch widgets are still not redrawn. I tried update/update idletasks calls after toplevel deiconify, but no success.

marc_culler (claiming to be Marc Culler) added on 2020-06-06 21:28:25:
I have started a Tcl branch called macOS_hangs which attempts to address the
problem that macOS Tk sometimes hangs in Tcl_WaitForEvent until you move the
mouse over a window.

I would suggest building Tcl from that branch before testing the branches for
this ticket.

marc_culler (claiming to be Marc Culler) added on 2020-06-06 21:06:30:
On second thought, the incomplete window issue looks more problematic.  I think
some of the things I did to fix that may have made it worse.

marc_culler (claiming to be Marc Culler) added on 2020-06-06 20:32:20:
@nab: Two things -

1. That commit is intended to go with a change to Tcl which I haven't pushed
yet.

2. The problem with new windows not being completely drawn is something I have
seen too.  I don't know if it is really related to this, although it could be.
However, I have found that you can work around this by adding some calls to
update idletasks.  Sometimes you might have to test whether a part of the
window has been mapped and if not run wait visibility before running update
idletasks.

nab added on 2020-06-06 19:11:03:
Hi Marc,
with your latest commit I still do see toplevels not entirely redrawn.

++

nab added on 2020-06-06 07:53:54:
Hi Marc,

well done with the script, it now passes here too (with the exception of keyRepeat as you've noticed).

here's a link to a video of how my app react with this branch: https://vimeo.com/426478616
each time I keyPress on the Flash button, little square above numbers 1, 2, 3, 4, 5 should go yellow. Each time I keyRelease, they should go black. With this branch they don't...
At 10 sec of the video I open a toplevel that display buttons/labels on the top and a tablelist, but toplevel is partially drawn and button do appears only when I move mouse above them.
I initially thought it was because at toplevel creation I call this code:
update idletasks
::tk::unsupported::MacWindowStyle appearance $win darkaqua

but no... removing the update idletasks still make the toplevel to be drawn partially.

hope this can help.
++

marc_culler (claiming to be Marc Culler) added on 2020-06-05 18:41:16:
Thanks, Kevin.  I'll take a look.

Nicolas,

I think I am close now.  Your 500 canvas script almost works correctly now, and
I think the problem when it doesn't has to do with something else.

The script works fine with the mouse and it works fine with the return key
*except* if you hold the return key down until it starts repeating.  When a
key starts repeating on the mac, the system does not send press and release
events repeatedly like X11 does.  It sends press events with the repeat flag
set. So the mac port of Tk artificially creates a KeyRelease XEvent and sends
it right after the KeyPress XEvent.  I think something is a bit wrong with that
mechanism.  My guess at the moment is that both events have the same timestamp.
So when the return key starts repeating you are creating two timer events for
the same time.  One asks the canvas to turn red and the other asks it to turn
yellow and it is supposed to do both things at the exact same time.  So what
actually happens is that some canvasses turn red and some turn yellow in some
more or less random pattern.  This may be an issue, but not necessarily
related to the problem in this ticket.

kevin_walzer added on 2020-06-05 17:13:52:
I added that call years ago, in 2015 I think - it dates back to the days when Tk was still using all those private Cocoa API's and we had zero control of the event loop - stuff would hang randomly. It was essentially a hack that brought considerable improvement. Perhaps it's no longer needed. If you check the file on the Fossil repo and check "blame," you will see the commit in which I put it, and clicking on the commit link will bring up a diff of the changes. Perhaps that might be helpful to look at, Marc.

nab added on 2020-06-05 16:37:21:
I hope you'll find the reason why...
++

marc_culler (claiming to be Marc Culler) added on 2020-06-05 15:35:54:
Thanks, Nicolas.

The key question that I need to answer before I can go any further is: why does
Tcl_WaitForEvent sometimes hang?

It does not seem to be a new problem.  The implementation of Tcl_WaitForEvent in
macOSX/tclMacOSXNotifier.c has this comment:

/* This call seems to simply force event processing through and prevents
 hangups that have long been observed with Tk-Cocoa.  */

But now that we are not calling drawRect willy-nilly anymore the hangs have
become a bigger problem.

nab added on 2020-06-04 08:11:44:
HI Marc,
thank you for the merge in mac_style.
so, using stylish_idle_curiosity when tablelist is involved I can see a some glitches (I've asked Csaba if he can have a look).
also using following script it can still happen that some items of canvas are not redrawn (you have more chances to get it wrong with KeyPress/Release Return).
I also saw one thing but it does happen very rarely: 
-when an item is not redrawn I had to move the mouse over the canvas (without clic) for the item to be redrawn
in the script I use -container option of frame in order to display canvas in a toplevel (I did that because I use -container in my app) but not using this mechanism is the same, some items are not always redrawn.

also, compiling Tk in deploy or develop mode produce different glitches.
in develop mode, I often need to move mouse over widgets for them to be displayed/updated

in terms of CPU usage, it's high... but it's acceptable as long as all calls for widget updates are honored.

all the best,
nicolas


package require Tk
package require Ttk

ttk::button .b -text push
set fc [frame .f -container 1]
set master [toplevel .n -use [winfo id $fc]]


canvas $master.c

for {set x 1} {$x<500} {incr x 5} {
    $master.c create rectangle 0 $x 100 [expr {$x + 5}] -fill black -tags tag$x
}

grid .b -sticky news
grid $fc -sticky news
grid $master.c -sticky news
grid columnconfigure $master all -weight 1
grid rowconfigure $master all -weight 1
grid columnconfigure . all -weight 1
grid rowconfigure . 1 -weight 1


bind .b <ButtonPress-1> {doColorchange red}
bind .b <ButtonRelease-1> {doColorchange yellow}
bind . <KeyPress-Return> {doColorchange red}
bind . <KeyRelease-Return> {doColorchange yellow}

proc doColorchange {colour} {
    global master
    for {set x 1} {$x<500} {incr x} {
        after $x [list $master.c itemconfigure tag$x -fill $colour]
     }
}

marc_culler (claiming to be Marc Culler) added on 2020-06-03 17:48:47:
OK.  I merged mac_styles into idle_curiosity.  That involved resolving one small
conflict.  The new branch is called stylish_idle_curiosity.

nab added on 2020-06-03 16:57:29:
HI Marc,
would it be possible to create a branch offset of mac_styles branch with idle_curiosity branch?
I'm very very very used to use mac_styles branch and I know how my app react.

thank you,
Nicolas

marc_culler (claiming to be Marc Culler) added on 2020-06-03 13:46:33:
The idle_curiosity branch is now ready for testing.  The three scripts posted
inline here (and now attached) all work.  The scrollbars in the "states" demo
now work correctly.  All regression tests now pass consistently (as opposed to
having a few sporadic failures, varying between deployment and development
branches and between different runs).  Some tests had special-cased different
"correct" results for macOS; now macOS gets the same results as other platforms.

Here is a summary of the main changes.

1.  The actual low level cause of the drawing failures revealed by Nicolas'
script was that the drawing of some of the canvas widgets was taking place
outside of the clipping rectangle which AppKit adds to the graphics context
provided to drawRect.  When drawRect runs it always clips to the rectangle
passed as its rect parameter.  (Yes, this contradicts Apple's documentation.)
The function TkMacOSXSetupDrawingContext has been modified to check for
this and set canDraw to false when the clipping region of the TK drawing
context is not contained in the clipping region of the current CoreGraphics
context.

2. Since Apple (at least on Catalina) now always provides only a single
dirty rectangle to drawRect, code for dealing with regions given by a list
of rectangles has been simplified.

3. Christopher observed that drawRect is not run in a mysterious
asynchronous way, but rather (sometimes) gets run when the Tcl event loop
explicitly releases control, either by fetching NSEvents from the AppKit
queue or by starting a new runloop.  I realized that this means that we
can actually control when it runs.  I have given each NSView its own
tkDirtyRect rectangle and tkNeedsDisplay flag.  When a widget needs display
these are updated.  But the AppKit needsDisplay flag is never set outside
of one new function TkMacOSXDrawAllViews, which gives AppKit an opportunity
to call drawRect with the tkDirtyRect rectangle and then sets the needsDisplay
flag back to NO if AppKit declines to run drawRect.  (Appkit accumulates
the dirty rectangles according to some private algorithm.  In Nicolas' script,
for example, the entire window with 99 canvases becoming dirty at different
times, drawRect runs 4 or 5 times.)

4. Now that we can control when drawing happens, we can do it when Tk expects
it to be done - namely after all available events have been added to the Tk
queue and there is nothing else to do.  This happens in the
TkMacOSXEventsCheckProc.  When it returns without adding any events, the
Tcl_DoOneEvent function would normally proceed to run all idle tasks.  Just
before that happens we add one new idle task to the idle queue, which calls
TkMacOSXDrawAllViews.  That way the drawing happens after all of the pending
idle tasks that might reconfigure a widget.

5. After concentrating the drawing in one place, issues with Tcl_WaitForEvent
hanging were revealed.  These had been partly masked by the random places in
whichvdrawRect was being called.  But an attempt had been made to deal with
these by asking AppKit to issue periodic events.  This only partly worked and
for accidental reasons.  In fact, if Tcl_WaitForEvent has decided to wait
instead of poll for events the only thing (outside of asynchronous file events)
which will wake it up is a timer event.  So I replaced the NSPeriodicEvents
with no-op timer events which actually cause Tcl_WaitForEvent to return. (These
only run when there are no user events being generated.)

6.  I have added a hook TkpWillDrawWidget which can be used by a display proc
to find out whether it is being called from drawRect and also to inform the
NSView of which widget is being drawn at the moment.  This is only used by
the test package and in tkFont to avoid redefining all of the fonts while
drawRect is running, which causes crashes.

marc_culler (claiming to be Marc Culler) added on 2020-06-02 20:48:49:
Sure, please test the idle_curiosity branch.  I know of some glitches which I
may or may not be able to fix.  You will probably find those.  And I expect to
be making another commit fairly soon.  But mostly it seems to be working and
to be an improvement.  The changes are fairly complicated.  I will post an
explanation later.

nab added on 2020-06-02 13:01:11:
Hi Marc,
does the idle_curiosity branch is ready for testing?

best regards,
nicolas

marc_culler (claiming to be Marc Culler) added on 2020-05-31 02:57:57:
Further testing reveals that on Catalina the [NSView wantsDefaultClipping]
getter is never called, and [NSView drawRect:] always clips to the rectangle
that is passed to drawRect.

Also, while it appears to be the case that a call to [NSApp nextEventMatchingMask: ...]
may result in drawRect being called before the function returns, having NSViews
with non-empty dirty rectangles does not guarantee that it will be run.
Appkit will accumulate changes to the dirty rectangle for a while before calling
drawRect.  I don't know if it is waiting for a certain amount of time to pass
or for a large enough damage region or some combination of those, possibly with
other factors.

marc_culler (claiming to be Marc Culler) added on 2020-05-28 15:02:08:
I tested the idea of having a separate queue from the idle queue to hold
widget display tasks and having drawRect call of the display procs on the
display queue.  This required a global change in the generic code that looks
like:

Tcl_DoWhenIdle(DisplayX, XPtr) ->
   Tk_QueueDisplayTask(XPtr->tkwin, DisplayX, Xptr)

Tcl_CancelIdleCall(DisplayX, XPtr) ->
   Tk_CancelDisplayTask(XPtr->tkwin, DisplayX, XPtr)

Passing the tkwin allows Tk to figure out which NSView is involved in the
display operation.  Each NSView has its own queue of pending display tasks.

Basically, what happened is that many, many things broke.  This even includes
segfaults in the Tcl byte code handler.  My conclusions from tracking down
some of the failures are:

1. There are many, many parts of the Tk code that rely on side effects of
running a display proc.  (Note, however, that this ticket is about a case
where we do *not* want the side effect.  We do not want the scrollbar to be
marked as having been displayed when in fact it was not displayed because its
display proc was called outside of drawRect, and then it was also not drawn
when the listview was drawn because it lies outside of the listview bounding
rectangle, which was the dirty rect when drawRect was drawing the listview.)

2. There are also many parts of the Tk code that make assumptions about when
an idle task will be run compared to when an event driven task will be run.

3. It is common for display tasks to be called before a Tk Window has been
constructed, i.e. before its underlying NSView exists.  And the side effects
of running those display procs are, evidently, crucial.

So, while it seems wasteful to call display procs when they are not able to
do any drawing, it seems that they are doing other things which Tk depends
upon.  Also, it is not clear that the aborted drawing is hugely expensive.
There are a lot of calls to drawing routines from outside of drawRect: but
they do return immediately if [NSApp isDrawing] is false.  One thing to note
here is that the Text widget display proc actually contains some code that,
on macOS, checks [NSApp isDrawing]. This is done by calling TkpAppIsDrawing,
which always returns YES on the other platforms but checks [NSApp isDrawing]
on macOS.  In particular, this information is internally available to all
generic widgets. It was used to speed up the Text widget, but perhaps it
could be used in other ways, such as making it possible for a scrollbar
to force a redisplay whenever its display proc is called from outside of
drawRect, given that the lower level approach of calling setNeedsDisplayInRect
is not working due to the timing of when idle tasks are run.  Doing extra
redraws of scrollbars could easily be worth the small amount of time it
would cost.

In the course of doing this I learned (or remembered) some things about
how drawRect: works. Here is a summary:

1. While Apple's documentation discusses making drawing more efficient by
calling [NSView getRectsBeingDrawn: count:], that feature appears to not exist
in Catalina.  In several days of experimenting I never once saw the count be
different from 1.  I believe that you get 1 dirty rectangle which is the
smallest rectangle containing all of the dirty rectangles that have been
reported by calling [NSView setNeedsDisplayInRect:].  Or, sometimes, you
get the full bounding rectangle (see 2 below).  I tried making sequences
of calls to [NSView setNeeds displayInRect] using different rectangles
but I was never able to create a count > 1 by doing that.  It is possible
that I had a flaw in my test method, but at any rate it is very unlikely
to get more than one rectangle.

2. Even if the app *never* calls [NSView setNeedsDisplay:YES] or
[NSWindow display] there will still be some calls to drawRect which pass
the entire bounds rectangle as the dirty rect. (I tested this by replacing
all such calls with [NSView setNeedsDisplayInRect:NSMakeRect(50, 50, 50, 50).)

3. When drawRect runs, it first clears the dirty rect.  If the NSView reports
that it is opaque then the dirty rect is filled with opaque black.  Otherwise
it is filled with a transparent color.  Since a Tk windows has one view on
top of a blank window frame filled with the default window background, what
you see with a transparent dirty rect is a rectangle filled with the default
window background.  Apple says that the drawRect: method of an opaque NSView
*must* draw every pixel of the dirty rect.  This is the reason.

4. Apple suggests that if the NSView reports that it does *not* want
defaultClipping (as TKContentView does) then drawing in drawRect is not
clipped to the dirty rect.  This appears to be the case.

These observations imply two things.  First there is not really any
alternative to having drawRect search through the entire Tk hierarchy to
find all widgets that have non-empty intersection with the dirty rect.
They all need to be redrawn.  But, second, more is needed.  The stacking
order needs to be considered.  If you have an opaque frame which covers
part of the dirty rect and part of the outside of the dirty rect and lies
under a button which is completely outside, then drawing the frame
would cover up the button.  The way Tk currently handles this is to
maintain a complex clipping region for each widget, which is the bounding
rectangle of the widget with a cut-out for each widget that overlaps it
and is higher in the stacking order. In the example above the frame's
clipping region clips out the bounding rectangle of the button. The
intent is to make it possible for any widget to be drawn at any time
without considering stacking order. An alternative might be to use default
clipping and to take care to draw widgets from bottom to top.  That
would be a major change, reworking code that seems to go back to the
first cocoa port.

nab added on 2020-05-26 11:39:44:
Hi,
here's my results with the different proposed improvements.
after 1 week of using branch bug-2a61eca3a8 (which implements the temporary loop), I saw several macOS wheel at launching of my app (app is stuck in a loop, mainly because on update idletasks call (I think)).
also, I use tkdnd for drag'n drop operation and I saw some kind of lag while dragging items.

I've applied Christopher's second patch and Marc's code of idle_curiosity branch on top of mac_style branch and results are good.
no macOS wheel on launching, no more canvas items not being redrawn (mainly I think because Christopher's second patch have removed the if ([NSEvent pressedMouseButtons]) {
		[view setNeedsDisplay:YES];} call).

here's my results...
and it's a delight to follow your discussions about Tk's optimisations.

++

chrstphrchvz added on 2020-05-25 15:32:30:

By "temporary" I mean that the loop only runs briefly as opposed to "permanently" (for roughly the entire lifetime of a Tk process) like the main Tcl event loop. Though I may have lifted the phrase "temporary event loop" from somewhere else; and by this reasoning the loops used while in drawRect: would also be considered temporary ones.

This summary of Tcl_DoOneEvent() looks correct; I have not examined it any more closely myself to disagree. It seems fine that "do one event" should not necessarily mean doing exactly one event; a script could have done after idle {update idletasks}, clearly entailing that more than one event be done when handling an event. What is less clear is whether Tcl_DoOneEvent() may return 0 even though at some point it had to recurse and actually handle events, such as in one of the temporary loops in the setup/check procs or in drawRect:; I'm thinking it can, because it's only considering at that topmost "level" whether any events were left even after possibly recursing—but the meaning of returning 0 is then more "there is now no work left to do" rather than "absolutely no work at all was done". (I wondered if this possibly defeats the purpose of adding check to drawRect: to make sure no idle events are queued when it is entered. It shouldn't, since Tcl_DoOneEvent() only loops back around to step 2 unless TCL_DONT_WAIT is set, as implied by TCL_IDLE_EVENTS.)

The time spent generating Expose events probably depends mostly on how many windows there are or how complicated their arrangement is. I've tried timing entire drawRect: calls versus timing the call to GenerateUpdates(). For the example from [2a61eca3a8], which is a more extreme example with about 100 windows (each with relatively simple drawing), GeneratingUpdates() still only takes about 5% of the time spent in drawRect:; for a toplevel with few windows, it may be less than 1%. More time is spent handling the Expose events: about 2 to 5 times more than generating the Expose events, so up to 10% for the example from [2a61eca3a8] but still often less than 1% for toplevels with only a few windows. But the vast majority of time spent in drawRect: is on handling the idle redrawing events: between 85% and 99%.

Keeping track of which windows should redraw may still help, as that would avoid redrawing widgets that are contained by getRectsBeingDrawn: but aren't actually drawable.

An idea I mentioned 2020-03-25 was to see if there could be a way for existing drawing routines to first detect whether drawing is at all possible, and if not, call (or schedule a call to) setNeedsDisplayInRect:, possibly just using an entire widget's bounds if it lacks some intelligent way of knowing where to redraw (which the text widget apparently does have), and return early to avoid wasted work. But even this approach might not work if redrawing routines tend to have other side effects besides putting stuff on the screen. Yet reducing total work may matter more for Tk as far as "CPU spiking" than just trying to move work currently (but unnecessarily) being done in drawRect: to somewhere else, particularly since Tk is single-threaded and does not do background-thread drawing.


marc_culler (claiming to be Marc Culler) added on 2020-05-24 22:22:55:
A comment about "CPU spiking".  I believe this is caused by the algorithm for
deciding which expose events to issue from inside drawRect in order to cause
widgets that touch the damage region to schedule idle tasks to redraw themselves.

The algorithm is, basically, to walk the entire Tk hierarchy and compute the
intersection of each widget with the damage region.  There are some attempts
to prune the hierarchy tree.  And I think that those are based on the invalid
assumption that the Tk hierarchy tree is the same as the containment hierarchy
tree, which is not the case thanks to the -use option to the geometry managers.

The straightforward way to handle this would seem to be that when a widget
needs to be redrawn it posts its display proc on a special queue and that
drawRect simply runs all of the display procs on that special queue.  But I
don't see any way of implementing something like that without creating a
project that would leak out of the macosx port and end up making fundamental
changes to the structure of the generic Tk package.

marc_culler (claiming to be Marc Culler) added on 2020-05-24 22:08:34:
I don't know for certain what could cause recursive calls to drawRect.  One
suspect would be liveResize.  But what I do know is that the guard against them
was added when they were observed to happen, and cause crashes that were
prevented by adding that guard.

There are other things which are known to cause crashes still, and are vaguely
related to this.  Namely, creating a new NSWindow from inside drawRect.  (For
example, attempting to open an error dialog to report that there was an error
in some Tcl code being exec'ed in an idle task as a part of a display proc.)

marc_culler (claiming to be Marc Culler) added on 2020-05-24 21:55:03:
I thought it would be a good idea to try to understand how adding the loops
that you (for reasons I don't understand) call "temporary" affects the Tcl
event loop.  Here is an attempt to summarize what happens in TclDoOneEvent.

The main Tcl event loop simply calls TclDoOneEvent in an infinite loop until
there are no windows left. But event loops can be nested within event loops.
The outermost loop is run with no flags, meaning it should wait and it should
process all events.  It looks to me like this is what happens in one call to
TclDoOneEvent from that outer loop on macOS:

1. If there an asynchronous event, handle it and return 1.
2. If there is a queued event, handle it and return 1.
3. Run the setup procs for all event sources.  (Here drawRect can run,
"borrowing" the Tcl event queue for expose events, which produce idle tasks,
and then running those idle tasks.)
4. Call Tcl_WaitForEvent.  The setup proc will have set the timeout to 0,
so this will "poll" for "system events" and return.  While "polling", however,
drawRect may run. If any events were found, this fact is recorded.
5. Run the check procs for all sources.  Here drawRect could run again.  Also
window events may get queued.
6. If any events were queued as a result of a call to checkProc, handle one
event and return 1.
7. Process all idle tasks (with a caveat -- see below).
8. If anything was found by the poll in step 4, or if any idle tasks were run,
return 1.  (I think this may include running drawRect, but it is hard to say.)
Otherwise go back to step 1, since no events were processed.  (Actually, the
function could return at step 4 without processing any Tcl events; this is
supposedly done to allow Tcl to check if a variable that is being waited on
may have been changed as a result of some external "system event" that was
found by the poll -- maybe even a call to drawRect?)

The caveat about running all idle tasks is this.  If an idle task were to
spawn a copy of itself, the idle task handler would go into an infinite loop.
So each idle task has a "generation" and only tasks of the current or older
generations are run.  "Child tasks" are ignored.

The caveat is relevant, I think, to the listbox issue because I think that the
scrollbar display proc appears as a child of the listbox display proc, so it
would not get run in the same call to TclServiceIdle.  The generation counter
is incremented each time that TclServiceIdle is called. TclServiceIdle also
explicitly deals with the possibility that an idle task runs its own inner
event loop.

Finally, getting back to the "temporary loops", what they are doing is
inserting another step 7 inside of step 3.  I think this means that we
are actually processing 2 generations of idle tasks in one call to
TclDoOneEvent.  Probably no other platform does that and probably the
authors of TclDoOneEvent did not expect that.  One could wonder whether
2 generations is enough.  Why not 10 generations? But I don't see any
clear reason why it should be dangerous. It surely could interact with
regression tests which have expectations about whether an idle task will
be processed after a generated event.  But I am not seeing any dire signs
there yet.

chrstphrchvz added on 2020-05-24 20:34:17:

Besides cacheDisplayInRect: (which I think might not be what Tk should be using anyway—see [685ac30727]), what else leads to recursive drawRect: calls? Admittedly I haven't been around the Tk Cocoa code for very long, nor have used it on older macOS versions for which recursive drawRect: calls are reportedly unavoidable. But I'm inclined to think there might not be anything forcing drawRect: to be called recursively, especially if Apple happens to believe doing so is "wrong" to the point of making recent macOS versions more strict about not allowing it.


chrstphrchvz added on 2020-05-24 16:14:10:

I have attached a patch with some suggested changes for the idle_curiosity branch.

With just the temporary loops, I can still observe some idle events queued when entering drawRect:, including redrawing events which then try to do setNeedsDisplayInRect:. But so far I've only observed this happen when Tk is started (if the temporary loop in TkMacOSXEventsSetupProc() is disabled before the first call to drawRect:), or when certain scripts are sourced using File > Source or doing File > Run Widget Demo; I do not have an example where this leads to something noticeably failing to draw. I haven't observe this happen later during normal use.

As long as no idle events are queued when generating Expose events (i.e. when entering drawRect:), then I think checking needsToDrawRect: is redundant, as the widgets being redrawn will only be those overlapping with the drawRect: call's rectangle(s).

Would it be a good idea to leave behind a few checks/assertions (for things previously mentioned: needsToDrawRect: returning NO, idle events queued upon entering drawRect:, etc.) which can be enabled by TK_MAC_DEBUG_DRAWING? Some of these checks might not have a way of being enabled without influencing program behavior; for example I'm not aware how to check for idle events without handling them (or digging into private Tcl structures).


chrstphrchvz added on 2020-05-23 00:51:45:

I can prevent the empty toplevel from appearing by adding a global variable which is initially false but then set to true by the first drawRect: call, and only entering the temporary idle event loops in tkMacOSXNotify.c when the variable is true.


chrstphrchvz added on 2020-05-20 14:12:26:

Doing tclsh8.6 lib/tk8.6/demos/widget is enough for me to observe the empty toplevel, however it appears regardless of how I launch Tk or with what program. It also doesn't matter which of the three temporary loops are added; as long as at least one of them is present, there is an empty toplevel and a brief wait.


nab added on 2020-05-20 10:18:54:
and also 
<blockquote> I still see the second-long pause at the empty toplevel during launch.</blockquote>

nab added on 2020-05-20 10:06:30:
Hi,
I've tested the new branch with while (Tcl_DoOneEvent(TCL_IDLE_EVENTS|TCL_DONT_WAIT)) {} call.

provide script in [2a61eca3a8] works perfectly

now in real life (aka with my app) while Christopher's patch works perfectly, this branch have the same symptoms as core-8.6-branch or mac_styles branch, some canvas items are not redrawn (because IMHO the event loop is busy)

++

marc_culler (claiming to be Marc Culler) added on 2020-05-20 01:56:08:
Could you please provide a self-contained recipe for creating the 1 second pause?

Thanks.

chrstphrchvz added on 2020-05-20 01:02:21:

Yes, I observe drawRect: being called for both dequeue:YES and NO.

I have started to wonder if some of the fixes I had proposed were becoming redundant. However I have yet to more thoroughly check that opting for just temporary loops or whichever subset of fixes doesn't leave something quietly/not-obviously broken (there will always be something else that's broken, it seems). For example, I would at a minimum want to check whether TkMacOSXSetupDrawingContext() still ever returns true when needsToDrawRect: returns NO; whether setNeedsDisplay:/setNeedsDisplayInRect: are ever called (e.g. from TkMacOSXSetupDrawingContext()) while in drawRect: (i.e. [NSApp isDrawing] returns YES); and whether a temporary event loop added at the beginning of drawRect: before [NSApp setIsDrawing] handles any events*. Each of these would indicate that an issue is still present.

*With the accumulated proposed fixes, I still notice a few idle events occasionally queued when entering drawRect:. Even if the temporary loop at the beginning of drawRect: is still needed, it should be a significant improvement that most other idle events can be handled outside of drawRect: (by one of the other added temporary loops) rather than in it.

When I started investigating this issue, initially I was trying to remove dependence on setNeedsDisplay:YES as done by existing workarounds (e.g. upon mouse events) in favor of using setNeedsDisplayInRect:; I hope that is still something that can be done here. (It could be that the temporary loop fix wouldn't be sufficient without these existing workarounds, though I don't think this is the case.)

And unfortunately even with just the temporary loops I still see the second-long pause at the empty toplevel during launch.

Finally, a fifth place that could potentially call drawRect:: Tcl_Sleep() (tclMacOSXNotify.c), which contains another occurrence of CFRunLoopRunInMode(); I have so far not observed drawRect: called from here.


marc_culler (claiming to be Marc Culler) added on 2020-05-19 20:37:12:
Christopher, I tried following your suggestion (cubed) by adding loops to
process idle events before each call to [NSApp nextEventMatchingMask ...]
(There are 3 such calls.)

That seems to fix the scrollbar issue with the listview.  Non-regression tests
look better (fewer sporadic failures that occur with Development but not
Deployment).  Nicolas' script from [2a61eca3a8] works perfectly.  I do not
see the window hanging for a second after package require Tk. It seems
to be a miracle cure!

Do you know of any of these drawRect related problems that it does not fix?

This seems like an easy fix and a huge improvement that should just be
committed. Also, thanks to your contributions to this ticket we even have an
explanation of why it works.

PS TCL_DONT_WAIT is implied by TCL_IDLE_EVENTS so it has no effect and causes
no harm.

marc_culler (claiming to be Marc Culler) added on 2020-05-19 19:07:47:
Some of the things I said below are wrong.

This what I am seeing:

1. drawRect gets called from within [NSApp nextEventMatchingMask ..] whether
the dequeue option is NO or YES.

2. The checkProc calls [NSApp nextEventMatchingMask ..] with dequeue = YES even
if the previous call with dequeue = NO returned nil.  The call with
dequeue = NO is only there to prevent handling events while in LiveResize.  And
making checkProc return immediately if the first call returns nil causes Wish
to hang.

3. Tcl_WaitForEvent calls CFRunLoopRunInMode() with a timeout generated from
the Tcl block time.  When the timeout expires it collects file events that
have been reported by a separate "notifier thread" and then queues them.  The
block time is set to 0 by setupProc.

4. It is possible that the recursive calls to drawRect only happen on older
versions of macOS.  Recursive calls to setupProc are common and expected.

marc_culler (claiming to be Marc Culler) added on 2020-05-19 16:19:27:
Question: do you ever observe drawRect being called from inside the
call to [NSApp nextEventMatchingMask:...] which has dequeue=NO?  Or does
that only happens with dequeue=YES?

marc_culler (claiming to be Marc Culler) added on 2020-05-19 16:16:20:
The three functions that you identify are functions which pretty explicitly
cause the main thread to wait.  However, [NSApp nextEventMatchingMask:...]
has the option dequeue which can be set to NO to cause it to return immediately,
and that is used to check if the queue is empty.  We never call
[NSApp nextEventMatchingMask:...] when it has an empty queue, which one might
guess would be the only thing that causes a wait.

However, I think you are saying that [NSApp nextEventMatchingMask:...] does not
actually return immediately, even if there is an event ready to be delivered.
Instead it does other business, including setting up a graphics context which
will actually cause drawing to occur and then calling drawRect, if there are
dirty rectangles.  That sounds very plausible.  Except I am puzzled by one
thing. How do you account for recursive calls to drawRect?  They definitely
do occur, since you can observe drawRect bailing out when it determines that
it is being called recursively.

chrstphrchvz added on 2020-05-19 15:23:37:

My impression (which I haven't figured out how to test) is that drawRect is run more or less like an [interrupt] handler. It runs on the main thread, probably with the same stack, but it can be run at any time…

That is an interesting theory, and I probably don't know enough about threading peculiarities on the Mac or the measures taken by Tk to enforce single-threaded operation/make the Mac aware of its thread-unsafety (e.g. if leaving canDrawConcurrently to the default value of NO is sufficient) to say for certain that it does or does not describe what's happening.

But it has prompted me to look more carefully, and I now notice three other places besides TkMacOSXEventSetupProc() from which drawRect: can be called: the two calls to [NSApp nextEventMatchingMask:…] in TkMacOSXEventCheckProc(), and CFRunLoopRunInMode() (which appears as CFRunLoopRunSpecific() in stack traces) in Tcl_WaitForEvent() (tclMacOSXNotify.c). So far I do not observe drawRect: being called from anywhere besides the four spots I've identified, nor it being called as a sort of interrupt handler (in which case I would expect printing a stack trace printed while in drawRect: to reveal that the thread was busy with something else when it was interrupted). For now I'm inclined to think things would be far more broken if drawRect: (or whichever slightly higher method responsible) were being called wherever and whenever.


marc_culler (claiming to be Marc Culler) added on 2020-05-18 14:23:20:
The guard that is in place for mitigating re-entrancy problems is extremely
crude.  We just block recursive calls to drawRect by making drawRect return
immediately if called recursively.  But I never did any careful study of how
that affects the Apple window manager's idea of what the current list of dirty
rectangles is.  I wonder if that may be part of what is happening here --
that the window manager thinks it has drawn rectangles that it hasn't actually
drawn because of the way that drawRect bails out when it detects recursion.
Since you are now maintaining a private list of dirty rectangles maybe that
could be handled in a more sophisticated way.

marc_culler (claiming to be Marc Culler) added on 2020-05-18 14:08:14:
Sorry - I meant "interrupt handler" where I wrote "event handler".  Too bad we
can't edit these comments.

marc_culler (claiming to be Marc Culler) added on 2020-05-18 14:06:17:
One thing to factor into this picture is the way that Apple runs drawRect.
Of course none of this is documented, so whatever understanding of it we
may have comes from reverse engineering.  My impression (which I haven't
figured out how to test) is that drawRect is run more or less like an event
handler.  It runs on the main thread, probably with the same stack, but it
can be run at any time.  Generally it won't run if there are no rects that
need drawing.  But if there are some dirty rects than it can fire at any time.
This means that it can fire while Tcl is executing an idle task to draw
a widget.  And I think it is even possible for the display proc of a widget
to be called while the display proc of that same widget is being executed.
This would seem to require that all display procs be fully reentrant, which
of course no one was expecting when they wrote the drawing code.  Also, it
means that you cannot rely on the order of execution of various parts of the
Tcl event loop.  The Tcl event loop might think it is handling events, not
idle tasks, when drawRect fires and starts executing idletasks that it
generated from the expose events.  How all of this asynchronous re-entry
plays out in real life can be quite complicated.

chrstphrchvz added on 2020-05-18 13:00:19:

Another example of baked-in idle redrawing event assumptions: while investigating [2a61eca3a8], I tried having Tk_CanvasEventuallyRedraw() do DisplayCanvas() immediately instead of as an idle event, and this led to the canvas borders not always being redrawn (appearing black instead). This is because in CanvasEventProc(), for eventPtr->type == Expose, the REDRAW_BORDERS flag is set after Tk_CanvasEventuallyRedraw() is called. So it is assumed that an idle redrawing event for a widget can be queued before the widget's state has been completely updated and is ready for redrawing.


chrstphrchvz added on 2020-05-18 12:52:22:

…it is not entirely clear to me why it would make any difference…

Starting from the Tcl event loop outside of drawRect:: when an idle redrawing event is handled, it will cause setNeedsDisplayInRect: to be called, which then makes display:drawRect: want to be called. The Aqua event(?) this triggers seems to often take precedence over handling of other idle events, including redrawing events that would also call setNeedsDisplayInRect: (though from your description, it sounds that it shouldn't be happening). If those idle redrawing events can instead all be handled before the next drawRect: call, then that will ensure drawRect: will be coalesced.

I now notice that adding a temporary idle event loop to TkMacOSXEventSetupProc() runs into an issue resembling [d1989fb7cf]: an empty toplevel is shown for about 1 second between doing package require Tk and any subsequent statements in a script. This happens even when specifying TCL_DONT_WAIT (which I'm not sure I understand how to use correctly). I would think there might already be a global variable somewhere that could be used to skip this temporary loop until Tk is sufficiently "initialized".


marc_culler (claiming to be Marc Culler) added on 2020-05-17 16:09:57:
Well, I would not want to be the one to try adding hundreds of lines of
platform-specific conditional code to the generic files.  Also, I think you
can safely assume that there are many places in the code where it is assumed
that a drawing command will be run as an idle task and that they would reveal
themselves one by one as bugs.

My initial reaction to your question is yes, I do think it would be safe to
process idle tasks in the setup proc.  But it does change Tcl's event processing
strategy slightly and it is not entirely clear to me why it would make any
difference.  My (admittedly imperfect) understanding of Tcl's strategy is that
the call to setup_proc is made when every existing event that can possibly be
handled has been handled, and Tcl is checking whether any of its event sources
might have something new for it to do before it goes to sleep.  If that is
correct it would mean that all pending idle events already get processed right
before the call to the setup proc.

chrstphrchvz added on 2020-05-17 14:21:47:

Marc, I'm not sure I understand what you believe is entailed by my suggestion to call drawing routines directly; the suggestion should resemble what I offered in my comment dated 2020-03-25 00:07:45, i.e. adding

#ifndef MAC_OSX_TK
    /* existing code calling Tcl_DoWhenIdle() */
#else
    /* redraw immediately */
#endif
around a few existing lines at a time (admittedly in generic code where it has already been said to avoid making things platform-specific). I'm not clear what the big task could be other than potentially applying this approach to a few hundred places in Tk.

The intention of this change was to workaround remaining issues noticeable at least with the listbox and scrollbar example, where the scrollbar would either not redraw half of the time, or more recently would not always redraw strictly simultaneously with the listbox (causing "flickering"). These were issues that if left unaddressed I thought could potentially be considered deal-breaking regressions for any approaches addressing the overall issue of widgets not always redrawing.

Admittedly it is not a well-thought-out change, and it's more clear now that such non-idle event usage of drawing routines can lead to problems. However it might not even be need to be considered if some other way to encourage coalescing of drawRect: and prevent flickering is found.

Currently, I'd to know if it's safe to do while (Tcl_DoOneEvent(TCL_IDLE_EVENTS)) {} in TkMacOSXEventsSetupProc(), just before the call to [NSApp nextEventMatchingMask:…]. From looking at stack traces while in drawRect:, this appears to be the latest point prior to entering any drawRect: call that is still under Tk's control. From briefly trying it out, I haven't noticed anything wrong, yet it greatly reduces (but not eliminates) the likelihood of an already-queued idle event when entering drawRect:, and encourages drawRect: coalescing (preventing flickering).


marc_culler (claiming to be Marc Culler) added on 2020-05-15 17:14:15:
Calling drawing routines directly would be a huge redesign of Tk.  It might
be a big improvement but it would probably also be a big job.  With the current
design (and especially the macOS adaptation) I would be tempted to say that
calling drawing routines directly should be strictly avoided.

chrstphrchvz added on 2020-05-15 16:34:13:

As I've learned from investigating [40ada90762], my suggestion to call widget drawing routines directly instead of as idle events likely needs to be accompanied by usage of Tcl_CancelIdleCall(). Apparently these drawing routines are only intended to be called as idle events, such that whatever queues them is responsible for setting the widget's REDRAW_PENDING and checking it to only ever queue one redrawing event per widget at a time, and that the drawing routine clears the flag when handled. Anything that strays from this and calls drawing routines directly must then also cancel any idle calls for the routine, otherwise if the widget is destroyed, the destroy callback may assume the cleared REDRAW_PENDING flag means it can skip canceling any queued idle redrawing and free the widget data, potentially leading to a crash when that idle redrawing is finally attempted.


chrstphrchvz added on 2020-05-11 11:49:53:

There is an issue in the approach I used to postpone setNeedsDisplayInRect: calls, where a segmentation fault occurs if the call outlives the view it was for. I'll see if I can revise it.


chrstphrchvz added on 2020-05-10 00:30:05:

It turns out [2a61eca3a8] is somewhat related to this issue; it is another form of widgets erroneously being told it is OK to redraw when it is not. It appears to be caused by having incorrectly assumed that if a widget's bounds overlap with a rectangle from getRectsBeingDrawn:count (i.e. needsToDrawRect: returns YES for the widget's bounds) then any drawing the widget does will appear onscreen and the widget's appearance will then be up-to-date. But the rectangle(s) returned by getRectsBeingDrawn:count may contain portions that are not drawable; Expose events and idle redrawing events for widgets in the undrawable area will be handled, but the drawing those widgets do can be discarded, and widgets will be unaware they haven't completely redrawn.


chrstphrchvz added on 2020-05-06 13:08:18:

The issue in Nicolas' script is interesting, but I want to say is a different issue, so I'm opening a new ticket for it: [2a61eca3a8]. I'm not certain the workarounds I've discussed here actually prevent it (if so, unintentionally); but they do appear to help study the issue by eliminating possible causes. In short, the canvases do appear to be getting redrawn (rather than their redrawing getting delayed indefinitely), but with what I have to guess is the wrong background color, despite the canvas background appearing to be properly configured when redrawn.

Still, I have noticed a few other delayed drawing quirks that the workarounds I've provided here prevent, such as stale drawing artifacts due to [efbedd5ff5]. Any widget—not just scrollbars—can currently try to draw when it shouldn't (i.e. outside needsToDrawRect:) or have its redrawing delayed until something else is also drawn or the mouse is clicked; all that seems special about scrollbars is they are used and redrawn with the exact sequence of events needed to reliably exhibit this issue.


nab added on 2020-05-05 16:35:50:
Hi,
here's a script that describe the problem of not every event to being processed:
package require Tk
package require Ttk

ttk::button .b -text push
grid .b -sticky news

for {set x 1} {$x<100} {incr x} {
    canvas .c$x -width 1 -height 1 -bg black
    grid .c$x -sticky news
}

grid columnconfigure . all -weight 1
grid rowconfigure . 1 -weight 1


bind .b <ButtonPress-1> {doColorchange red}
bind .b <ButtonRelease-1> {doColorchange yellow}

proc doColorchange {colour} {
    for {set x 1} {$x<100} {incr x} {
        after $x [list .c$x configure -bg $colour]
     }
}
#eof

if you <ButtonPress-1> on the push button, each canvas goes red but when you <ButtonRelease-1>, not all canvas goes to yellow (at least not all of them).

with the patch of Christopher all canvas always goes to the right colours, but with this patch CPU usage of my app +10~20% 

please note that not all events are processed only when <ButtonRelease-1>
<ButtonPress-1> always change canvas colour.

++
Nicolas

chrstphrchvz added on 2020-05-03 05:16:08:

Continuing to investigate why drawRect: isn't happening after doing setNeedsDisplayInRect:: the problem is not that setNeedsDisplayInRect: is sometimes completely ignored and fails to ever redraw the rectangle given to it; the problem is that Tk is waiting until something else needs to be drawn, at which point drawRect: is called for a region which the dirty rectangle from the earlier setNeedsDisplayInRect: is coalesced with. I think this may point to an issue in the Aqua event source for detecting when there's outstanding drawRect: calls to make before handing control back to the Tcl event loop, but I am not knowledgeable on this area.

Here's a more complete explanation for why the listbox example only redraws the scrollbar every other time it is scrolled (using setNeedsDisplayInRect: when needsToDrawRect: is false):

Start outside of drawRect:.

The listbox is then scrolled the first time, so it needs to redraw. Since this happens outside of drawRect:, setNeedsDisplayInRect: is done, and only for the rectangle containing the listbox. DrawWidget() for the scrollbar is scheduled to do when idle.

drawRect: is then called for the rectangle containing the listbox. The DrawWidget() for the scrollbar is now done, but since the scrollbar is outside of needsToDrawRect:, setNeedsDisplayInRect: is done for the rectangle containing it, and it is not yet redrawn. The listbox itself is then redrawn, and drawRect: finishes.

Nothing else happens automatically, so the scrollbar has yet to redraw.

The listbox is then scrolled a second time, and since this is again outside of drawRect:, setNeedsDisplayInRect: is done for the listbox.

drawRect: is then entered again, but this time for both the rectangle containing the listbox and the scrollbar; both are successfully redrawn, and no more drawRect: calls are needed or requested.

Scrolling the listbox again starts this process over.

Another way to observe this is to scroll the listbox, and when the scrollbar does not redraw, move the mouse over some other widget which redraws when hovered; drawRect: will then be called for a region containing both the hovered widget and the scrollbar.

Using setNeedsDisplay:YES (equivalent to setNeedsDisplayInRect:bounds) instead of setNeedsDisplayInRect:dirtyNS masks the problem, because it makes sure that drawRect: is called for the entire view bounds; then when something else discovers it needs to redraw itself while still in drawRect:, it can do so right away. (Although using setNeedsDisplay:YES means everything in the view is redrawn anyway, because everything in it will receive an Expose event.) So to correct what I said 2020-03-22, setNeedsDisplay:YES might not have any ability (that setNeedsDisplayInRect: lacks) to call drawRect: again before dropping back to the Tcl event loop; rather, setNeedsDisplay:YES just eliminates the need to do so.

I came up with a way of using Tcl_CreateTimerHandler(0, …) to postpone any setNeedsDisplayInRect: usage until no longer inside drawRect:. I found that while this approach ensures drawRect: will at least be called soon after using setNeedsDisplayInRect:, it can take a few frames to do so. For the listbox example, the scrollbar is usually redrawn 0 to 3 frames after the listbox redraws. (The previously mentioned workaround of having TtkRedisplayWidget() do DrawWidget() immediately rather than schedule it as an idle handler also applies here, ensuring the scrollbar and listbox are redrawn simultaneously.) While this approach might not be desirable to actually use, it may be helpful for studying the problem, so I am providing a patch for it.


chrstphrchvz added on 2020-03-25 22:43:45:

I wonder if might be poor form to use setNeedsDisplayInRect: at all while inside drawRect:.

https://developer.apple.com/videos/play/wwdc2013/215/ (4:38, slide 19) says to avoid calling setNeedsDisplayInRect: from drawRect, but only because it's "bad for performance"; it doesn't say that doing so anyways is somehow erroneous (at least in 2013). They also advise not doing layout inside drawReect:. There are already widgets with separate layout and drawing routines, but they currently always execute together (i.e. while in drawRect: on Aqua).


chrstphrchvz added on 2020-03-25 00:07:45:

I've noticed another problem, though: trying to rely on setNeedsDisplayInRect: does not guarantee drawRect: will be called again before handing control back over to the Tcl event loop, unlike setNeedsDisplay:YES. (The effect I observe is that only every other MouseWheel event fails to update the scrollbar when scrolling a canvas or listbox.) I haven't found why this is the case…

I still don't know why this is the case, but I wonder if might be poor form to use setNeedsDisplayInRect: at all while inside drawRect:.

A workaround I've found is for TtkRedisplayWidget() to call DrawWidget() right away rather than schedule it as an idle event, at least on Aqua:

#ifndef MAC_OSX_TK
    if (!(corePtr->flags & REDISPLAY_PENDING)) {
	Tcl_DoWhenIdle(DrawWidget, corePtr);
	corePtr->flags |= REDISPLAY_PENDING;
    }
#else
    DrawWidget(corePtr);
#endif

Then, as long as TtkRedisplayWidget() is called while outside of drawRect:, setNeedsDisplayInRect: will be have to be called, and DrawWidget() is hopefully called again later only while in a drawRect: for the area containing the widget (when handling the generated Expose event).

This might be inefficient, though, depending on how much wasted work is done before reaching setNeedsDisplayInRect:, and how many times DrawWidget() gets called outside of drawRect:. But the same sort of inefficiency occurs when routines try to draw from the wrong drawRect:. Another inefficiency comes from having to find DrawWidget() all over again by using the area being redrawn to find the widget being redrawn, as opposed to directly calling the routine from a queued event. I'm not sure whether it's suddenly more inefficient than using setNeedsDisplay:YES. It could be less inefficient if the widget's REDISPLAY_PENDING flag were set when a widget is unable to draw right away, and cleared only after a widget is actually drawn. But it doesn't look like the widget drawing routines (e.g. ThumbElementDraw()) have access to this flag; the functions involved don't have return values nor report whether they successfully drew the widget.

It seems a more efficient approach for drawing widgets on Aqua would be to separate the routines for drawing from routines for requesting widgets be redrawn, but in a way that is drawRect-aware (i.e. requesting widgets be redrawn uses setNeedsDisplayInRect:). Or, widget drawing routines should check whether they can draw as soon as possible to minimize wasted work, which might be an easier change to implement.

Another approach might be for generateExposeEvents to handle only the idle events created as a result of the Expose events, rather than all idle events (including those for drawing outside of the area of the current drawRect:). However it's not clear to me this can be easily done nor would it actually work.


nab added on 2020-03-23 19:53:35:
Hi Christopher,
maybe I'm wrong but from all the tests I've made, it seems that [view setNeedsDisplayInRect:dirtyNS] is better than [view setNeedsDisplay:YES] as the later trigger all displayed widget to be redrawn.

++

chrstphrchvz added on 2020-03-23 18:21:07:

I now notice how this topic of CPU usage spiking has gone on for many months and multiple tickets without much satisfaction…

I've not done any real benchmarking, but I am skeptical that there is a noticeable difference from CGRectApplyAffineTransform(). This is at worst a small matrix multiplication, but all it does here is equivalent to having only done:

dirtyCG = NSRectToCGRect(dirtyNS);
dirtyCG.origin.y = dirtyNS.size.height - dirtyCG.origin.y;

(Why use CGRectApplyAffineTransform() instead? As a matter of style, I guess; maybe it's a better way of saying "go from Tk coordinates (origin at top left corner) to Cocoa coordinates (origin at bottom left corner)" in one step. Tk could also instead override -isFlipped to return YES for TKContentView, and then not have to convert between Tk coordinates and Cocoa coordinates everywhere.)

needsToDrawRect: might be the more expensive thing I added, depending how many rectangles are being drawn at a time; it's more work than checking whether the mouse is clicked or scrolling. But its use is actually recommended for performance in some cases. Maybe there are other places in the Tk code where it can be used at least to simplify the code and/or ensure its correctness.


nab added on 2020-03-23 06:10:42:
Hi Christopher,
I've tested your patch against my app and CPU is rising again > 100%.
at least it was better for me when the CGRect dirtyCG calculation was after the if (![NSApp isDrawing] || view != [NSView focusView]) clause.

best regards,
Nicolas

chrstphrchvz added on 2020-03-22 09:35:46:

A little more elaboration: I observed that the drawRect: call that tried drawing the scrollbar but failed always had the area containing only the listbox. This drawRect: call was not what queued the scrollbar to be redrawn (i.e. by generateExposeEvents:); it was already queued before drawRect: was called, by the ScrollbarSetCommand() being called by the listbox widget.

So my guess (I have not successfully confirmed) as to why the scrollbar sometimes redrew correctly is that a different drawRect: call with the area of the scrollbar managed to happen before a drawRect: that didn't.


chrstphrchvz added on 2020-03-22 08:43:44:

I've tried figuring out what's actually causing this issue so that there might not have to be any more resorting to setNeedsDisplay:YES.

The issue appears to be that TkMacOSXSetupDrawingContext() is erroneously returning true (i.e. drawing can proceed) when the region being drawn is outside of the area for the current drawRect: call. I couldn't tell whether TkMacOSXSetupDrawingContext() is already passed knowledge of the drawRect area, but it turns out this is what the convenience method needsToDrawRect: is for. The attached patch reorganizes some existing code and makes use of it.

I've noticed another problem, though: trying to rely on setNeedsDisplayInRect: does not guarantee drawRect: will be called again before handing control back over to the Tcl event loop, unlike setNeedsDisplay:YES. (The effect I observe is that only every other MouseWheel event fails to update the scrollbar when scrolling a canvas or listbox.) I haven't found why this is the case; maybe someone else knows. But for now needsToDrawRect: might at least a better way of knowing when to resort to setNeedsDisplay:YES.


kevin_walzer added on 2020-02-25 03:31:25:
Marc, your bug fix branch addresses this issue. I was tinkering with something like this in tkMacOSXDraw.c but couldn't filter the events properly. Your approach works well and is very clean. Please commit at your convenience and thank you.

marc_culler (claiming to be Marc Culler) added on 2020-02-25 02:37:52:
I created a bugfix branch which corrects the symptom.  It simply marks the
window as needing display whanever a mousewheel event arrives.  I know this is
not optimal, because I can see the scrollbar being drawn while the scrollbar
is outside of the clipping region.  But I don't know what the optimal fix
would be.

marc_culler (claiming to be Marc Culler) added on 2020-02-24 19:39:27:
My testing indicates that the scrollbar thumb *is* being drawn when the mousewheel
is adjusted slowly.  However, it is being drawn with a clipping region set to the
bounding box of the listview, so there is no change to the appearance.

I do not know why the clipping region is wrong at this point.

kevin_walzer added on 2020-02-24 12:44:46:
Removing the code from those commits in tkMacOSXDraw.c does not change the observed behavior in this bug report - those changes must have addressed some other issue. Marc, if you have any thoughts about this, they are welcome.

kevin_walzer added on 2020-02-24 12:15:57:
I suspect that this bug was introduced with commits 8448b35c and 24783747 -- some adjustments on redrawing the scrollbar in certain cases. Marc, can you take a look? 

This appears in the canvas demo and the listbox demo - but not the text widget demo, as far as I can tell.

fvogel added on 2020-02-24 07:41:40:
I'm not seeing this issue on Windows.

bll added on 2020-02-24 03:48:29:
Does not appear as a problem on X11.
Reproducable on Mac OS 10.15.3.

Attachments: