Tk Source Code: View Ticket

Ticket UUID:	06d8246baf7d000971b73dc040a874dfa719fbe4
Title:	Mac OS: scrollbar does not update properly when using mousewheel
Type:	Bug	Version:	tip of 8.6 branch
Submitter:	kevin_walzer	Created on:	2020-02-24 03:22:18
Subsystem:	16. [scrollbar]	Assigned To:	marc_culler
Priority:	5 Medium	Severity:	Minor
Status:	Closed	Last Modified:	2020-08-18 15:01:40
Resolution:	Fixed	Closed By:	marc_culler
		Closed on:	2020-08-18 15:01:40
Description:	The mousewheel bindings for the listbox widget do not seem to update the scrollbar smoothly anymore. To test this, run the widget demo and choose the 50-state listbox, and then use the mousewheel to scroll the listbox. The listbox itself scrolls smoothly, but the scrollbar does not update except at major intervals (usually when the listbox is scrolled halfway down, and then completely to the bottom). I've observed this on macOS 10.15.3 with the tip of 8.6 branch. I'm not certain if it's an issue on other platforms. There has been a lot of work done on scrolling recently so it's possible this is a regression/new bug of some sort. Andy Colebourne first reported the bug to me, and I've been able to reproduce it at least on the Mac. Can anyone test this on X11 and Windows, as well?
User Comments:	marc_culler (claiming to be Marc Culler) added on 2020-08-18 15:01:40: This specific problem has been fixed in the current tip of core-8-6-branch or the 8.6.11 release candidate. So I am closing the ticket. chrstphrchvz added on 2020-07-18 14:34:30: A few more review/testing tickets: [95751c509a], [7ebdd17974], [278fdd11d9] chrstphrchvz added on 2020-07-14 10:04:46: Noting here that tickets [0e200f4575] and [94db8de0d8] have been opened regarding testing of idle_curiosity branch, and tcl/a58737f57b for accompanying Tcl macOS_hangs branch); and that issue [aedc9f6fa4]/patch [37a27ddb2d] appear to be related at least to the work done for this ticket's issue. nab added on 2020-06-12 08:56:02: Hi Marc, does idle_curiosity branch leads to somewhere according to you? ++ nab added on 2020-06-08 15:59:46: yup.. I have only one call to update in my code, which I've removed :) marc_culler (claiming to be Marc Culler) added on 2020-06-08 15:55:35: Thanks, Nicolas. I am aware that the changes to the check proc broke the scheme for allowing Tk to handle events while a menu is open. You will notice that the pendulum stops when a menu is open in the stylish_idle_curiosity branch. I will probably have to redesign how that works. By the way, the "official" advice about update is that if you find that you need to call it then you are probably doing something wrong: https://wiki.tcl-lang.org/page/update Obviously calling it should not cause crazy behavior, though. nab added on 2020-06-08 15:19:15: using stylish_mac_styles branch (which deploy a thread for menu in order to let the event loop to perform while menu is posted). I use to withdraw toplevel, build everything and deiconify. if a call to create a toplevel is made using menu (either click on the menuBar or thanks to shortcut) and a pure call to update (not update idletasks) is made before the deiconify call, the toplevel never appears and cpu is going crazy (like in a do-while loop that can never go out) about callgrind: I'll try ++ marc_culler (claiming to be Marc Culler) added on 2020-06-08 14:48:56: @nab: If you can run a profiler like callgrind on your app I would be interested in seeing the results. nab added on 2020-06-08 14:39:08: you're right, an update idletasks instead of the win_modifyDelay call do display all widgets of toplevels. ++ nab added on 2020-06-08 13:52:10: for mac_style heavy load is 35-40% for stylish_idle_curiosity heavy load is 65% marc_culler (claiming to be Marc Culler) added on 2020-06-08 13:37:37: I think they are being drawn again. But the second time they are drawn the foreground color has been set up correctly, because the switch to DarkAqua has been completed, so they no longer have the same color as the background. Do you mean that the CPU usage is 30% higher than the CPU usage for the mac_styles branch or that the total CPU usage is reported as 30%? What is the total CPU usage in both cases? nab added on 2020-06-08 12:58:28: regarding the ttk:labels, they're drawn (they apppear) as soon as I move mouse over them. regarding the 30%, it's compare to mac_styles branch. (same computer, same file loaded in my app, same operations) ++ marc_culler (claiming to be Marc Culler) added on 2020-06-08 12:43:21: Since you are changing the Appearance of your window to DarkAqua, it would make sense that ttk::labels are affected differently from tk::labels. The tk::labels do exactly the same drawing whether they are in Dark Mode or not. The ttk::labels use "semantic colors" which have different RGB values in Light and Dark Mode. Most likely your labels were being drawn, but the letters were being drawn in the same color as the background. I wonder if the switch from LightAqua to DarkAqua is taking place inside of drawRect. That would surely cause something weird to happen. I don't understand what you mean when you say that CPU usage is "+30%". Are you saying 30% more than something else? What is the something else? How am I supposed to calibrate these numbers? I think we are doing less drawing with this branch, so I would expect that to reduce CPU usage. However, I would expect this approach to consistently do all drawing twice - once outside of drawRect when the Tcl event loop is handling idle tasks and then once more inside of drawRect when the special inner event loop is handling the expose events that are generated in drawRect. If the current 8.6 approach is accidentally managing to be more efficient by doing some of its drawing within drawRect the first time, due to the fact that drawing is done at unpredictable times and sometimes not at all, then we may have to choose between fast and unreliable or slow and reliable. I am providing a way for a display procedure to detect whether it is being called from drawRect, so in principle it could do just a minimal amount of work when called the first time and do the real drawing work when called from drawRect. But that would involve rewriting all of the display procedures. It seems better to have a framework in place before doing anything like that. And it seems like a lot of work with many opportunities for making mistakes. nab added on 2020-06-08 08:48:45: more experiments shows that following code which introduce 100ms delay after deiconify seems to give times to Tk to display all widgets of toplevels (including ttk::label ones) proc nsWin_sleep {ms namespace} { namespace upvar $namespace nsWin_sleepV localnsWin_sleepV after $ms {set localnsWin_sleepV 1} vwait localnsWin_sleepV } proc win_modifyDelay {} { nsWin_sleep 100 myNamespace } proc win_deiconifySequence {win} { if {[tk windowingsystem] eq {aqua} && $::tcl_platform(osVersion) >= 18} { #following update idletasks must be in place before MacWindowStyle update idletasks ::tk::unsupported::MacWindowStyle appearance $win darkaqua } wm deiconify $win if {[tk windowingsystem] eq {aqua}} { win_modifyDelay } } ++ nab added on 2020-06-08 08:39:25: I'm seeing the light!! (I'm kidding :) ) this is very good. toplevels are drawn fast, no issues with tablelist, this is also very good because it means that all idle stuff are honored (tablelist is the perfect example of idle drawings). some stuff are still weird though: if I grid/pack a label in toplevels, they're perfectly drawn. If I grid/pack a ttk::label, it's not drawn until I move mouse over the frame/ttk::frame that contains the ttk::label (no matter what is the -style applied to the ttk::label) CPU is high (+30%) with my test file. with the latest script (with <Return> binding to toggle red/yellow canvas) I saw only one time a canvas badly redrawn. It was not because of key repetition. It happens only one time over > 300 keyPress/Release <Return>, and once it happened it never happens again. in my app I'm using tkdnd to provide drag'n drop capabilities. I do see one issue there mainly because (I think) mouse B1 is pressed and in the same time B1-Motion is involved. In my app, I do display some colours for canvas when the dn'd operation is occuring (mouse is over canvas items) and with this branch myCanvas itemconfigure myItem -fill thisColour does not do anything. but the whole direction of this branch seems nice especially if now drawRect() is controlled. ++ Nicolas marc_culler (claiming to be Marc Culler) added on 2020-06-07 22:37:19: With the tip of the macOS_hangs branch of Tcl and the tip of either idle_curiosity or stylish_idle_curiosity of Tk, I am no longer seeing windows that are only partially rendered. So I am thinking this is very close to being finished. Nicolas, what are you seeing with these versions? marc_culler (claiming to be Marc Culler) added on 2020-06-06 22:46:43: Yes, I am also seeing refresh issues where widgets do not get drawn until there is a mouse event. So this is not finished. But I am out of ideas at this point. Maybe someone else will have one. I do think that when the redraw fails, Tcl is blocking in Tcl_WaitForEvent. I am not sure whether it has anything to do with Tk at all, actually. nab added on 2020-06-06 22:07:17: ok, I did compile the TCL branch + this Tk branch. when widgets are not drawn, the background of the toplevel is visible. Before there where a black square where widgets should be drawn. with the new tcl branch + this Tk branch widgets are still not redrawn. I tried update/update idletasks calls after toplevel deiconify, but no success. marc_culler (claiming to be Marc Culler) added on 2020-06-06 21:28:25: I have started a Tcl branch called macOS_hangs which attempts to address the problem that macOS Tk sometimes hangs in Tcl_WaitForEvent until you move the mouse over a window. I would suggest building Tcl from that branch before testing the branches for this ticket. marc_culler (claiming to be Marc Culler) added on 2020-06-06 21:06:30: On second thought, the incomplete window issue looks more problematic. I think some of the things I did to fix that may have made it worse. marc_culler (claiming to be Marc Culler) added on 2020-06-06 20:32:20: @nab: Two things - 1. That commit is intended to go with a change to Tcl which I haven't pushed yet. 2. The problem with new windows not being completely drawn is something I have seen too. I don't know if it is really related to this, although it could be. However, I have found that you can work around this by adding some calls to update idletasks. Sometimes you might have to test whether a part of the window has been mapped and if not run wait visibility before running update idletasks. nab added on 2020-06-06 19:11:03: Hi Marc, with your latest commit I still do see toplevels not entirely redrawn. ++ nab added on 2020-06-06 07:53:54: Hi Marc, well done with the script, it now passes here too (with the exception of keyRepeat as you've noticed). here's a link to a video of how my app react with this branch: https://vimeo.com/426478616 each time I keyPress on the Flash button, little square above numbers 1, 2, 3, 4, 5 should go yellow. Each time I keyRelease, they should go black. With this branch they don't... At 10 sec of the video I open a toplevel that display buttons/labels on the top and a tablelist, but toplevel is partially drawn and button do appears only when I move mouse above them. I initially thought it was because at toplevel creation I call this code: update idletasks ::tk::unsupported::MacWindowStyle appearance $win darkaqua but no... removing the update idletasks still make the toplevel to be drawn partially. hope this can help. ++ marc_culler (claiming to be Marc Culler) added on 2020-06-05 18:41:16: Thanks, Kevin. I'll take a look. Nicolas, I think I am close now. Your 500 canvas script almost works correctly now, and I think the problem when it doesn't has to do with something else. The script works fine with the mouse and it works fine with the return key except if you hold the return key down until it starts repeating. When a key starts repeating on the mac, the system does not send press and release events repeatedly like X11 does. It sends press events with the repeat flag set. So the mac port of Tk artificially creates a KeyRelease XEvent and sends it right after the KeyPress XEvent. I think something is a bit wrong with that mechanism. My guess at the moment is that both events have the same timestamp. So when the return key starts repeating you are creating two timer events for the same time. One asks the canvas to turn red and the other asks it to turn yellow and it is supposed to do both things at the exact same time. So what actually happens is that some canvasses turn red and some turn yellow in some more or less random pattern. This may be an issue, but not necessarily related to the problem in this ticket. kevin_walzer added on 2020-06-05 17:13:52: I added that call years ago, in 2015 I think - it dates back to the days when Tk was still using all those private Cocoa API's and we had zero control of the event loop - stuff would hang randomly. It was essentially a hack that brought considerable improvement. Perhaps it's no longer needed. If you check the file on the Fossil repo and check "blame," you will see the commit in which I put it, and clicking on the commit link will bring up a diff of the changes. Perhaps that might be helpful to look at, Marc. nab added on 2020-06-05 16:37:21: I hope you'll find the reason why... ++ marc_culler (claiming to be Marc Culler) added on 2020-06-05 15:35:54: Thanks, Nicolas. The key question that I need to answer before I can go any further is: why does Tcl_WaitForEvent sometimes hang? It does not seem to be a new problem. The implementation of Tcl_WaitForEvent in macOSX/tclMacOSXNotifier.c has this comment: /* This call seems to simply force event processing through and prevents hangups that have long been observed with Tk-Cocoa. / But now that we are not calling drawRect willy-nilly anymore the hangs have become a bigger problem. nab added on 2020-06-04 08:11:44: HI Marc, thank you for the merge in mac_style. so, using stylish_idle_curiosity when tablelist is involved I can see a some glitches (I've asked Csaba if he can have a look). also using following script it can still happen that some items of canvas are not redrawn (you have more chances to get it wrong with KeyPress/Release Return). I also saw one thing but it does happen very rarely: -when an item is not redrawn I had to move the mouse over the canvas (without clic) for the item to be redrawn in the script I use -container option of frame in order to display canvas in a toplevel (I did that because I use -container in my app) but not using this mechanism is the same, some items are not always redrawn. also, compiling Tk in deploy or develop mode produce different glitches. in develop mode, I often need to move mouse over widgets for them to be displayed/updated in terms of CPU usage, it's high... but it's acceptable as long as all calls for widget updates are honored. all the best, nicolas package require Tk package require Ttk ttk::button .b -text push set fc [frame .f -container 1] set master [toplevel .n -use [winfo id $fc]] canvas $master.c for {set x 1} {$x<500} {incr x 5} { $master.c create rectangle 0 $x 100 [expr {$x + 5}] -fill black -tags tag$x } grid .b -sticky news grid $fc -sticky news grid $master.c -sticky news grid columnconfigure $master all -weight 1 grid rowconfigure $master all -weight 1 grid columnconfigure . all -weight 1 grid rowconfigure . 1 -weight 1 bind .b <ButtonPress-1> {doColorchange red} bind .b <ButtonRelease-1> {doColorchange yellow} bind . <KeyPress-Return> {doColorchange red} bind . <KeyRelease-Return> {doColorchange yellow} proc doColorchange {colour} { global master for {set x 1} {$x<500} {incr x} { after $x [list $master.c itemconfigure tag$x -fill $colour] } } marc_culler (claiming to be Marc Culler) added on 2020-06-03 17:48:47: OK. I merged mac_styles into idle_curiosity. That involved resolving one small conflict. The new branch is called stylish_idle_curiosity. nab added on 2020-06-03 16:57:29: HI Marc, would it be possible to create a branch offset of mac_styles branch with idle_curiosity branch? I'm very very very used to use mac_styles branch and I know how my app react. thank you, Nicolas marc_culler (claiming to be Marc Culler) added on 2020-06-03 13:46:33: The idle_curiosity branch is now ready for testing. The three scripts posted inline here (and now attached) all work. The scrollbars in the "states" demo now work correctly. All regression tests now pass consistently (as opposed to having a few sporadic failures, varying between deployment and development branches and between different runs). Some tests had special-cased different "correct" results for macOS; now macOS gets the same results as other platforms. Here is a summary of the main changes. 1. The actual low level cause of the drawing failures revealed by Nicolas' script was that the drawing of some of the canvas widgets was taking place outside of the clipping rectangle which AppKit adds to the graphics context provided to drawRect. When drawRect runs it always clips to the rectangle passed as its rect parameter. (Yes, this contradicts Apple's documentation.) The function TkMacOSXSetupDrawingContext has been modified to check for this and set canDraw to false when the clipping region of the TK drawing context is not contained in the clipping region of the current CoreGraphics context. 2. Since Apple (at least on Catalina) now always provides only a single dirty rectangle to drawRect, code for dealing with regions given by a list of rectangles has been simplified. 3. Christopher observed that drawRect is not run in a mysterious asynchronous way, but rather (sometimes) gets run when the Tcl event loop explicitly releases control, either by fetching NSEvents from the AppKit queue or by starting a new runloop. I realized that this means that we can actually control when it runs. I have given each NSView its own tkDirtyRect rectangle and tkNeedsDisplay flag. When a widget needs display these are updated. But the AppKit needsDisplay flag is never set outside of one new function TkMacOSXDrawAllViews, which gives AppKit an opportunity to call drawRect with the tkDirtyRect rectangle and then sets the needsDisplay flag back to NO if AppKit declines to run drawRect. (Appkit accumulates the dirty rectangles according to some private algorithm. In Nicolas' script, for example, the entire window with 99 canvases becoming dirty at different times, drawRect runs 4 or 5 times.) 4. Now that we can control when drawing happens, we can do it when Tk expects it to be done - namely after all available events have been added to the Tk queue and there is nothing else to do. This happens in the TkMacOSXEventsCheckProc. When it returns without adding any events, the Tcl_DoOneEvent function would normally proceed to run all idle tasks. Just before that happens we add one new idle task to the idle queue, which calls TkMacOSXDrawAllViews. That way the drawing happens after all of the pending idle tasks that might reconfigure a widget. 5. After concentrating the drawing in one place, issues with Tcl_WaitForEvent hanging were revealed. These had been partly masked by the random places in whichvdrawRect was being called. But an attempt had been made to deal with these by asking AppKit to issue periodic events. This only partly worked and for accidental reasons. In fact, if Tcl_WaitForEvent has decided to wait instead of poll for events the only thing (outside of asynchronous file events) which will wake it up is a timer event. So I replaced the NSPeriodicEvents with no-op timer events which actually cause Tcl_WaitForEvent to return. (These only run when there are no user events being generated.) 6. I have added a hook TkpWillDrawWidget which can be used by a display proc to find out whether it is being called from drawRect and also to inform the NSView of which widget is being drawn at the moment. This is only used by the test package and in tkFont to avoid redefining all of the fonts while drawRect is running, which causes crashes. marc_culler (claiming to be Marc Culler) added on 2020-06-02 20:48:49: Sure, please test the idle_curiosity branch. I know of some glitches which I may or may not be able to fix. You will probably find those. And I expect to be making another commit fairly soon. But mostly it seems to be working and to be an improvement. The changes are fairly complicated. I will post an explanation later. nab added on 2020-06-02 13:01:11: Hi Marc, does the idle_curiosity branch is ready for testing? best regards, nicolas marc_culler (claiming to be Marc Culler) added on 2020-05-31 02:57:57: Further testing reveals that on Catalina the [NSView wantsDefaultClipping] getter is never called, and [NSView drawRect:] always clips to the rectangle that is passed to drawRect. Also, while it appears to be the case that a call to [NSApp nextEventMatchingMask: ...] may result in drawRect being called before the function returns, having NSViews with non-empty dirty rectangles does not guarantee that it will be run. Appkit will accumulate changes to the dirty rectangle for a while before calling drawRect. I don't know if it is waiting for a certain amount of time to pass or for a large enough damage region or some combination of those, possibly with other factors. marc_culler (claiming to be Marc Culler) added on 2020-05-28 15:02:08: I tested the idea of having a separate queue from the idle queue to hold widget display tasks and having drawRect call of the display procs on the display queue. This required a global change in the generic code that looks like: Tcl_DoWhenIdle(DisplayX, XPtr) -> Tk_QueueDisplayTask(XPtr->tkwin, DisplayX, Xptr) Tcl_CancelIdleCall(DisplayX, XPtr) -> Tk_CancelDisplayTask(XPtr->tkwin, DisplayX, XPtr) Passing the tkwin allows Tk to figure out which NSView is involved in the display operation. Each NSView has its own queue of pending display tasks. Basically, what happened is that many, many things broke. This even includes segfaults in the Tcl byte code handler. My conclusions from tracking down some of the failures are: 1. There are many, many parts of the Tk code that rely on side effects of running a display proc. (Note, however, that this ticket is about a case where we do not* want the side effect. We do not want the scrollbar to be marked as having been displayed when in fact it was not displayed because its display proc was called outside of drawRect, and then it was also not drawn when the listview was drawn because it lies outside of the listview bounding rectangle, which was the dirty rect when drawRect was drawing the listview.) 2. There are also many parts of the Tk code that make assumptions about when an idle task will be run compared to when an event driven task will be run. 3. It is common for display tasks to be called before a Tk Window has been constructed, i.e. before its underlying NSView exists. And the side effects of running those display procs are, evidently, crucial. So, while it seems wasteful to call display procs when they are not able to do any drawing, it seems that they are doing other things which Tk depends upon. Also, it is not clear that the aborted drawing is hugely expensive. There are a lot of calls to drawing routines from outside of drawRect: but they do return immediately if [NSApp isDrawing] is false. One thing to note here is that the Text widget display proc actually contains some code that, on macOS, checks [NSApp isDrawing]. This is done by calling TkpAppIsDrawing, which always returns YES on the other platforms but checks [NSApp isDrawing] on macOS. In particular, this information is internally available to all generic widgets. It was used to speed up the Text widget, but perhaps it could be used in other ways, such as making it possible for a scrollbar to force a redisplay whenever its display proc is called from outside of drawRect, given that the lower level approach of calling setNeedsDisplayInRect is not working due to the timing of when idle tasks are run. Doing extra redraws of scrollbars could easily be worth the small amount of time it would cost. In the course of doing this I learned (or remembered) some things about how drawRect: works. Here is a summary: 1. While Apple's documentation discusses making drawing more efficient by calling [NSView getRectsBeingDrawn: count:], that feature appears to not exist in Catalina. In several days of experimenting I never once saw the count be different from 1. I believe that you get 1 dirty rectangle which is the smallest rectangle containing all of the dirty rectangles that have been reported by calling [NSView setNeedsDisplayInRect:]. Or, sometimes, you get the full bounding rectangle (see 2 below). I tried making sequences of calls to [NSView setNeeds displayInRect] using different rectangles but I was never able to create a count > 1 by doing that. It is possible that I had a flaw in my test method, but at any rate it is very unlikely to get more than one rectangle. 2. Even if the app never calls [NSView setNeedsDisplay:YES] or [NSWindow display] there will still be some calls to drawRect which pass the entire bounds rectangle as the dirty rect. (I tested this by replacing all such calls with [NSView setNeedsDisplayInRect:NSMakeRect(50, 50, 50, 50).) 3. When drawRect runs, it first clears the dirty rect. If the NSView reports that it is opaque then the dirty rect is filled with opaque black. Otherwise it is filled with a transparent color. Since a Tk windows has one view on top of a blank window frame filled with the default window background, what you see with a transparent dirty rect is a rectangle filled with the default window background. Apple says that the drawRect: method of an opaque NSView must draw every pixel of the dirty rect. This is the reason. 4. Apple suggests that if the NSView reports that it does not want defaultClipping (as TKContentView does) then drawing in drawRect is not clipped to the dirty rect. This appears to be the case. These observations imply two things. First there is not really any alternative to having drawRect search through the entire Tk hierarchy to find all widgets that have non-empty intersection with the dirty rect. They all need to be redrawn. But, second, more is needed. The stacking order needs to be considered. If you have an opaque frame which covers part of the dirty rect and part of the outside of the dirty rect and lies under a button which is completely outside, then drawing the frame would cover up the button. The way Tk currently handles this is to maintain a complex clipping region for each widget, which is the bounding rectangle of the widget with a cut-out for each widget that overlaps it and is higher in the stacking order. In the example above the frame's clipping region clips out the bounding rectangle of the button. The intent is to make it possible for any widget to be drawn at any time without considering stacking order. An alternative might be to use default clipping and to take care to draw widgets from bottom to top. That would be a major change, reworking code that seems to go back to the first cocoa port. nab added on 2020-05-26 11:39:44: Hi, here's my results with the different proposed improvements. after 1 week of using branch bug-2a61eca3a8 (which implements the temporary loop), I saw several macOS wheel at launching of my app (app is stuck in a loop, mainly because on update idletasks call (I think)). also, I use tkdnd for drag'n drop operation and I saw some kind of lag while dragging items. I've applied Christopher's second patch and Marc's code of idle_curiosity branch on top of mac_style branch and results are good. no macOS wheel on launching, no more canvas items not being redrawn (mainly I think because Christopher's second patch have removed the if ([NSEvent pressedMouseButtons]) { [view setNeedsDisplay:YES];} call). here's my results... and it's a delight to follow your discussions about Tk's optimisations. ++ chrstphrchvz added on 2020-05-25 15:32:30: By "temporary" I mean that the loop only runs briefly as opposed to "permanently" (for roughly the entire lifetime of a Tk process) like the main Tcl event loop. Though I may have lifted the phrase "temporary event loop" from somewhere else; and by this reasoning the loops used while in `drawRect:` would also be considered temporary ones. This summary of `Tcl_DoOneEvent()` looks correct; I have not examined it any more closely myself to disagree. It seems fine that "do one event" should not necessarily mean doing exactly one event; a script could have done `after idle {update idletasks}`, clearly entailing that more than one event be done when handling an event. What is less clear is whether `Tcl_DoOneEvent()` may return 0 even though at some point it had to recurse and actually handle events, such as in one of the temporary loops in the setup/check procs or in `drawRect:`; I'm thinking it can, because it's only considering at that topmost "level" whether any events were left even after possibly recursing—but the meaning of returning 0 is then more "there is now no work left to do" rather than "absolutely no work at all was done". (I wondered if this possibly defeats the purpose of adding check to `drawRect:` to make sure no idle events are queued when it is entered. It shouldn't, since `Tcl_DoOneEvent()` only loops back around to step 2 unless `TCL_DONT_WAIT` is set, as implied by `TCL_IDLE_EVENTS`.) The time spent generating Expose events probably depends mostly on how many windows there are or how complicated their arrangement is. I've tried timing entire `drawRect:` calls versus timing the call to `GenerateUpdates()`. For the example from [2a61eca3a8], which is a more extreme example with about 100 windows (each with relatively simple drawing), `GeneratingUpdates()` still only takes about 5% of the time spent in `drawRect:`; for a toplevel with few windows, it may be less than 1%. More time is spent handling the Expose events: about 2 to 5 times more than generating the Expose events, so up to 10% for the example from [2a61eca3a8] but still often less than 1% for toplevels with only a few windows. But the vast majority of time spent in `drawRect:` is on handling the idle redrawing events: between 85% and 99%. Keeping track of which windows should redraw may still help, as that would avoid redrawing widgets that are contained by `getRectsBeingDrawn:` but aren't actually drawable. An idea I mentioned 2020-03-25 was to see if there could be a way for existing drawing routines to first detect whether drawing is at all possible, and if not, call (or schedule a call to) `setNeedsDisplayInRect:`, possibly just using an entire widget's bounds if it lacks some intelligent way of knowing where to redraw (which the text widget apparently does have), and return early to avoid wasted work. But even this approach might not work if redrawing routines tend to have other side effects besides putting stuff on the screen. Yet reducing total work may matter more for Tk as far as "CPU spiking" than just trying to move work currently (but unnecessarily) being done in `drawRect:` to somewhere else, particularly since Tk is single-threaded and does not do background-thread drawing. marc_culler (claiming to be Marc Culler) added on 2020-05-24 22:22:55: A comment about "CPU spiking". I believe this is caused by the algorithm for deciding which expose events to issue from inside drawRect in order to cause widgets that touch the damage region to schedule idle tasks to redraw themselves. The algorithm is, basically, to walk the entire Tk hierarchy and compute the intersection of each widget with the damage region. There are some attempts to prune the hierarchy tree. And I think that those are based on the invalid assumption that the Tk hierarchy tree is the same as the containment hierarchy tree, which is not the case thanks to the -use option to the geometry managers. The straightforward way to handle this would seem to be that when a widget needs to be redrawn it posts its display proc on a special queue and that drawRect simply runs all of the display procs on that special queue. But I don't see any way of implementing something like that without creating a project that would leak out of the macosx port and end up making fundamental changes to the structure of the generic Tk package. marc_culler (claiming to be Marc Culler) added on 2020-05-24 22:08:34: I don't know for certain what could cause recursive calls to drawRect. One suspect would be liveResize. But what I do know is that the guard against them was added when they were observed to happen, and cause crashes that were prevented by adding that guard. There are other things which are known to cause crashes still, and are vaguely related to this. Namely, creating a new NSWindow from inside drawRect. (For example, attempting to open an error dialog to report that there was an error in some Tcl code being exec'ed in an idle task as a part of a display proc.) marc_culler (claiming to be Marc Culler) added on 2020-05-24 21:55:03: I thought it would be a good idea to try to understand how adding the loops that you (for reasons I don't understand) call "temporary" affects the Tcl event loop. Here is an attempt to summarize what happens in TclDoOneEvent. The main Tcl event loop simply calls TclDoOneEvent in an infinite loop until there are no windows left. But event loops can be nested within event loops. The outermost loop is run with no flags, meaning it should wait and it should process all events. It looks to me like this is what happens in one call to TclDoOneEvent from that outer loop on macOS: 1. If there an asynchronous event, handle it and return 1. 2. If there is a queued event, handle it and return 1. 3. Run the setup procs for all event sources. (Here drawRect can run, "borrowing" the Tcl event queue for expose events, which produce idle tasks, and then running those idle tasks.) 4. Call Tcl_WaitForEvent. The setup proc will have set the timeout to 0, so this will "poll" for "system events" and return. While "polling", however, drawRect may run. If any events were found, this fact is recorded. 5. Run the check procs for all sources. Here drawRect could run again. Also window events may get queued. 6. If any events were queued as a result of a call to checkProc, handle one event and return 1. 7. Process all idle tasks (with a caveat -- see below). 8. If anything was found by the poll in step 4, or if any idle tasks were run, return 1. (I think this may include running drawRect, but it is hard to say.) Otherwise go back to step 1, since no events were processed. (Actually, the function could return at step 4 without processing any Tcl events; this is supposedly done to allow Tcl to check if a variable that is being waited on may have been changed as a result of some external "system event" that was found by the poll -- maybe even a call to drawRect?) The caveat about running all idle tasks is this. If an idle task were to spawn a copy of itself, the idle task handler would go into an infinite loop. So each idle task has a "generation" and only tasks of the current or older generations are run. "Child tasks" are ignored. The caveat is relevant, I think, to the listbox issue because I think that the scrollbar display proc appears as a child of the listbox display proc, so it would not get run in the same call to TclServiceIdle. The generation counter is incremented each time that TclServiceIdle is called. TclServiceIdle also explicitly deals with the possibility that an idle task runs its own inner event loop. Finally, getting back to the "temporary loops", what they are doing is inserting another step 7 inside of step 3. I think this means that we are actually processing 2 generations of idle tasks in one call to TclDoOneEvent. Probably no other platform does that and probably the authors of TclDoOneEvent did not expect that. One could wonder whether 2 generations is enough. Why not 10 generations? But I don't see any clear reason why it should be dangerous. It surely could interact with regression tests which have expectations about whether an idle task will be processed after a generated event. But I am not seeing any dire signs there yet. chrstphrchvz added on 2020-05-24 20:34:17: Besides `cacheDisplayInRect:` (which I think might not be what Tk should be using anyway—see [685ac30727]), what else leads to recursive `drawRect:` calls? Admittedly I haven't been around the Tk Cocoa code for very long, nor have used it on older macOS versions for which recursive `drawRect:` calls are reportedly unavoidable. But I'm inclined to think there might not be anything forcing `drawRect:` to be called recursively, especially if Apple happens to believe doing so is "wrong" to the point of making recent macOS versions more strict about not allowing it. chrstphrchvz added on 2020-05-24 16:14:10: I have attached a patch with some suggested changes for the idle_curiosity branch. With just the temporary loops, I can still observe some idle events queued when entering `drawRect:`, including redrawing events which then try to do `setNeedsDisplayInRect:`. But so far I've only observed this happen when Tk is started (if the temporary loop in `TkMacOSXEventsSetupProc()` is disabled before the first call to `drawRect:`), or when certain scripts are sourced using File > Source or doing File > Run Widget Demo; I do not have an example where this leads to something noticeably failing to draw. I haven't observe this happen later during normal use. As long as no idle events are queued when generating Expose events (i.e. when entering `drawRect:`), then I think checking `needsToDrawRect:` is redundant, as the widgets being redrawn will only be those overlapping with the drawRect: call's rectangle(s). Would it be a good idea to leave behind a few checks/assertions (for things previously mentioned: `needsToDrawRect:` returning `NO`, idle events queued upon entering `drawRect:`, etc.) which can be enabled by `TK_MAC_DEBUG_DRAWING`? Some of these checks might not have a way of being enabled without influencing program behavior; for example I'm not aware how to check for idle events without handling them (or digging into private Tcl structures). chrstphrchvz added on 2020-05-23 00:51:45: I can prevent the empty toplevel from appearing by adding a global variable which is initially `false` but then set to `true` by the first `drawRect:` call, and only entering the temporary idle event loops in tkMacOSXNotify.c when the variable is `true`. chrstphrchvz added on 2020-05-20 14:12:26: Doing `tclsh8.6 lib/tk8.6/demos/widget` is enough for me to observe the empty toplevel, however it appears regardless of how I launch Tk or with what program. It also doesn't matter which of the three temporary loops are added; as long as at least one of them is present, there is an empty toplevel and a brief wait. nab added on 2020-05-20 10:18:54: and also <blockquote> I still see the second-long pause at the empty toplevel during launch.</blockquote> nab added on 2020-05-20 10:06:30: Hi, I've tested the new branch with while (Tcl_DoOneEvent(TCL_IDLE_EVENTS\|TCL_DONT_WAIT)) {} call. provide script in [2a61eca3a8] works perfectly now in real life (aka with my app) while Christopher's patch works perfectly, this branch have the same symptoms as core-8.6-branch or mac_styles branch, some canvas items are not redrawn (because IMHO the event loop is busy) ++ marc_culler (claiming to be Marc Culler) added on 2020-05-20 01:56:08: Could you please provide a self-contained recipe for creating the 1 second pause? Thanks. chrstphrchvz added on 2020-05-20 01:02:21: Yes, I observe `drawRect:` being called for both `dequeue:YES` and `NO`. I have started to wonder if some of the fixes I had proposed were becoming redundant. However I have yet to more thoroughly check that opting for just temporary loops or whichever subset of fixes doesn't leave something quietly/not-obviously broken (there will always be something else that's broken, it seems). For example, I would at a minimum want to check whether `TkMacOSXSetupDrawingContext()` still ever returns true when `needsToDrawRect:` returns `NO`; whether `setNeedsDisplay:`/`setNeedsDisplayInRect:` are ever called (e.g. from `TkMacOSXSetupDrawingContext()`) while in `drawRect:` (i.e. `[NSApp isDrawing]` returns `YES`); and whether a temporary event loop added at the beginning of `drawRect:` before `[NSApp setIsDrawing]` handles any events. Each of these would indicate that an issue is still present. With the accumulated proposed fixes, I still notice a few idle events occasionally queued when entering `drawRect:`. Even if the temporary loop at the beginning of `drawRect:` is still needed, it should be a significant improvement that most other idle events can be handled outside of `drawRect:` (by one of the other added temporary loops) rather than in it. When I started investigating this issue, initially I was trying to remove dependence on `setNeedsDisplay:YES` as done by existing workarounds (e.g. upon mouse events) in favor of using `setNeedsDisplayInRect:`; I hope that is still something that can be done here. (It could be that the temporary loop fix wouldn't be sufficient without these existing workarounds, though I don't think this is the case.) And unfortunately even with just the temporary loops I still see the second-long pause at the empty toplevel during launch. Finally, a fifth place that could potentially call `drawRect:`: `Tcl_Sleep()` (tclMacOSXNotify.c), which contains another occurrence of `CFRunLoopRunInMode()`; I have so far not observed `drawRect:` called from here. marc_culler (claiming to be Marc Culler) added on 2020-05-19 20:37:12: Christopher, I tried following your suggestion (cubed) by adding loops to process idle events before each call to [NSApp nextEventMatchingMask ...] (There are 3 such calls.) That seems to fix the scrollbar issue with the listview. Non-regression tests look better (fewer sporadic failures that occur with Development but not Deployment). Nicolas' script from [2a61eca3a8] works perfectly. I do not see the window hanging for a second after package require Tk. It seems to be a miracle cure! Do you know of any of these drawRect related problems that it does not fix? This seems like an easy fix and a huge improvement that should just be committed. Also, thanks to your contributions to this ticket we even have an explanation of why it works. PS TCL_DONT_WAIT is implied by TCL_IDLE_EVENTS so it has no effect and causes no harm. marc_culler (claiming to be Marc Culler) added on 2020-05-19 19:07:47: Some of the things I said below are wrong. This what I am seeing: 1. drawRect gets called from within [NSApp nextEventMatchingMask ..] whether the dequeue option is NO or YES. 2. The checkProc calls [NSApp nextEventMatchingMask ..] with dequeue = YES even if the previous call with dequeue = NO returned nil. The call with dequeue = NO is only there to prevent handling events while in LiveResize. And making checkProc return immediately if the first call returns nil causes Wish to hang. 3. Tcl_WaitForEvent calls CFRunLoopRunInMode() with a timeout generated from the Tcl block time. When the timeout expires it collects file events that have been reported by a separate "notifier thread" and then queues them. The block time is set to 0 by setupProc. 4. It is possible that the recursive calls to drawRect only happen on older versions of macOS. Recursive calls to setupProc are common and expected. marc_culler (claiming to be Marc Culler) added on 2020-05-19 16:19:27: Question: do you ever observe drawRect being called from inside the call to [NSApp nextEventMatchingMask:...] which has dequeue=NO? Or does that only happens with dequeue=YES? marc_culler (claiming to be Marc Culler) added on 2020-05-19 16:16:20: The three functions that you identify are functions which pretty explicitly cause the main thread to wait. However, [NSApp nextEventMatchingMask:...] has the option dequeue which can be set to NO to cause it to return immediately, and that is used to check if the queue is empty. We never call [NSApp nextEventMatchingMask:...] when it has an empty queue, which one might guess would be the only thing that causes a wait. However, I think you are saying that [NSApp nextEventMatchingMask:...] does not actually return immediately, even if there is an event ready to be delivered. Instead it does other business, including setting up a graphics context which will actually cause drawing to occur and then calling drawRect, if there are dirty rectangles. That sounds very plausible. Except I am puzzled by one thing. How do you account for recursive calls to drawRect? They definitely do occur, since you can observe drawRect bailing out when it determines that it is being called recursively. chrstphrchvz added on 2020-05-19 15:23:37: My impression (which I haven't figured out how to test) is that drawRect is run more or less like an [interrupt] handler. It runs on the main thread, probably with the same stack, but it can be run at any time… That is an interesting theory, and I probably don't know enough about threading peculiarities on the Mac or the measures taken by Tk to enforce single-threaded operation/make the Mac aware of its thread-unsafety (e.g. if leaving `canDrawConcurrently` to the default value of `NO` is sufficient) to say for certain that it does or does not describe what's happening. But it has prompted me to look more carefully, and I now notice three other places besides `TkMacOSXEventSetupProc()` from which `drawRect:` can be called: the two calls to `[NSApp nextEventMatchingMask:…]` in `TkMacOSXEventCheckProc()`, and `CFRunLoopRunInMode()` (which appears as `CFRunLoopRunSpecific()` in stack traces) in `Tcl_WaitForEvent()` (tclMacOSXNotify.c). So far I do not observe `drawRect:` being called from anywhere besides the four spots I've identified, nor it being called as a sort of interrupt handler (in which case I would expect printing a stack trace printed while in `drawRect:` to reveal that the thread was busy with something else when it was interrupted). For now I'm inclined to think things would be far more broken if `drawRect:` (or whichever slightly higher method responsible) were being called wherever and whenever. marc_culler (claiming to be Marc Culler) added on 2020-05-18 14:23:20: The guard that is in place for mitigating re-entrancy problems is extremely crude. We just block recursive calls to drawRect by making drawRect return immediately if called recursively. But I never did any careful study of how that affects the Apple window manager's idea of what the current list of dirty rectangles is. I wonder if that may be part of what is happening here -- that the window manager thinks it has drawn rectangles that it hasn't actually drawn because of the way that drawRect bails out when it detects recursion. Since you are now maintaining a private list of dirty rectangles maybe that could be handled in a more sophisticated way. marc_culler (claiming to be Marc Culler) added on 2020-05-18 14:08:14: Sorry - I meant "interrupt handler" where I wrote "event handler". Too bad we can't edit these comments. marc_culler (claiming to be Marc Culler) added on 2020-05-18 14:06:17: One thing to factor into this picture is the way that Apple runs drawRect. Of course none of this is documented, so whatever understanding of it we may have comes from reverse engineering. My impression (which I haven't figured out how to test) is that drawRect is run more or less like an event handler. It runs on the main thread, probably with the same stack, but it can be run at any time. Generally it won't run if there are no rects that need drawing. But if there are some dirty rects than it can fire at any time. This means that it can fire while Tcl is executing an idle task to draw a widget. And I think it is even possible for the display proc of a widget to be called while the display proc of that same widget is being executed. This would seem to require that all display procs be fully reentrant, which of course no one was expecting when they wrote the drawing code. Also, it means that you cannot rely on the order of execution of various parts of the Tcl event loop. The Tcl event loop might think it is handling events, not idle tasks, when drawRect fires and starts executing idletasks that it generated from the expose events. How all of this asynchronous re-entry plays out in real life can be quite complicated. chrstphrchvz added on 2020-05-18 13:00:19: Another example of baked-in idle redrawing event assumptions: while investigating [2a61eca3a8], I tried having `Tk_CanvasEventuallyRedraw()` do `DisplayCanvas()` immediately instead of as an idle event, and this led to the canvas borders not always being redrawn (appearing black instead). This is because in `CanvasEventProc()`, for `eventPtr->type == Expose`, the `REDRAW_BORDERS` flag is set after `Tk_CanvasEventuallyRedraw()` is called. So it is assumed that an idle redrawing event for a widget can be queued before the widget's state has been completely updated and is ready for redrawing. chrstphrchvz added on 2020-05-18 12:52:22: …it is not entirely clear to me why it would make any difference… Starting from the Tcl event loop outside of `drawRect:`: when an idle redrawing event is handled, it will cause `setNeedsDisplayInRect:` to be called, which then makes `display:` → `drawRect:` want to be called. The Aqua event(?) this triggers seems to often take precedence over handling of other idle events, including redrawing events that would also call `setNeedsDisplayInRect:` (though from your description, it sounds that it shouldn't be happening). If those idle redrawing events can instead all be handled before the next `drawRect:` call, then that will ensure `drawRect:` will be coalesced. I now notice that adding a temporary idle event loop to `TkMacOSXEventSetupProc()` runs into an issue resembling [d1989fb7cf]: an empty toplevel is shown for about 1 second between doing `package require Tk` and any subsequent statements in a script. This happens even when specifying `TCL_DONT_WAIT` (which I'm not sure I understand how to use correctly). I would think there might already be a global variable somewhere that could be used to skip this temporary loop until Tk is sufficiently "initialized". marc_culler (claiming to be Marc Culler) added on 2020-05-17 16:09:57: Well, I would not want to be the one to try adding hundreds of lines of platform-specific conditional code to the generic files. Also, I think you can safely assume that there are many places in the code where it is assumed that a drawing command will be run as an idle task and that they would reveal themselves one by one as bugs. My initial reaction to your question is yes, I do think it would be safe to process idle tasks in the setup proc. But it does change Tcl's event processing strategy slightly and it is not entirely clear to me why it would make any difference. My (admittedly imperfect) understanding of Tcl's strategy is that the call to setup_proc is made when every existing event that can possibly be handled has been handled, and Tcl is checking whether any of its event sources might have something new for it to do before it goes to sleep. If that is correct it would mean that all pending idle events already get processed right before the call to the setup proc. chrstphrchvz added on 2020-05-17 14:21:47: Marc, I'm not sure I understand what you believe is entailed by my suggestion to call drawing routines directly; the suggestion should resemble what I offered in my comment dated 2020-03-25 00:07:45, i.e. adding #ifndef MAC_OSX_TK /* existing code calling Tcl_DoWhenIdle() / #else / redraw immediately / #endif around a few existing lines at a time (admittedly in generic code where it has already been said to avoid making things platform-specific). I'm not clear what the big task could be other than potentially applying this approach to a few hundred places in Tk. The intention of this change was to workaround remaining issues noticeable at least with the listbox and scrollbar example, where the scrollbar would either not redraw half of the time, or more recently would not always redraw strictly simultaneously with the listbox (causing "flickering"). These were issues that if left unaddressed I thought could potentially be considered deal-breaking regressions for any approaches addressing the overall issue of widgets not always redrawing. Admittedly it is not a well-thought-out change, and it's more clear now that such non-idle event usage of drawing routines can lead to problems. However it might not even be need to be considered if some other way to encourage coalescing of `drawRect:` and prevent flickering is found. Currently, I'd to know if it's safe to do `while (Tcl_DoOneEvent(TCL_IDLE_EVENTS)) {}` in `TkMacOSXEventsSetupProc()`, just before the call to `[NSApp nextEventMatchingMask:…]`. From looking at stack traces while in `drawRect:`, this appears to be the latest point prior to entering any `drawRect:` call that is still under Tk's control. From briefly trying it out, I haven't noticed anything wrong, yet it greatly reduces (but not eliminates) the likelihood of an already-queued idle event when entering `drawRect:`, and encourages `drawRect:` coalescing (preventing flickering). marc_culler (claiming to be Marc Culler) added on 2020-05-15 17:14:15: Calling drawing routines directly would be a huge redesign of Tk. It might be a big improvement but it would probably also be a big job. With the current design (and especially the macOS adaptation) I would be tempted to say that calling drawing routines directly should be strictly avoided. chrstphrchvz added on 2020-05-15 16:34:13: As I've learned from investigating [40ada90762], my suggestion to call widget drawing routines directly instead of as idle events likely needs to be accompanied by usage of `Tcl_CancelIdleCall()`. Apparently these drawing routines are only intended to be called as idle events, such that whatever queues them is responsible for setting the widget's `REDRAW_PENDING` and checking it to only ever queue one redrawing event per widget at a time, and that the drawing routine clears the flag when handled. Anything that strays from this and calls drawing routines directly must then also cancel any idle calls for the routine, otherwise if the widget is destroyed, the destroy callback may assume the cleared `REDRAW_PENDING` flag means it can skip canceling any queued idle redrawing and free the widget data, potentially leading to a crash when that idle redrawing is finally attempted. chrstphrchvz added on 2020-05-11 11:49:53: There is an issue in the approach I used to postpone `setNeedsDisplayInRect:` calls, where a segmentation fault occurs if the call outlives the view it was for. I'll see if I can revise it. chrstphrchvz added on 2020-05-10 00:30:05: It turns out [2a61eca3a8] is somewhat related to this issue; it is another form of widgets erroneously being told it is OK to redraw when it is not. It appears to be caused by having incorrectly assumed that if a widget's bounds overlap with a rectangle from `getRectsBeingDrawn:count` (i.e. `needsToDrawRect:` returns `YES` for the widget's bounds) then any drawing the widget does will appear onscreen and the widget's appearance will then be up-to-date. But the rectangle(s) returned by `getRectsBeingDrawn:count` may contain portions that are not drawable; Expose events and idle redrawing events for widgets in the undrawable area will be handled, but the drawing those widgets do can be discarded, and widgets will be unaware they haven't completely redrawn. chrstphrchvz added on 2020-05-06 13:08:18: The issue in Nicolas' script is interesting, but I want to say is a different issue, so I'm opening a new ticket for it: [2a61eca3a8]. I'm not certain the workarounds I've discussed here actually prevent it (if so, unintentionally); but they do appear to help study the issue by eliminating possible causes. In short, the canvases do appear to be getting redrawn (rather than their redrawing getting delayed indefinitely), but with what I have to guess is the wrong background color, despite the canvas background appearing to be properly configured when redrawn. Still, I have noticed a few other delayed drawing quirks that the workarounds I've provided here prevent, such as stale drawing artifacts due to [efbedd5ff5]. Any widget—not just scrollbars—can currently try to draw when it shouldn't (i.e. outside `needsToDrawRect:`) or have its redrawing delayed until something else is also drawn or the mouse is clicked; all that seems special about scrollbars is they are used and redrawn with the exact sequence of events needed to reliably exhibit this issue. nab added on 2020-05-05 16:35:50: Hi, here's a script that describe the problem of not every event to being processed: package require Tk package require Ttk ttk::button .b -text push grid .b -sticky news for {set x 1} {$x<100} {incr x} { canvas .c$x -width 1 -height 1 -bg black grid .c$x -sticky news } grid columnconfigure . all -weight 1 grid rowconfigure . 1 -weight 1 bind .b <ButtonPress-1> {doColorchange red} bind .b <ButtonRelease-1> {doColorchange yellow} proc doColorchange {colour} { for {set x 1} {$x<100} {incr x} { after $x [list .c$x configure -bg $colour] } } #eof if you <ButtonPress-1> on the push button, each canvas goes red but when you <ButtonRelease-1>, not all canvas goes to yellow (at least not all of them). with the patch of Christopher all canvas always goes to the right colours, but with this patch CPU usage of my app +10~20% please note that not all events are processed only when <ButtonRelease-1> <ButtonPress-1> always change canvas colour. ++ Nicolas chrstphrchvz added on 2020-05-03 05:16:08: Continuing to investigate why `drawRect:` isn't happening after doing `setNeedsDisplayInRect:`: the problem is not that `setNeedsDisplayInRect:` is sometimes completely ignored and fails to ever redraw the rectangle given to it; the problem is that Tk is waiting until something else needs to be drawn, at which point `drawRect:` is called for a region which the dirty rectangle from the earlier `setNeedsDisplayInRect:` is coalesced with. I think this may point to an issue in the Aqua event source for detecting when there's outstanding `drawRect:` calls to make before handing control back to the Tcl event loop, but I am not knowledgeable on this area. Here's a more complete explanation for why the listbox example only redraws the scrollbar every other time it is scrolled (using `setNeedsDisplayInRect:` when `needsToDrawRect:` is false): Start outside of `drawRect:`. The listbox is then scrolled the first time, so it needs to redraw. Since this happens outside of `drawRect:`, `setNeedsDisplayInRect:` is done, and only for the rectangle containing the listbox. `DrawWidget()` for the scrollbar is scheduled to do when idle. `drawRect:` is then called for the rectangle containing the listbox. The `DrawWidget()` for the scrollbar is now done, but since the scrollbar is outside of `needsToDrawRect:`, `setNeedsDisplayInRect:` is done for the rectangle containing it, and it is not yet redrawn. The listbox itself is then redrawn, and `drawRect:` finishes. Nothing else happens automatically, so the scrollbar has yet to redraw. The listbox is then scrolled a second time, and since this is again outside of `drawRect:`, `setNeedsDisplayInRect:` is done for the listbox. `drawRect:` is then entered again, but this time for both the rectangle containing the listbox and the scrollbar; both are successfully redrawn, and no more `drawRect:` calls are needed or requested. Scrolling the listbox again starts this process over. Another way to observe this is to scroll the listbox, and when the scrollbar does not redraw, move the mouse over some other widget which redraws when hovered; `drawRect:` will then be called for a region containing both the hovered widget and the scrollbar. Using `setNeedsDisplay:YES` (equivalent to `setNeedsDisplayInRect:bounds`) instead of `setNeedsDisplayInRect:dirtyNS` masks the problem, because it makes sure that `drawRect:` is called for the entire view bounds; then when something else discovers it needs to redraw itself while still in `drawRect:`, it can do so right away. (Although using `setNeedsDisplay:YES` means everything* in the view is redrawn anyway, because everything in it will receive an Expose event.) So to correct what I said 2020-03-22, `setNeedsDisplay:YES` might not have any ability (that `setNeedsDisplayInRect:` lacks) to call `drawRect:` again before dropping back to the Tcl event loop; rather, `setNeedsDisplay:YES` just eliminates the need to do so. I came up with a way of using `Tcl_CreateTimerHandler(0, …)` to postpone any `setNeedsDisplayInRect:` usage until no longer inside `drawRect:`. I found that while this approach ensures `drawRect:` will at least be called soon after using `setNeedsDisplayInRect:`, it can take a few frames to do so. For the listbox example, the scrollbar is usually redrawn 0 to 3 frames after the listbox redraws. (The previously mentioned workaround of having `TtkRedisplayWidget()` do `DrawWidget()` immediately rather than schedule it as an idle handler also applies here, ensuring the scrollbar and listbox are redrawn simultaneously.) While this approach might not be desirable to actually use, it may be helpful for studying the problem, so I am providing a patch for it. chrstphrchvz added on 2020-03-25 22:43:45: I wonder if might be poor form to use `setNeedsDisplayInRect:` at all while inside `drawRect:`. https://developer.apple.com/videos/play/wwdc2013/215/ (4:38, slide 19) says to avoid calling `setNeedsDisplayInRect:` from `drawRect`, but only because it's "bad for performance"; it doesn't say that doing so anyways is somehow erroneous (at least in 2013). They also advise not doing layout inside `drawReect:`. There are already widgets with separate layout and drawing routines, but they currently always execute together (i.e. while in `drawRect:` on Aqua). chrstphrchvz added on 2020-03-25 00:07:45: I've noticed another problem, though: trying to rely on `setNeedsDisplayInRect:` does not guarantee `drawRect:` will be called again before handing control back over to the Tcl event loop, unlike `setNeedsDisplay:YES`. (The effect I observe is that only every other MouseWheel event fails to update the scrollbar when scrolling a canvas or listbox.) I haven't found why this is the case… I still don't know why this is the case, but I wonder if might be poor form to use `setNeedsDisplayInRect:` at all while inside `drawRect:`. A workaround I've found is for `TtkRedisplayWidget()` to call `DrawWidget()` right away rather than schedule it as an idle event, at least on Aqua: #ifndef MAC_OSX_TK if (!(corePtr->flags & REDISPLAY_PENDING)) { Tcl_DoWhenIdle(DrawWidget, corePtr); corePtr->flags \|= REDISPLAY_PENDING; } #else DrawWidget(corePtr); #endif Then, as long as `TtkRedisplayWidget()` is called while outside of `drawRect:`, `setNeedsDisplayInRect:` will be have to be called, and `DrawWidget()` is hopefully called again later only while in a `drawRect:` for the area containing the widget (when handling the generated Expose event). This might be inefficient, though, depending on how much wasted work is done before reaching `setNeedsDisplayInRect:`, and how many times `DrawWidget()` gets called outside of `drawRect:`. But the same sort of inefficiency occurs when routines try to draw from the wrong `drawRect:`. Another inefficiency comes from having to find `DrawWidget()` all over again by using the area being redrawn to find the widget being redrawn, as opposed to directly calling the routine from a queued event. I'm not sure whether it's suddenly more inefficient than using `setNeedsDisplay:YES`. It could be less inefficient if the widget's `REDISPLAY_PENDING` flag were set when a widget is unable to draw right away, and cleared only after a widget is actually drawn. But it doesn't look like the widget drawing routines (e.g. `ThumbElementDraw()`) have access to this flag; the functions involved don't have return values nor report whether they successfully drew the widget. It seems a more efficient approach for drawing widgets on Aqua would be to separate the routines for drawing from routines for requesting widgets be redrawn, but in a way that is `drawRect`-aware (i.e. requesting widgets be redrawn uses `setNeedsDisplayInRect:`). Or, widget drawing routines should check whether they can draw as soon as possible to minimize wasted work, which might be an easier change to implement. Another approach might be for `generateExposeEvents` to handle only the idle events created as a result of the Expose events, rather than all idle events (including those for drawing outside of the area of the current `drawRect:`). However it's not clear to me this can be easily done nor would it actually work. nab added on 2020-03-23 19:53:35: Hi Christopher, maybe I'm wrong but from all the tests I've made, it seems that [view setNeedsDisplayInRect:dirtyNS] is better than [view setNeedsDisplay:YES] as the later trigger all displayed widget to be redrawn. ++ chrstphrchvz added on 2020-03-23 18:21:07: I now notice how this topic of CPU usage spiking has gone on for many months and multiple tickets without much satisfaction… I've not done any real benchmarking, but I am skeptical that there is a noticeable difference from `CGRectApplyAffineTransform()`. This is at worst a small matrix multiplication, but all it does here is equivalent to having only done: dirtyCG = NSRectToCGRect(dirtyNS); dirtyCG.origin.y = dirtyNS.size.height - dirtyCG.origin.y; (Why use `CGRectApplyAffineTransform()` instead? As a matter of style, I guess; maybe it's a better way of saying "go from Tk coordinates (origin at top left corner) to Cocoa coordinates (origin at bottom left corner)" in one step. Tk could also instead override `-isFlipped` to return `YES` for `TKContentView`, and then not have to convert between Tk coordinates and Cocoa coordinates everywhere.) `needsToDrawRect:` might be the more expensive thing I added, depending how many rectangles are being drawn at a time; it's more work than checking whether the mouse is clicked or scrolling. But its use is actually recommended for performance in some cases. Maybe there are other places in the Tk code where it can be used at least to simplify the code and/or ensure its correctness. nab added on 2020-03-23 06:10:42: Hi Christopher, I've tested your patch against my app and CPU is rising again > 100%. at least it was better for me when the CGRect dirtyCG calculation was after the if (![NSApp isDrawing] \|\| view != [NSView focusView]) clause. best regards, Nicolas chrstphrchvz added on 2020-03-22 09:35:46: A little more elaboration: I observed that the `drawRect:` call that tried drawing the scrollbar but failed always had the area containing only the listbox. This `drawRect:` call was not what queued the scrollbar to be redrawn (i.e. by `generateExposeEvents:`); it was already queued before `drawRect:` was called, by the `ScrollbarSetCommand()` being called by the listbox widget. So my guess (I have not successfully confirmed) as to why the scrollbar sometimes redrew correctly is that a different `drawRect:` call with the area of the scrollbar managed to happen before a `drawRect:` that didn't. chrstphrchvz added on 2020-03-22 08:43:44: I've tried figuring out what's actually causing this issue so that there might not have to be any more resorting to `setNeedsDisplay:YES`. The issue appears to be that `TkMacOSXSetupDrawingContext()` is erroneously returning `true` (i.e. drawing can proceed) when the region being drawn is outside of the area for the current `drawRect:` call. I couldn't tell whether `TkMacOSXSetupDrawingContext()` is already passed knowledge of the `drawRect` area, but it turns out this is what the convenience method `needsToDrawRect:` is for. The attached patch reorganizes some existing code and makes use of it. I've noticed another problem, though: trying to rely on `setNeedsDisplayInRect:` does not guarantee `drawRect:` will be called again before handing control back over to the Tcl event loop, unlike `setNeedsDisplay:YES`. (The effect I observe is that only every other MouseWheel event fails to update the scrollbar when scrolling a canvas or listbox.) I haven't found why this is the case; maybe someone else knows. But for now `needsToDrawRect:` might at least a better way of knowing when to resort to `setNeedsDisplay:YES`. kevin_walzer added on 2020-02-25 03:31:25: Marc, your bug fix branch addresses this issue. I was tinkering with something like this in tkMacOSXDraw.c but couldn't filter the events properly. Your approach works well and is very clean. Please commit at your convenience and thank you. marc_culler (claiming to be Marc Culler) added on 2020-02-25 02:37:52: I created a bugfix branch which corrects the symptom. It simply marks the window as needing display whanever a mousewheel event arrives. I know this is not optimal, because I can see the scrollbar being drawn while the scrollbar is outside of the clipping region. But I don't know what the optimal fix would be. marc_culler (claiming to be Marc Culler) added on 2020-02-24 19:39:27: My testing indicates that the scrollbar thumb is being drawn when the mousewheel is adjusted slowly. However, it is being drawn with a clipping region set to the bounding box of the listview, so there is no change to the appearance. I do not know why the clipping region is wrong at this point. kevin_walzer added on 2020-02-24 12:44:46: Removing the code from those commits in tkMacOSXDraw.c does not change the observed behavior in this bug report - those changes must have addressed some other issue. Marc, if you have any thoughts about this, they are welcome. kevin_walzer added on 2020-02-24 12:15:57: I suspect that this bug was introduced with commits 8448b35c and 24783747 -- some adjustments on redrawing the scrollbar in certain cases. Marc, can you take a look? This appears in the canvas demo and the listbox demo - but not the text widget demo, as far as I can tell. fvogel added on 2020-02-24 07:41:40: I'm not seeing this issue on Windows. bll added on 2020-02-24 03:48:29: Does not appear as a problem on X11. Reproducable on Mac OS 10.15.3.

Attachments:

abc.tcl [download] added by marc_culler on 2020-06-03 13:02:31. [details]
cc_canvas.tcl [download] added by marc_culler on 2020-06-03 13:01:45. [details]
nab_canvas.tcl [download] added by marc_culler on 2020-06-03 13:00:15. [details]
idle_curiosity_2.diff [download] added by chrstphrchvz on 2020-05-24 16:02:56. [details]
06d8246baf-postponedSNDIR.diff [download] added by chrstphrchvz on 2020-05-03 05:21:14. [details]
needsToDrawRect-patch.diff [download] added by chrstphrchvz on 2020-03-22 08:46:04. [details]