Tcl Source Code

View Ticket
Login
Ticket UUID: ac7592e73c10d04b9bd0303308ac3db2396ad6e9
Title: idle events are never processed when a "writable" handler on a nonblocking channel dominates the event queue
Type: Bug Version:
Submitter: pooryorick Created on: 2024-04-01 00:20:08
Subsystem: - New Builtin Commands Assigned To: pooryorick
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2024-11-21 13:26:27
Resolution: Invalid Closed By: jan.nijtmans
    Closed on: 2024-11-21 13:26:27
Description:

The following script derived from an example provided in [de232b49f26da1c1], never completes.

    namespace eval refchan {
	namespace ensemble create
	namespace export *

	proc finalize {chan args} {
	    namespace delete c_$chan
	}

	proc initialize {chan args} {
	    namespace eval c_$chan {}
	    namespace upvar c_$chan watching watching
	    set watching {}
	    list finalize initialize seek watch write
	}

	proc watch {chan args} {
	    namespace upvar c_$chan watching watching
	    foreach arg $args {
		switch $arg {
		    write {
			if {$arg ni $watching} {
			    lappend watching $arg
			}
			after idle after 0 chan postevent $chan $arg
		    }
		}
	    }
	}

	proc write {chan args} {
	    incr ::counter
	    after idle after 0 chan postevent $chan write
	    return 1
	}
    }

    set f [chan create w [namespace which refchan]]
    chan configure $f -blocking 0
    chan event $f writable [namespace code {
	puts $f X
    }]

    variable x {}

    apply [list {} {
	set script {
	    set lambda [list ::apply [list script {
		variable count
		variable x
		if {[incr count] < 1000} {
		    try $script
		} else {
		    set x done
		}
	    } [namespace current]] $script]
	    after idle [list after 0 $lambda]
	}
	try $script
    } [namespace current]]


    set token [after 10000 [namespace code {
	set x timeout
    }]]
    vwait [namespace which -variable x]
    return $x

This is because TclIO.c/ChannelTimerProc() keeps scheduling adding a an event to the front of the standard queue.

User Comments: jan.nijtmans added on 2024-11-21 13:26:27:

Related to [de232b49f2] (See comment below 2024-04-01 06:11:07), so can be closed.


pooryorick added on 2024-04-10 16:17:44:

For the purpose of this discussion, it doesn't make any difference whether it's

after 0 [list after idle]

or

after idle [list after 0]

. The question is whether or not to get the idle queue involved in handling events for channels, and sebres' answer was "no".


apnadkarni added on 2024-04-08 02:33:04:

Nathan wrote: I an IO event should never be considered as an idle event, then Ashok's idea of requiring that refchans user [after 0 [list after idle]] is a bad idea too. But with the way that the idle works in Tcl, I'm not sure that sebres' criticism is valid.

I do not recall suggesting "after 0 after idle" and cannot find where I wrote that. I thought I wrote "after idle after 0". If I did state the former, it was in error and should have been the latter.

If processing events off the event queue, and desiring to reschedule again off the event queue without starving the idle queue, my thinking is that "after idle after 0" is the correct idiom. Conversely, "after 0 after idle" if running on the idle queue and rescheduling back on the idle queue without starving the event queue.


sebres added on 2024-04-02 14:02:48:

This issue is even a typical example where idle events will be and can be disadvantaged (if you have many "normal" events, idle event is never happen and it is good so... well, it was good till [0feb3d12b6]).

But take a look at Tcl_DoOneEvent (and flag TCL_IDLE_EVENTS). An event cycle of typical reactor (N threads x M interpreters) can and mostly will invoke Tcl_DoOneEvent without TCL_IDLE_EVENTS in main cycle and would invoke it with TCL_IDLE_EVENTS in some helper depending on load (e. g. periodically).

Anyway it is not idle and my critic is valid for many reasons, no matter how, you think, the idle may work in Tcl (because often it doesn't work in this way).


pooryorick added on 2024-04-02 13:39:36:

I an IO event should never be considered as an idle event, then Ashok's idea of requiring that refchans user [after 0 [list after idle]] is a bad idea too. But with the way that the idle works in Tcl, I'm not sure that sebres' criticism is valid.


sebres added on 2024-04-02 12:49:02:

I don't think [0feb3d12b6] is a good approach: replacement synthetic timer with an idle event is still more worse - a writable event is an IO-event and so shall be never considered as an idle event. It is very bad for all systems, e. g. with own event-cycle, that may retard idle events, so executing them only in idle state of thread.


jan.nijtmans added on 2024-04-02 12:10:23:

> Regarding that error, I had already fixed that in a later commit that I also merged to trunk, and you reverted it along with the earlier commits.

I saw that later commit, it doesn't fix the problem!

> and that I'm looking into those errors now. Please don't bother. I have the same concerns as Ashok. Your commits on trunk hinder dgp's work on Tcl 9.0b2.


pooryorick added on 2024-04-02 11:57:33:

Jan, thanks for the TIP! I had just cloned the Tcl repository, and had noticed that too. Perhaps another prefix like "ci-" could also be reserved for that purpose.

Regarding that error, I had already fixed that in a later commit that I also merged to trunk, and you reverted it along with the earlier commits. However, I hadn't fixed the other error that is showing up only on Windows and Mac, so that's ok, and that I'm looking into those errors now.


jan.nijtmans added on 2024-04-02 11:37:48:

> This branch only fails on Windows and Mac systems, and I don't have such systems to test changes on. How can I get the Github CI system to test this branch so that I can get feedback and fix the issue?

Sometimes, it fails on Linux too: Proof:

https://github.com/tcltk/tcl/actions/runs/8518905746/job/23331906451

You can build any branch on GITHUB, by adding a tag "core-*" to the commit you want to build (in fossil). That's how I managed to build your "bug-ac7592e73c10d04b" branch there, by adding "core-bug-ac7592e73c10d04b" ;-)

(don't forget to remove it the next day...)


apnadkarni added on 2024-04-02 03:22:41:
Nathan,

*How can I get the Github CI system to test this branch so that I can get feedback and fix the issue?*

What I have done in the past is to clone the repository (on github) and hack the github action workflow settings.

I'm not sure there is an easier way. Testing all branches has proved problematic in the past because of the number of jobs generated (or something)

apnadkarni added on 2024-04-02 03:19:39:
I wrote my concerns with this change in [934bb3a69d].

pooryorick added on 2024-04-01 22:58:18:

This branch only fails on Windows and Mac systems, and I don't have such systems to test changes on. How can I get the Github CI system to test this branch so that I can get feedback and fix the issue?


jan.nijtmans added on 2024-04-01 19:50:26:

Running testsuite (MacOS, --enable-symbols=mem), I see:

io.test
Test file error: can not find reflected channel named "rc6"
    while executing
"chan postevent rc6 write"
    ("after" script)
can not find reflected channel named "rc6"
    while executing
"chan postevent rc6 write"
    ("after" script)
ioCmd.test

This means, that this 'fix' blocks dgp's work in trying to release 9.0b2. I see no other way than moving it to a branch.

Also, I already merged the implementation of TIP #688 to trunk, only after that I saw this problem. So, it means extra work for me too. Thanks! (not)


pooryorick added on 2024-04-01 07:49:23:

This fix doesn't stop us from continuing in any direction we decide on later. ChannelTimerProc() and the timer itself can be removed if there's a good alternative. Fixing this issue in the meantime just gets it out of the way so it doesn't muddy the arguments.


apnadkarni added on 2024-04-01 06:11:07:
I'll continue in ticket [de232b49f2] later this evening when I have time.

apnadkarni added on 2024-04-01 06:09:18:

I'm unhappy about your making changes to the trunk on a point of dispute before there is some level of consensus on the fix. It makes reverting potentially painful for everyone else. Please move it to a bugfix branch.


pooryorick added on 2024-04-01 00:26:13:

Fixed in [b2d083e9e04e492d] by having tclIO.c/ChannelTimerProc() us the idle queue instead of the standard queue to reschedule itself.