Tcl Source Code

View Ticket
Login
2024-11-14
21:00 Ticket [3968086595] Basic nonblocking write-only refchan hangs status still Closed with 6 other changes artifact: 2fd6bb79d8 user: pooryorick
2024-11-12
20:21 Closed ticket [3968086595]. artifact: b48fddb291 user: jan.nijtmans
2024-08-15
11:08 Ticket [3968086595]: 3 changes artifact: 8b3f2f63cd user: pooryorick
2024-08-14
11:39 Ticket [3968086595]: 3 changes artifact: 2b825d680b user: apnadkarni
11:32 Ticket [3968086595]: 3 changes artifact: 91081cf2ba user: apnadkarni
11:18 Ticket [3968086595]: 3 changes artifact: ca7b689d1e user: pooryorick
11:14 Ticket [3968086595]: 3 changes artifact: b04022b2fc user: pooryorick
10:50 Ticket [3968086595]: 3 changes artifact: 7541238afe user: apnadkarni
07:52 Ticket [3968086595]: 3 changes artifact: 78ee3360cc user: jan.nijtmans
2024-08-13
17:47 Ticket [3968086595]: 3 changes artifact: 880be1cd5b user: pooryorick
16:29 Ticket [3968086595]: 3 changes artifact: a08237b583 user: jan.nijtmans
15:51 Ticket [3968086595]: 3 changes artifact: 494db707f5 user: pooryorick
15:50 Ticket [3968086595]: 3 changes artifact: b3e44909b7 user: pooryorick
15:39 Open ticket [3968086595]. artifact: 29a0113491 user: apnadkarni
15:35 Ticket [3968086595]: 3 changes artifact: 676054efed user: apnadkarni
15:25 Ticket [3968086595]: 3 changes artifact: cd8d19b5e0 user: jan.nijtmans
14:55 Ticket [3968086595]: 3 changes artifact: 6d3ed0bf04 user: jan.nijtmans
14:18 Ticket [3968086595]: 3 changes artifact: 554efca484 user: pooryorick
14:03 Ticket [3968086595]: 3 changes artifact: adef465221 user: jan.nijtmans
13:47 Pending ticket [3968086595]. artifact: 08f15f7f32 user: pooryorick
13:45
Fix for [39680865953cce4f], Basic nonblocking write-only refchan hangs. Closed-Leaf check-in: ab87ef464e user: pooryorick tags: bug-3968086595
11:27 New ticket [3968086595] Basic nonblocking write-only refchan hangs. artifact: 813777d4df user: pooryorick

Ticket UUID: 39680865953cce4fa7187d3613ff7707302d31a6
Title: Basic nonblocking write-only refchan hangs
Type: Bug Version:
Submitter: pooryorick Created on: 2024-08-13 11:27:35
Subsystem: - New Builtin Commands Assigned To: nobody
Priority: 5 Medium Severity: Minor
Status: Closed Last Modified: 2024-11-14 21:00:48
Resolution: Accepted Closed By: pooryorick
    Closed on: 2024-11-14 21:00:48
Description:

The following script should progressively report that it is writing to the channel, and eventually complete. Instead it hangs after the first write to the channel:

#! /usr/bin/env tclsh

namespace eval rc {
	namespace export *
	namespace ensemble create

	proc initialize {chan mode} {
		namespace eval chan_$chan {
			variable buffer {}
			variable watch {}
			variable writetask {}
		}
		return {initialize finalize watch write}
	}


	proc finalize chan {
		namespace delete chan_$chan
	}


	proc watch {chan spec} {
		namespace upvar chan_$chan watch watch writetask writetask
		set watch $spec
		after cancel $writetask
		if {{write} in $spec} {
			set writetask [after 0 [list after idle [list ::chan postevent $chan write]]]
		}
		return
	}


	proc write {chan data} {
		namespace upvar chan_$chan buffer buffer watch watch \
			writetask writetask
		append buffer $data
		after cancel writetask
		if {{write} in $watch} {
			set writetask [after 0 [
				list after idle [list ::chan postevent $chan write]]]
		}
		return [string length $data]
	}
}

set chan [chan create write [namespace which rc]]
chan configure $chan -blocking 0
coroutine c1 apply [list chan {
	after 0 [list [info coroutine]] 
	yield
	chan event $chan writable [list [info coroutine]]
	while {[incr i] < 1000000}  {
		yield
		puts [list writing to $chan $i]
		puts $chan $i
	}
	set [namespace current]::done 1
} [namespace current]] $chan
vwait [namespace current]::done

User Comments: pooryorick added on 2024-11-14 21:00:48:

This ticket is separate from that techinical note, and illustrates a real reproducible problem. Therefore it should remain open until fixed.


jan.nijtmans added on 2024-11-12 20:21:05:

Since the TCT, in the online meeting from 11-11-2024, decided to close all tickets related to this Technical Noted, and also decided to no longer allow relicensed code in Tcl official repositories (See: [560b27abdac4abe3]), this ticket can be closed.

Thanks, Ashok, for providing a working versions of Nathan's non-working version, and pointing out the reason.


pooryorick added on 2024-08-15 11:08:58:

This fix isn't about anything which would require a TIP. It's only about making sure an event posted by a refchan doesn't get swallowed.


apnadkarni added on 2024-08-14 11:39:38:

Nathan wrote: Implementing a continuously-self-scheduling timer at the script level is a bad hack to overcome a bad implementation. This should be fixed.

By all means, TIP and implement improvements.


apnadkarni added on 2024-08-14 11:32:47:
Nope. See the delay at the top which allows control of how often the event is scheduled. It's 0 because your code had it as 0. In general, if a channel is capable of continuously sinking data, it is up to the channel to decide how often to generate events. In your original C implementation, it was continuously generating events (a) IRRESPECTIVE of channel state without giving the refchan control of (b) whether it needed to or not and (c) without giving a chance for idle handlers to run (the last of which you *may* have fixed).

Now, perhaps your latest code is doing something in addition. I dunno. As I said, I don't have time to review for 9.0, and think it is risky at this stage to include without a review. The crashes in CI are proof as to why, whether you fix them or not. If someone else can review and finds it worthy, fine. Else 9.0.1.

And your mention of the time you have spent correcting missteps is exactly why significant changes need to be reviewed before commits!

pooryorick added on 2024-08-14 11:18:25:

To summarize: Implementing a continuously-self-scheduling timer at the script level is a bad hack to overcome a bad implementation. This should be fixed.


pooryorick added on 2024-08-14 11:14:40:

Ashok, the "working" version of the implementation that you just proposed is exactly the strategy you are advocating against: Continuous event posting on a timer that just keeps rescheduling itself. In contrast, the fix that I produced at the C level does not do this. Instead, it just takes care to propagate the signal that is sent from the refchan by not letting it get swallowed just because an internal buffer got in the way. The issue that is illustrated in this ticket is that the C implementation currently swallows that signal. It's entirely possible to fix this at the C level so that postevent signals sent by a refchan do not get swallowed.

I could also complain about "orders of magnitude" of time that I've spent over the years correcting missteps that have been made in Tcl (ahem, the alphabet for strings for Tcl 9, or the numerous bugs that I make sure to diagnose and fix before even reporting them so that others don't have to wast their time on them), but I'm not going to engage in such bellyaching.


apnadkarni added on 2024-08-14 10:50:26:

Nathan, a working version of your channel implementation is below but first...

Nathan wrote: This fix is not the same as those that were previously argued, and the example that demonstrates the issue is also not the same. The previous arguments were about 1) whether an asynchronous channel should immediately signal writable when a channel is opened, and 2) whether postevent is asynchronous.

Not so. While 2) was a matter of requiring a TIP for incompatible changes to a public API where it could be discussed, 1) was only a symptom of a larger issue - continuous generation of "write" events without knowledge of actual channel state, something which completely defeats the purpose of event-based I/O turning it into a disguised polling model. The commit under discussion raises the same fears pending a thorough review, leave aside the question of whether those changes are even needed.

Nathan wrote: The core team should take care not to become a bottleneck to needed contributions.

While there will be some delay, I am afraid the time wasted this year chasing down and reverting or fixing commits that made it in without review is an order of magnitude higher.

Getting to your sample script, below is a working version of a functionally similar refchan. The implementation is based on Aku's original paper. Note the refchan implementation may differ but I believe the functionality and user test code is the same as your refchan. Feel free to check that but my guess is that your refchan implementation has a bug or is incomplete as is the case in the two tickets you insist on reopening.

namespace eval rc {
    namespace ensemble create
    namespace export *
    # Change to taste depending on how much CPU you want to hog
    variable delay 0

    proc finalize {chan args} {
        namespace upvar c_$chan timer timer buffer buffer
        catch {after cancel $timer}
        puts "Finalize: Have [string length $buffer] bytes."
        namespace delete c_$chan
    }

    proc initialize {chan args} {
        namespace eval c_$chan {}
        namespace upvar c_$chan watching watching timer timer buffer buffer
        set watching {}
        set buffer {}
        return {finalize initialize watch write}
    }

    proc watch {chan spec} {
        namespace upvar c_$chan watching watching
        set watching $spec
        update $chan
    }

    proc write {chan data} {
        namespace upvar c_$chan buffer buffer
        append buffer $data
        return [string length $data]
    }

    proc update {chan} {
        namespace upvar c_$chan watching watching timer timer
        variable delay
        catch {after cancel $timer}
        if {"write" in $watching} {
            set timer [after idle after $delay \
                           [namespace code [list post $chan]]]
        }
    }

    proc post {chan} {
        variable delay
        namespace upvar c_$chan watching watching timer timer
        if {"write" in $watching} {
            set timer [after idle after $delay \
                           [namespace code [list post $chan]]]
            chan postevent $chan write
        }
    }
}

set chan [chan create write [namespace which rc]]
chan configure $chan -blocking 0
coroutine c1 apply [list chan {
    after 0 [list [info coroutine]]
    yield
    chan event $chan writable [list [info coroutine]]
    while {[incr i] < 1000000}  {
        yield
        puts [list writing to $chan $i]
        puts $chan $i
    }
    set [namespace current]::done 1
} [namespace current]] $chan
vwait [namespace current]::done
close $chan

For explanation of above, please see Aku's paper.


jan.nijtmans added on 2024-08-14 07:52:59:

You are right, it's not a "risk", it's an actual failure:

https://github.com/tcltk/tcl/actions/runs/10383146736/job/28747514350

chanio.test
Test file error: child killed: segmentation violation
...
io.test
Test file error: child killed: segmentation violation

(I took the freedom to start a CI build of your branch)

For all other questions you might want to ask, you can find the answer here (there are some additions by Andreas Kupries at the end of this document)


pooryorick added on 2024-08-13 17:47:50:

Jan, can you provide some details on what you think the risk is?


jan.nijtmans added on 2024-08-13 16:29:49:

In my judgement, the risk putting this into 9.0 is higher than the benefit. I'll bring this up next TTT meeting, I will report the decision here.


pooryorick added on 2024-08-13 15:51:41:

The core team should take care not to become a bottleneck to needed contributions.


pooryorick added on 2024-08-13 15:50:38:

This fix is not the same as those that were previously argued, and the example that demonstrates the issue is also not the same. The previous arguments were about 1) whether an asynchronous channel should immediately signal writable when a channel is opened, and 2) whether postevent is asynchronous. This demo does not intersect with those concerns, and this fix touches on neither of those issues. It only resolves the issue reported in this ticket. If someone doesn't raise a specific objection to this particular fix, it should be merged into trunk.


apnadkarni added on 2024-08-13 15:39:42:
Just so we do not go back and forth on this, I am not going to argue whether this is a bug or the fix is appropriate if so, and so on. I'm simply stating my opinion that this is to be revisited post 9.0 release. Also marking the bug open accordingly so it does not get lost.

apnadkarni added on 2024-08-13 15:35:13:

I only had a quick look as there are other higher priorities. At first glance, this is on the same lines as the proposed "fix" in earlier tickets that was reverted and my original objections to a polling model pretending to be event-driven are still likely to stand pending a closer look. Also, given the earlier "bug" reports that I still deem to be user programming errors, the test case in the report also needs to be reviewed for correctness. Again, something I can't spend time on until 9.0 release.

Given where 9.0 is in the release cycle, even if this was a genuine bug, it is not important enough to look into for 9.0 relative to other issues so my opinion is to punt even investigating it till 9.0.1.

@pooryorick, regarding "since when every change needs TCT review", I don't think that is being done. It so happens your change is essentially the same, or very close to, your earlier commits that were reverted for reasons discussed ad nauseum. When it was collectively agreed amongst 7-8 folks (with you being the sole exception) to revert those changes, it was only right of Jan to put them on a branch pending a review, whenever that might be.


jan.nijtmans added on 2024-08-13 15:25:55:

What "Version"? Do 8.6 (and 8.7) have the same problem too?


jan.nijtmans added on 2024-08-13 14:55:22:

@ashok, @sergey, your comments please?


pooryorick added on 2024-08-13 14:18:13:

When was it decided that TCT members review each change? That was never the case before.


jan.nijtmans added on 2024-08-13 14:03:32:

Moved to bug-3968086595 branch, so it can be properly reviewed (by a TCT member), before being merged to "trunk" and "core-8-branch".


pooryorick added on 2024-08-13 13:47:40:

Fixed in [ab87ef464e12c190].