Tcl Source Code: Ticket Change Details

Overview
Artifact ID:	b11ef71db56e7fc8312483930d6986fb4b827f7783fd366c55ef42acd62f72d7
Ticket:	de232b49f26da1c18e07513d4c7caa203cd27910 write-only nonblocking refchan and Tcl internal buffers
User & Date:	apnadkarni 2024-04-02 07:35:31
Changes
icomment:
Nathan,

I'll make one last attempt at persuasion that

- from an architectural point of view, generating I/O callback events without
knowing channel state is not at all an appropriate model for async event driven
I/O.

- The related changes in trunk vis-a-vis 8.6 are broken in multiple aspects,
mostly because of the above.

The motivation for async i/o is that applications can do useful work while
waiting for I/O to complete. Now this by itself does not need select on Unix,
completion ports on Windows or fileevent in Tcl. One can just call the i/o
functions and check for "EAGAIN" equivalents. There are multiple problems with
this as the application has to poll:

- Try too frequently and it's wasting processing cycles handling EAGAINs.

- Try too infrequently and there is an unnecessary delay / latency in servicing
I/O requests.

An event system solves both the above **as long as events are generated based on
the I/O state.** No time is wasted in unnecessary polls and there is no
additional delay once the channel is ready for I/O. However, if I/O events are
generated based on timers **with no knowledge of I/O state** it has exactly the
same effect as application polling! It is completely pointless: events are
generated when the channel is not actually ready, and once it is ready there is
an unnecessary delay until the timer expires. If you truly believe in this as a
solution, you should be amenable to completely getting rid of Tcl's I/O related
event subsystem! The application could just generate write events using `chan
postevent` on a regular basis with `after`. This is basically what the current
Tcl 9 implementation does, queueing events on a timer basis.
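To make the contrast concrete, here is a minimal sketch of the two styles for a nonblocking read. The proc names `poll_read` and `on_readable` and the `done` callback prefix are made up for illustration; they are not taken from the Tcl core or from anything attached to this ticket. The empty result of a nonblocking `read` stands in for the "EAGAIN" equivalent.

```tcl
# Polling: the application picks an interval and retries, ready or not.
# Too small an interval burns cycles on empty reads, too large adds latency.
proc poll_read {chan interval done} {
    set data [read $chan]      ;# "" when nothing is available yet
    if {$data ne "" || [eof $chan]} {
        {*}$done $data
        return
    }
    after $interval [list poll_read $chan $interval $done]
}

# Event driven: the notifier calls this only when the channel is
# readable, so there are no wasted attempts and no added delay.
proc on_readable {chan done} {
    set data [read $chan]
    if {$data ne "" || [eof $chan]} {
        chan event $chan readable {}   ;# one-shot: stop watching
        {*}$done $data
    }
}
```

With a nonblocking channel `$rd`, the first style is started as `poll_read $rd 5 {set ::result}` and the second as `chan event $rd readable [list on_readable $rd {set ::result}]`, each followed by `vwait ::result`. A timer that posts events without looking at channel state behaves like the first style, whatever interval is chosen.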

The above is the motivation for async/event-driven operation from an
efficiency and performance perspective. However, there is also the "simple"
matter of correctness. An event should not be generated prematurely
reflecting a state that does not actually exist.

In other words, from both the performance and correctness point of view, the
current timer based write event generation in ChannelTimerProc **which is not
based on channel state** is fundamentally flawed. Because it is essentially
polling, it does not fulfil the intended purpose of an event based i/o system,
and moreover it does not meet correctness criteria as it has no idea of channel
state and generates events prematurely (14.11 failure).

All the above is a comment on event driven i/o and channels in general. Now as
far as refchans are concerned, there is a limitation in the refchan framework as
mentioned in an earlier post which led to the original defect you logged in this
ticket. To reiterate, there is no script level equivalent of the
`Tcl_EventSetupProc` and `Tcl_EventCheckProc`. Thus *some*, not all, refchans
are forced to use a timer based scheme to generate these events. However, **this
must be done by the refchan script implementation itself and not the generic
channel infrastructure** because the former knows the channel state, the latter
does not. This is still not ideal from the efficiency perspective, but being
able to check state, it is at least correct.
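A minimal sketch of that arrangement, under stated assumptions: the `::sink` namespace, its 4-byte `capacity`, the 10 ms interval and the `drain` helper are all hypothetical, and this is neither the attached sample nor tcllib code. The point is only that the timer callback consults `writable` before calling `chan postevent`. For brevity it posts directly from the timer callback rather than deferring the post.

```tcl
# Hypothetical write-only refchan; every name in ::sink is illustrative.
namespace eval ::sink {
    variable capacity 4    ;# bytes the imaginary device holds before it is full
    variable stored   ""   ;# bytes accepted and not yet drained
    variable watching 0    ;# 1 while the core wants write events
    variable timer    ""   ;# id of the pending poll timer, if any
}

# The state check that the generic channel layer cannot make.
proc ::sink::writable {} {
    variable capacity
    variable stored
    expr {[string length $stored] < $capacity}
}

# Stand-in for the device consuming what it was given.
proc ::sink::drain {} {
    variable stored
    set stored ""
}

# Timer callback: post a write event only if data can really be accepted.
proc ::sink::poll {chan} {
    variable watching
    variable timer
    set timer ""
    if {$watching && [writable]} {
        chan postevent $chan write
    }
    # A handler run by postevent may have changed the watch mask or
    # already rescheduled the timer, so re-check before rearming.
    if {$watching && $timer eq ""} {
        set timer [after 10 [list ::sink::poll $chan]]
    }
}

proc ::sink::handler {cmd chan args} {
    variable capacity
    variable stored
    variable watching
    variable timer
    switch -- $cmd {
        initialize { return {initialize finalize watch write} }
        finalize {
            set watching 0
            after cancel $timer
            set timer ""
            return
        }
        watch {
            lassign $args events
            set watching [expr {"write" in $events}]
            after cancel $timer
            set timer ""
            if {$watching} {
                set timer [after 10 [list ::sink::poll $chan]]
            }
            return
        }
        write {
            lassign $args data
            set room [expr {$capacity - [string length $stored]}]
            if {$room <= 0} { error EAGAIN }   ;# full: tell the core to retry later
            set chunk [string range $data 0 [expr {$room - 1}]]
            append stored $chunk
            return [string length $chunk]
        }
    }
}
```

Such a channel would be created with `chan create write ::sink::handler` and then configured nonblocking. The timer is still polling, so this is no more efficient than before, but no write event is posted while the sink is full.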

I believe that is how Andreas' virtchannel modules in tcllib work. And as proof
of concept, following the tcllib model, I've modified your refchan
implementation of 44.6 and attached it as refchan-async-redux.tcl (proof of
concept only, modeled on tcllib). This works in Tcl 8.6 as well (which your
version did not). TL;DR the changes you made to the core in ChannelTimerProc
(a) led to at least two bugs, the 14.11 failure and event queue starvation,
(b) affected channels other than refchans, and (c) were unnecessary.

Given that (imho) enhancing the refchan framework is too much of a risk for a
9.0 release, I see two possibilities that are acceptable (not perfect, just
acceptable) for the 9.0 release:

- Revert the implementation to what 8.6 does. No need for -buffering none in
this case but the script level refchan implementation has to generate timer
events, **check state** and then do an `after idle after 0 chan postevent` from
the timer callback. See tcllib or attached sample. Alternatively, the refchan
can do the -buffering none itself and avoid the timer if that suits its purpose.

- If you do not want the channel script implementation to have that
responsibility (I would like to know why not), then set -buffering none for
refchans as proposed in my branch.

I prefer the first option.

I am pretty much going to stay silent on this topic now. I cannot provide any
more clarity on my objections. Finally, some group of people has to decide
on a course of action. Hopefully, that group is not just you and me.
login: "apnadkarni"
mimetype: "text/x-markdown"