2024-11-14
| ||
21:00 | • Ticket [3968086595] Basic nonblocking write-only refchan hangs status still Closed with 6 other changes artifact: 2fd6bb79d8 user: pooryorick | |
2024-11-12
| ||
20:21 | • Closed ticket [3968086595]. artifact: b48fddb291 user: jan.nijtmans | |
2024-08-15
| ||
11:08 | • Ticket [3968086595]: 3 changes artifact: 8b3f2f63cd user: pooryorick | |
2024-08-14
| ||
11:39 | • Ticket [3968086595]: 3 changes artifact: 2b825d680b user: apnadkarni | |
11:32 | • Ticket [3968086595]: 3 changes artifact: 91081cf2ba user: apnadkarni | |
11:18 | • Ticket [3968086595]: 3 changes artifact: ca7b689d1e user: pooryorick | |
11:14 | • Ticket [3968086595]: 3 changes artifact: b04022b2fc user: pooryorick | |
10:50 | • Ticket [3968086595]: 3 changes artifact: 7541238afe user: apnadkarni | |
07:52 | • Ticket [3968086595]: 3 changes artifact: 78ee3360cc user: jan.nijtmans | |
2024-08-13
| ||
17:47 | • Ticket [3968086595]: 3 changes artifact: 880be1cd5b user: pooryorick | |
16:29 | • Ticket [3968086595]: 3 changes artifact: a08237b583 user: jan.nijtmans | |
15:51 | • Ticket [3968086595]: 3 changes artifact: 494db707f5 user: pooryorick | |
15:50 | • Ticket [3968086595]: 3 changes artifact: b3e44909b7 user: pooryorick | |
15:39 | • Open ticket [3968086595]. artifact: 29a0113491 user: apnadkarni | |
15:35 | • Ticket [3968086595]: 3 changes artifact: 676054efed user: apnadkarni | |
15:25 | • Ticket [3968086595]: 3 changes artifact: cd8d19b5e0 user: jan.nijtmans | |
14:55 | • Ticket [3968086595]: 3 changes artifact: 6d3ed0bf04 user: jan.nijtmans | |
14:18 | • Ticket [3968086595]: 3 changes artifact: 554efca484 user: pooryorick | |
14:03 | • Ticket [3968086595]: 3 changes artifact: adef465221 user: jan.nijtmans | |
13:47 | • Pending ticket [3968086595]. artifact: 08f15f7f32 user: pooryorick | |
13:45 | Fix for [39680865953cce4f], Basic nonblocking write-only refchan hangs. Closed-Leaf check-in: ab87ef464e user: pooryorick tags: bug-3968086595 | |
11:27 | • New ticket [3968086595] Basic nonblocking write-only refchan hangs. artifact: 813777d4df user: pooryorick | |
Ticket UUID: | 39680865953cce4fa7187d3613ff7707302d31a6 | |||
Title: | Basic nonblocking write-only refchan hangs | |||
Type: | Bug | Version: | ||
Submitter: | pooryorick | Created on: | 2024-08-13 11:27:35 | |
Subsystem: | - New Builtin Commands | Assigned To: | nobody | |
Priority: | 5 Medium | Severity: | Minor | |
Status: | Closed | Last Modified: | 2024-11-14 21:00:48 | |
Resolution: | Accepted | Closed By: | pooryorick | |
Closed on: | 2024-11-14 21:00:48 | |||
Description: |
The following script should progressively report that it is writing to the channel, and eventually complete. Instead it hangs after the first write to the channel:
| |||
User Comments: |
pooryorick added on 2024-11-14 21:00:48:
This ticket is separate from that techinical note, and illustrates a real reproducible problem. Therefore it should remain open until fixed. jan.nijtmans added on 2024-11-12 20:21:05: Since the TCT, in the online meeting from 11-11-2024, decided to close all tickets related to this Technical Noted, and also decided to no longer allow relicensed code in Tcl official repositories (See: [560b27abdac4abe3]), this ticket can be closed. Thanks, Ashok, for providing a working versions of Nathan's non-working version, and pointing out the reason. pooryorick added on 2024-08-15 11:08:58: This fix isn't about anything which would require a TIP. It's only about making sure an event posted by a refchan doesn't get swallowed. apnadkarni added on 2024-08-14 11:39:38: Nathan wrote: Implementing a continuously-self-scheduling timer at the script level is a bad hack to overcome a bad implementation. This should be fixed. By all means, TIP and implement improvements. apnadkarni added on 2024-08-14 11:32:47: Nope. See the delay at the top which allows control of how often the event is scheduled. It's 0 because your code had it as 0. In general, if a channel is capable of continuously sinking data, it is up to the channel to decide how often to generate events. In your original C implementation, it was continuously generating events (a) IRRESPECTIVE of channel state without giving the refchan control of (b) whether it needed to or not and (c) without giving a chance for idle handlers to run (the last of which you *may* have fixed). Now, perhaps your latest code is doing something in addition. I dunno. As I said, I don't have time to review for 9.0, and think it is risky at this stage to include without a review. The crashes in CI are proof as to why, whether you fix them or not. If someone else can review and finds it worthy, fine. Else 9.0.1. And your mention of the time you have spent correcting missteps is exactly why significant changes need to be reviewed before commits! pooryorick added on 2024-08-14 11:18:25: To summarize: Implementing a continuously-self-scheduling timer at the script level is a bad hack to overcome a bad implementation. This should be fixed. pooryorick added on 2024-08-14 11:14:40: Ashok, the "working" version of the implementation that you just proposed is exactly the strategy you are advocating against: Continuous event posting on a timer that just keeps rescheduling itself. In contrast, the fix that I produced at the C level does not do this. Instead, it just takes care to propagate the signal that is sent from the refchan by not letting it get swallowed just because an internal buffer got in the way. The issue that is illustrated in this ticket is that the C implementation currently swallows that signal. It's entirely possible to fix this at the C level so that postevent signals sent by a refchan do not get swallowed. I could also complain about "orders of magnitude" of time that I've spent over the years correcting missteps that have been made in Tcl (ahem, the alphabet for strings for Tcl 9, or the numerous bugs that I make sure to diagnose and fix before even reporting them so that others don't have to wast their time on them), but I'm not going to engage in such bellyaching. apnadkarni added on 2024-08-14 10:50:26: Nathan, a working version of your channel implementation is below but first... Nathan wrote: This fix is not the same as those that were previously argued, and the example that demonstrates the issue is also not the same. The previous arguments were about 1) whether an asynchronous channel should immediately signal writable when a channel is opened, and 2) whether postevent is asynchronous. Not so. While 2) was a matter of requiring a TIP for incompatible changes to a public API where it could be discussed, 1) was only a symptom of a larger issue - continuous generation of "write" events without knowledge of actual channel state, something which completely defeats the purpose of event-based I/O turning it into a disguised polling model. The commit under discussion raises the same fears pending a thorough review, leave aside the question of whether those changes are even needed. Nathan wrote: The core team should take care not to become a bottleneck to needed contributions. While there will be some delay, I am afraid the time wasted this year chasing down and reverting or fixing commits that made it in without review is an order of magnitude higher. Getting to your sample script, below is a working version of a functionally similar refchan. The implementation is based on Aku's original paper. Note the refchan implementation may differ but I believe the functionality and user test code is the same as your refchan. Feel free to check that but my guess is that your refchan implementation has a bug or is incomplete as is the case in the two tickets you insist on reopening.
For explanation of above, please see Aku's paper. jan.nijtmans added on 2024-08-14 07:52:59: You are right, it's not a "risk", it's an actual failure: https://github.com/tcltk/tcl/actions/runs/10383146736/job/28747514350 chanio.test Test file error: child killed: segmentation violation ... io.test Test file error: child killed: segmentation violation (I took the freedom to start a CI build of your branch) For all other questions you might want to ask, you can find the answer here (there are some additions by Andreas Kupries at the end of this document) pooryorick added on 2024-08-13 17:47:50: Jan, can you provide some details on what you think the risk is? jan.nijtmans added on 2024-08-13 16:29:49: In my judgement, the risk putting this into 9.0 is higher than the benefit. I'll bring this up next TTT meeting, I will report the decision here. pooryorick added on 2024-08-13 15:51:41: The core team should take care not to become a bottleneck to needed contributions. pooryorick added on 2024-08-13 15:50:38: This fix is not the same as those that were previously argued, and the example that demonstrates the issue is also not the same. The previous arguments were about 1) whether an asynchronous channel should immediately signal writable when a channel is opened, and 2) whether postevent is asynchronous. This demo does not intersect with those concerns, and this fix touches on neither of those issues. It only resolves the issue reported in this ticket. If someone doesn't raise a specific objection to this particular fix, it should be merged into trunk. apnadkarni added on 2024-08-13 15:39:42: Just so we do not go back and forth on this, I am not going to argue whether this is a bug or the fix is appropriate if so, and so on. I'm simply stating my opinion that this is to be revisited post 9.0 release. Also marking the bug open accordingly so it does not get lost. apnadkarni added on 2024-08-13 15:35:13: I only had a quick look as there are other higher priorities. At first glance, this is on the same lines as the proposed "fix" in earlier tickets that was reverted and my original objections to a polling model pretending to be event-driven are still likely to stand pending a closer look. Also, given the earlier "bug" reports that I still deem to be user programming errors, the test case in the report also needs to be reviewed for correctness. Again, something I can't spend time on until 9.0 release. Given where 9.0 is in the release cycle, even if this was a genuine bug, it is not important enough to look into for 9.0 relative to other issues so my opinion is to punt even investigating it till 9.0.1. @pooryorick, regarding "since when every change needs TCT review", I don't think that is being done. It so happens your change is essentially the same, or very close to, your earlier commits that were reverted for reasons discussed ad nauseum. When it was collectively agreed amongst 7-8 folks (with you being the sole exception) to revert those changes, it was only right of Jan to put them on a branch pending a review, whenever that might be. jan.nijtmans added on 2024-08-13 15:25:55: What "Version"? Do 8.6 (and 8.7) have the same problem too? jan.nijtmans added on 2024-08-13 14:55:22: @ashok, @sergey, your comments please? pooryorick added on 2024-08-13 14:18:13: When was it decided that TCT members review each change? That was never the case before. jan.nijtmans added on 2024-08-13 14:03:32: Moved to bug-3968086595 branch, so it can be properly reviewed (by a TCT member), before being merged to "trunk" and "core-8-branch". pooryorick added on 2024-08-13 13:47:40: Fixed in [ab87ef464e12c190]. |