Tcl Source Code

View Ticket
Login
Bounty program for improvements to Tcl and certain Tcl packages.
Tcl 2019 Conference, Houston/TX, US, Nov 4-8
Send your abstracts to [email protected]
or submit via the online form by Sep 9.
Ticket UUID: 2826430
Title: Channel name recycling is dangerous
Type: Bug Version: None
Submitter: coldstore Created on: 2009-07-24 07:42:41
Subsystem: 24. Channel Commands Assigned To: andreas_kupries
Priority: 7 High Severity:
Status: Open Last Modified: 2009-09-01 17:26:53
Resolution: None Closed By:
    Closed on:
Description:
I would like a [chan rename $oldname $newname] command, which renamed a channel.

This would be useful to avoid bugs (admittedly, caused by other bugs) where an open channel name is stored in one data structure, perhaps in a coro, thread or interp, and the channel is closed before the channel name can be removed from those data structures.  The current effect of such uncoordinated storage is that either a delay in closing the channel must be suffered while all copies of the name are updated, or that a stale chan name is held somewhere, and causes a new chan (usually a socket) with the same name as the old (and now closed) chan to be manipulated in error.

This problem arises, essentially, because tcl reuses chan names frequently, and chooses them from a very compact set.

By enabling arbitrarily named chans, at construction time, chan name conflict could be completely avoided by an application.  As a stopgap, synthetic channels can be constructed with unique names, but this is an extra level of indirection.
User Comments: ferrieux added on 2009-09-01 17:26:53:
Submitted as TIP 355.

coldstore added on 2009-07-26 07:03:15:
FWIW: I reviewed it, loaded it, compiled it, have run it with a simple test (open and close a bunch of files, observe that the filename changes as expected, and that the various std* files are still there.)

Looks fine, seems efficacious.

for {set i 1} {$i < 100} {incr i} {
    set fd [open [info script] r]
    puts "$fd [chan names]"
    close $fd
}

ferrieux added on 2009-07-26 06:40:19:
File Added - 336599: uniq.patch

ferrieux added on 2009-07-26 06:39:58:
Attaching a patch doing just that. The affected classes of channels are file%d, serial%d, and sock%d.

(patch untested, maybe you can test it before me)

dkf added on 2009-07-25 19:15:35:
I'd go for Tcl_GetProcessUniqueNum as a name; the extra letters don't cost the code anything, yet they make it clear what it is you're getting. Implementation looks fine.

ferrieux added on 2009-07-25 17:32:02:
Oh yes you're right, I forgot channel transfer. Of course the counter must be process-global, then. The ProcessGlobal API in tclUtil.c looks a bit of an overkill though; I'd simply say

      int Tcl_GetUnique(void)
      {
           int res;
           Tcl_MutexLock(&uniqueLock);
           res=tclUnique++;
           Tcl_MutexUnlock(&uniqueLock);
           return res;
      }

Also, it might be wise to name the routine Tcl_GetUniqueChannelIndex to convey its main purpose, avoiding "dilution" with other more frequent uses, which would accelerate wrapping to MAXUINT.

Re the advanced channel lifetime management you propose to expose to script level, Andreas: I agree, it would be a nifty tool, but it is orthogonal to the "uniqueness reform" I'm proposing here. Basically the reform would kill an entire category of bugs without scripter intervention. Explicit lifetime management does need the programmer's involvement, may be used by some but not all. Why not make it a separate Freq ?

dgp added on 2009-07-25 03:33:10:
The *ProcessGlobal* routines might
be a tool to consider.

andreas_kupries added on 2009-07-25 03:04:16:
Another aspect of the use-case as Colin presented it is, it seems, that if any part of the system is allowed to kill a channel for some reason or other, then the coros using that channel have to have lots of catch'ing to do to detect when a channel was pulled out from under them, do I understand that correctly ? Ditto for the timer ... When it goes off it has to catch the possible error.

IMVHO I would prefer an application architecture/schema where a specific package would be the manager of stuff like this, with each coro calling the necessary procedures, and 'close' would be routed through that as well, to perform the necessary actions, i.e. kill the timer. All the necessary information would be in a namespace, indexed by channel id, and not inaccessible inside of coro variables/state. Now the only issue would be to tell the coro using the channel that it is gone (because events would not get enabled on it again in scheme, with a central manager).

andreas_kupries added on 2009-07-25 02:56:11:
Note our ability to share/transfer channels among slave interps and also to transfer between threads. 

The Tcl_GetUnique() result is only unique per-interp, and now we can get problems with such transfers when two channels get the same id in different interps, and then we want to transfer/share one over.

The existing scheme avoided that because the names were unique per the whole process.
Any new scheme should preserve this.

dkf added on 2009-07-24 22:42:56:
That's a question to ask on tcl-core

ferrieux added on 2009-07-24 21:47:35:
Looking into the implementation, it seems a much simpler method than compact+gencount could work: just increment a global (per-interp) integer, and use it as source of uniqueness:

                    int Tcl_GetUnique(Tcl_Interp *i);  /* returns a different number every time. Wraps in no less than MAXUINT calls. */

The idea of making it a global stub entry is that any extension can use it for its own purposes, and keep the guarantee of not colliding. Is  a TIP mandatory for that or shall I proceed ?

ferrieux added on 2009-07-24 20:29:12:
data_type - 110894

Switched to Bug status and re-titled according to previous exchanges.
Now I have a patch to submit it seems ;-)

dkf added on 2009-07-24 20:03:10:
They use the fd because it's an easy source of local uniqueness.

coldstore added on 2009-07-24 19:46:23:
Well, it certainly can be, so sure.

I had a look, and it seems that every style of chan, serial, socket, file, pipe, except refchan uses the fd as the basis of its name.

ferrieux added on 2009-07-24 18:27:53:
If you accept to rename it to "Channel name recycling is dangerous", I can switch it to Bug status (without recreating and duplicating comments). Justtell me.

coldstore added on 2009-07-24 17:35:21:
BTW, the existing implementation is not strictly a bug, it's just a really inconvenient side-effect of an implementation choice which can easily cause bugs in applications unless a whole lot of work is done to avoid it.  I'm not sure what you call those.  I've not come across many of these in Tcl, so don't know the appropriate name.

Should I close this and make it a bug report instead?

coldstore added on 2009-07-24 17:28:51:
These are all valid points, and it is certainly a judgement call.  I don't mind which is chosen, but I would very much like either to be implemented, as the problems caused by chan name reuse for sockets are significant to me.

The only real advantage of the approach I suggested is, as previously stated, some additional expressive power.  It does come at a cost, as does its use.

Either way, I'm happy.   The approach suggested by Alexandre is also quite simple to implement - reflected channels already do it (tclIORChan.c:2041 NextHandle function) and requires no additional documentation unless one is feeling kindly toward people who may have relied upon undocumented side-effects of socket naming in the past.

ferrieux added on 2009-07-24 17:10:33:
OK so both (1) and (2) are the same in fact: all of this vanishes if channel names stop being reused.

Now comparing the two approches (yours and mine), I 'd say I perfer mine (surprise, eh ?) because:
 - mine doesn't need extra primitives
 - mine doesn't demand script contribution to avoid bugs
 - yours might break existing scripts: if you rename (not close) a channel after some package has secretly stored its name, then its subsequent use of it will fail while it shouldn't.

coldstore added on 2009-07-24 17:00:55:
I see that I didn't clarify how renaming the channel would fix the problem of distributing the fact of closure to all interested parties.  It would fix the problem by allowing the app to guarantee that a once-valid channel name, once closed, would never again be valid.  So the fact-of-closure would be distributed as a side effect of that guarantee.  If several interested parties might close the chan, the first one to do so indicates it by that act alone.

An additional benefit is that the channel name-space is currently coded, but undocumented.  [chan rename] would document the name-space as initially undefined, subsequently defined.

coldstore added on 2009-07-24 16:55:14:
To clarify, per Alexandre's request:  if you have a coro which is in the process of handling fileevents on a socket (socket10, let's say) and you decide you want to not handle events on that socket for a few seconds, so you schedule an [after] event to trigger to enable you to re-establish the [fileevent $sock] ... you must store the socket name (socket10) in the variable sock, which may be a coro variable, hence inaccessible to other coros.  If a different processing stream, or coro, decides it wants to close socket10, it can't safely do that until and unless it has notified the 
first coro of its intention, so that it can forget about socket10 - cancel any [after] events it may have scheduled.

If this sounds like a contrived example, it's not.  I have almost exactly this problem in adding input throttling to Wub right now, and anyone writing a producer/consumer pair with buffering where the two parts make use of modern Tcl facilities such as coros or [apply] or even [interp] will have similar problems.  The amount of data which needs to 
be shared between the distinct processing elements is too great, and is imposed entirely by the rapid cycling of chan names.

It is true that one could fix the cycling, but the idea of encoding names with significant information also appeals to me, and comes for free with this fix.

coldstore added on 2009-07-24 16:53:38:
I've thought of a new good reason to do this:  by renaming, one could introduce some application-specific encoding into the names, for example "connected_$ipaddress_$count"  and get connection counting for free.  I also thought of a new distinct area of a program which may contain chan names: the event system (fileevents and such)

dkf added on 2009-07-24 16:39:09:
I'm just making sure that we don't forget to make them special.

ferrieux added on 2009-07-24 16:11:44:
No problem with hard-coding the lookup from std* to compactArrayOfFds[0,1,2] and disabling gencount checks on them.

dkf added on 2009-07-24 15:58:12:
Just a quick note in passing. It's important to keep the 'std*' namespace special.

ferrieux added on 2009-07-24 15:08:08:
Hmmm, there are two separate issues here: (1) removing the carpet under a coro/thread feet and (2) too fast reuse.
I fail to see how this would solve (1), but for (2) it is clear. Can you clarify on (1) ?

Regarding (2), the "compact set" arises from Tcl's direct use of OS handles/fd, and on unix fds are a compact set. On Windows AFAIK handles are not reused (correct m if I'm wrong). So, an alternative way of solving this issue, without any intervention from the script writer, would be to insert an indirection in Tcl, with an intentionally compact set managed by Tcl, but extended with generation counts, so that no reuse occurs:

    typedef struct {
           HANDLE_OR_INT fd;
           int gencount;
    } UniqueHandle; 

   UniqueHandle  *compactArrayOfFds;

The idea, then, is to have automatic channel names of the form

     sock3g1

which can be looked up from string in two steps:

    3 is the index in compactArrayOfFds
    1 is the gencount for verification

This way, even if slot 3 of the array is reused, it will have a gencount of at least 2, hence any stale reference to sock3g1 will generate an error instead of wreaking havoc by unwanted IO on another channel.

Note that this works well with the Channel intrep too, assuming we add the gencount to the structure. Then the gencount verification is just an extra integer comparison.

Reactions ?

Attachments: