This document describes the issues involved in writing a transformation channel using the ability to stack channels upon each other, introduced in Tcl 8.2.
Transformation channels are based on a new ability of the Tcl core (since version 8.2): stacking a channel upon any other arbitrary channel, i.e. allowing a channel to take over the identity of the other. Doing so causes the transformation channel to intercept all read/write calls directed at the original channel and to modify the data in them at will. Transformation channels are sometimes also called 'filter channels'.
As transformations are just channels and not something new, it is not necessary to learn new data structures to write them, only basic knowledge of how to write a new channel driver. The information needed to write a new channel driver is explained here.
The core provides 3 new procedures in the C-API, 'Tcl_StackChannel', 'Tcl_UnstackChannel' and 'Tcl_GetStackedChannel', which will be explained later in more detail.
For now it is important to know that the following guarantees are made by the core with respect to stacked channels:
Whenever a channel is created the core will not only get a reference to the structure containing the references to the driver procedures, i.e. the channel type, but a reference to a structure allocated by the caller as well. This reference is given to all driver procedures when called for that particular channel. The internals of this structure are known only to the channel driver; the core will just pass the reference around. This allows us to associate the specific state of the driver with the channel.
In the case of a transformation the following information should be present, not necessarily under the names I gave them; you are free to choose your own identifiers:
- Tcl_Channel channel;
- is a backlink from the instanceData to the channel. Without this link the transformation would be unable to access the downstream channel, which would make it useless. Please note that, due to the guarantees described in Facilities in the core, this reference will always point to the top of the stack and not necessarily to the channel structure referring to this instanceData. This means that whenever the driver wants to access the downstream channel of the transformation, it first has to walk the chain of channels to find its own channel. This is best done in a utility function employing 'Tcl_GetStackedChannel'; see for example 'generic/giot.c:DownChannel'. Not doing this will cause an infinite recursion later on: once another transformation is stacked on top, the channel directly below the top may be this very transformation, so a call passed "downstream" would come right back to it.
- Tcl_TimerToken timer;
- This timer is necessary if the transformation is able to buffer processed data the generic I/O layer has no knowledge of. It will be used to flush out such data. See WatchProc for more explanations.
- int flags;
- The transformation has to remember the current blocking mode to handle EOF on input correctly. See InputProc for more explanations.
- int mask;
- The transformation has to remember the current interest in events; see WatchProc for more explanations.
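For illustration, the fields listed above could be collected in a struct like the one below. This is a minimal sketch under stated assumptions, not code from the core: 'TransformData', 'NewTransformData' and 'TRANSFORM_ASYNC' are hypothetical names, and 'ChannelToken'/'TimerToken' stand in for 'Tcl_Channel' and 'Tcl_TimerToken' so that the fragment compiles without tcl.h. A real driver would allocate with 'Tcl_Alloc' rather than 'malloc'.

```c
#include <stdlib.h>

/* Stand-ins so the sketch compiles without tcl.h; in a real driver these
 * would be Tcl_Channel and Tcl_TimerToken. */
typedef void *ChannelToken;
typedef void *TimerToken;

/* Hypothetical flag bit: set while the channel is in non-blocking mode. */
#define TRANSFORM_ASYNC (1 << 0)

/* Hypothetical instanceData of a transformation. */
typedef struct TransformData {
    ChannelToken channel; /* backlink; always refers to the top of the stack */
    TimerToken timer;     /* timer flushing buffered input, or NULL */
    int flags;            /* TRANSFORM_ASYNC while non-blocking */
    int mask;             /* current interest in events */
    /* ... conversion state and internal buffers would follow ... */
} TransformData;

/* Allocate and initialize the instance data. The backlink is stored right
 * away, i.e. before the channel is stacked, because the WatchProc may be
 * called from inside the stacking call. */
TransformData *NewTransformData(ChannelToken channel)
{
    TransformData *dataPtr = malloc(sizeof *dataPtr);

    if (dataPtr == NULL) {
        return NULL;
    }
    dataPtr->channel = channel;
    dataPtr->timer = NULL;
    dataPtr->flags = 0;   /* blocking until the BlockModeProc says otherwise */
    dataPtr->mask = 0;    /* no interest in events yet */
    return dataPtr;
}
```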
As with every channel, a transformation has to have a creation command at the Tcl level and an equivalent procedure at the C level.
In contrast to other channel types this procedure must not use 'Tcl_CreateChannel' as that would create a new and separate channel, but 'Tcl_StackChannel' instead. This procedure takes as one of its parameters a reference to an existing channel and creates a new channel structure which is used to save the current state of the channel we are stacking the transformation onto. The old structure is then reinitialized for the transformation. The returned value refers to the saved state of the old channel. The advantage of this design is that all references to the channel we are stacking onto stay valid without opening backdoors into the whole stack.
Other things, like the creation and initialization of the necessary clientData for the transformation, have to be done as usual.
Please note that the backlink to the channel of the transformation must be initialized before the call to 'Tcl_StackChannel' as it is required for the WatchProc implementation and that one can be called from the inside of 'Tcl_StackChannel'. This restriction is no problem, as we know from Facilities in the core that the channel structure of the transformation is the same as the old channel.
Creation procedure skeleton
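    int
    XX_CreateTransformation (ClientData cd, Tcl_Interp *interp,
                             int objc, Tcl_Obj *CONST objv[])
    {
        int mode;
        TransformData *dataPtr;     /* your instanceData type */
        Tcl_Channel old_channel =
            Tcl_GetChannel (interp, Tcl_GetString (objv [1]), &mode);

        if (old_channel == NULL) {
            return TCL_ERROR;
        }
        dataPtr = (TransformData *) Tcl_Alloc (sizeof (TransformData));
        /* ... initialize dataPtr ... */
        dataPtr->channel = old_channel;  /* backlink, set before stacking */
        Tcl_StackChannel (interp, &trans_type, (ClientData) dataPtr,
                          mode, old_channel);
        Tcl_AppendResult (interp, Tcl_GetChannelName (old_channel),
                          (char *) NULL);
        return TCL_OK;
    }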
The 'trans_type' in the code above is the structure containing the references to the driver procedures for the transformation.
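The effect of 'Tcl_StackChannel' on the identity of a channel can be modelled in a few lines. The sketch below is a simplified, hypothetical model ('ModelChannel', 'ModelStackChannel'), not the core implementation, whose channel structure is private: the contents of the old structure move into a newly allocated save area, and the old structure is reused for the transformation.

```c
#include <stdlib.h>

/* Simplified, hypothetical model of a channel structure; the real one in
 * the Tcl core is private and much larger. */
typedef struct ModelChannel {
    const char *typeName;            /* stands in for the channel type */
    void *instanceData;              /* driver-specific state */
    struct ModelChannel *supercedes; /* saved state of the covered channel */
} ModelChannel;

/* Model of stacking: move the current contents of 'chan' into a new save
 * area, then reinitialize 'chan' for the transformation. Every existing
 * pointer to 'chan' now reaches the transformation. Returns the save area,
 * or NULL if the allocation failed (then 'chan' is left untouched). */
ModelChannel *ModelStackChannel(ModelChannel *chan, const char *typeName,
                                void *instanceData)
{
    ModelChannel *saved = malloc(sizeof *saved);

    if (saved == NULL) {
        return NULL;
    }
    *saved = *chan;                  /* preserve the covered channel */
    chan->typeName = typeName;       /* reuse the structure for the transformation */
    chan->instanceData = instanceData;
    chan->supercedes = saved;
    return saved;
}
```

Because the pointer held by the interpreter never changes, only the contents behind it, scripts keep using the same channel name after the transformation is stacked.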
The two ways of destroying a transformation are 'Tcl_CloseChannel' and 'Tcl_UnstackChannel'.
The difference between them? 'Tcl_CloseChannel' will not only destroy the transformation, but the underlying channel too, an action which then ripples through the whole stack until all channels in it are destroyed. 'Tcl_UnstackChannel', on the other hand, will only destroy the topmost transformation in the stack. Of course, if the stack consists of only one element, both procedures are equivalent.
As 'Tcl_CloseChannel' knows that the whole stack of channels is in destruction it does not deal with events anymore, except for destroying the internal data structures supposed to deal with them. But it does ask the various channels in the stack to flush buffered information down the stack so that nothing which is stuck is lost. This is not possible for information in the upward/read buffers, as there is no ultimate receiver for them, so these bytes are lost.
'Tcl_UnstackChannel' does the same as above, but also takes action to keep the event-system up and running (by redirecting existing scripts to the uncovered channel).
In all cases the system will call the 'CloseProc' of the transformation to clean up the data structures of the destroyed channels.
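The difference between closing and unstacking can be illustrated with a hypothetical model of the stack in which the state of each covered channel lives in a heap-allocated save area reachable via 'supercedes'. 'ModelClose', 'ModelUnstack' and the counter 'closeCalls' are illustrative names only; the event handling and buffer flushing which the core also performs are left out.

```c
#include <stdlib.h>

/* Hypothetical model; 'supercedes' holds the saved state of the covered
 * channel, allocated with malloc when the transformation was stacked. */
typedef struct ModelChannel {
    const char *name;
    struct ModelChannel *supercedes;
} ModelChannel;

static int closeCalls = 0;   /* counts simulated CloseProc invocations */

static void ModelCloseProc(ModelChannel *chan)
{
    (void) chan;             /* a real CloseProc would free its instanceData */
    closeCalls++;
}

/* Closing: every channel in the stack is destroyed, top first. */
void ModelClose(ModelChannel *top)
{
    ModelChannel *below = top->supercedes;

    ModelCloseProc(top);
    while (below != NULL) {
        ModelChannel *next = below->supercedes;
        ModelCloseProc(below);
        free(below);
        below = next;
    }
    top->supercedes = NULL;
}

/* Unstacking: only the topmost transformation is destroyed, and the saved
 * state of the covered channel moves back into the shared structure. */
void ModelUnstack(ModelChannel *top)
{
    ModelChannel *saved = top->supercedes;

    if (saved == NULL) {     /* single element: same as closing */
        ModelClose(top);
        return;
    }
    ModelCloseProc(top);
    *top = *saved;           /* uncover the channel below */
    free(saved);
}
```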
A procedure to make 'Tcl_UnstackChannel' available at the script level is rather simple:
Unstack skeleton
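    int
    XX_UnstackTransformation (ClientData cd, Tcl_Interp *interp,
                              int objc, Tcl_Obj *CONST objv[])
    {
        int mode;
        Tcl_Channel channel =
            Tcl_GetChannel (interp, Tcl_GetString (objv [1]), &mode);

        if (channel == NULL) {
            return TCL_ERROR;
        }
        Tcl_UnstackChannel (interp, channel);
        return TCL_OK;
    }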
It was said in InstanceData that the transformation has to walk the stack to find its own channel structure, and that of the channel downstream, but not how this is done.
The functionality for this is provided by the third new function in the C-API, 'Tcl_GetStackedChannel'. It takes a channel token and returns a token for the channel immediately downstream. A (Tcl_Channel) NULL indicates that the channel used as argument was at the bottom of the stack.
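A walk in the spirit of the 'DownChannel' utility mentioned in InstanceData could look like the following. This is a self-contained model, not the actual code of that utility: 'ModelChannel' and 'ModelGetStackedChannel' stand in for 'Tcl_Channel' and 'Tcl_GetStackedChannel', 'TransformData' is a hypothetical instanceData type, and a real driver would obtain the instanceData of each channel with 'Tcl_GetChannelInstanceData'.

```c
#include <stddef.h>

/* Hypothetical model of a stack: the top structure is what every handle
 * points to, lower entries are the saved states of covered channels. */
typedef struct ModelChannel {
    void *instanceData;
    struct ModelChannel *supercedes;
} ModelChannel;

/* Stands in for Tcl_GetStackedChannel: the channel immediately
 * downstream, or NULL at the bottom of the stack. */
static ModelChannel *ModelGetStackedChannel(ModelChannel *chan)
{
    return chan->supercedes;
}

/* Hypothetical instanceData: only the backlink matters here. */
typedef struct TransformData {
    ModelChannel *channel;   /* always the top of the stack */
} TransformData;

/* Walk down from the top until the channel owning 'dataPtr' is found, then
 * return the channel below it. Returns NULL if 'dataPtr' is not in the
 * stack, which should not happen for a live transformation. */
ModelChannel *DownChannel(TransformData *dataPtr)
{
    ModelChannel *self = dataPtr->channel;

    while (self != NULL && self->instanceData != (void *) dataPtr) {
        self = ModelGetStackedChannel(self);
    }
    return (self == NULL) ? NULL : ModelGetStackedChannel(self);
}
```

Note that for the lower of two stacked transformations the naive lookup, the channel directly below the top, is its own channel, which is exactly the infinite recursion warned about in InstanceData.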
Now that the environment of the driver is better known, we can explain the operation of the various driver procedures in detail. Every description starts with the general condition under which the procedure is called by the generic I/O layer of the Tcl core and then proceeds to the specifics a transformation has to take care of.
The GetHandleProc is called by the C-API function 'Tcl_GetChannelHandle' to retrieve the OS-specific file handle associated with the queried channel.
Transformations are not associated with OS specific handles, making this call a bit meaningless. So one possible implementation simply returns a NULL handle (erroring out is not possible).
On the other hand the downstream channel could be OS specific and have such a handle. So the other implementation is to delegate that request to the underlying channel, via 'Tcl_GetChannelHandle'. If all transformations cooperate in this manner, the request will fall down to the channel at the bottom, like a file or socket, which is able to handle it and return something meaningful.
To me, the second of the two implementations is the better policy.
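The delegating policy can be sketched as follows. In a real driver the GetHandleProc would simply forward to 'Tcl_GetChannelHandle' on the downstream channel; this self-contained model ('ModelChannel' and 'ModelGetHandle', both hypothetical) flattens that chain of forwarded calls into a loop so that the fall-through to the bottom channel is visible.

```c
#include <stddef.h>

typedef struct ModelChannel {
    int isTransformation;             /* transformations own no OS handle */
    void *osHandle;                   /* meaningful for the bottom channel only */
    struct ModelChannel *downstream;  /* NULL at the bottom of the stack */
} ModelChannel;

/* Each transformation forwards the request to the channel below it, so the
 * request falls through to the bottom channel (file, socket, ...).
 * Returns 0 (like TCL_OK) and stores the handle, or 1 (like TCL_ERROR). */
int ModelGetHandle(ModelChannel *chan, void **handlePtr)
{
    while (chan != NULL && chan->isTransformation) {
        chan = chan->downstream;      /* delegate, as Tcl_GetChannelHandle would */
    }
    if (chan == NULL || chan->osHandle == NULL) {
        return 1;
    }
    *handlePtr = chan->osHandle;
    return 0;
}
```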
The SetOptionProc is called by the generic I/O layer whenever 'Tcl_SetChannelOption' is used (e.g. by 'fconfigure') and a non-standard option is specified as argument.
The interesting case here is how to handle options unknown to the transformation. One could discard them, or delegate the call to the downstream channel. I recommend delegating the call.
SetOptionProc skeleton
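    static int
    SetOptionProc (ClientData instanceData, Tcl_Interp *interp,
                   char *optionName, char *value)
    {
        TransformData *dataPtr = (TransformData *) instanceData;

        /* ... handle your own options ... */

        /* delegate unknown options downstream (DownChannel walks the
         * stack, see InstanceData) */
        return Tcl_SetChannelOption (interp, DownChannel (dataPtr),
                                     optionName, value);
    }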
The GetOptionProc is called by the generic I/O layer whenever 'Tcl_GetChannelOption' is used (e.g. by 'fconfigure') to query the value of a non-standard option, or of all options.
Implement everything from the SetOptionProc, and probably more (a read-only option makes sense, a write-only one doesn't). Again it is possible to delegate unknown options to the underlying channel. In the case of a query for all options, delegation will generate a mighty long result. Pruning the unnecessary option values (-encoding, -buffering, -translation) from the result of the underlying channel is possible, but tedious: the result is a DString, not a Tcl_Obj, and in particular not a list object. The named options have fixed and known values, due to the behaviour of 'Tcl_StackChannel' as described in Creation of a transformation.
GetOptionProc skeleton
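    static int
    GetOptionProc (ClientData instanceData, Tcl_Interp *interp,
                   char *optionName, Tcl_DString *dsPtr)
    {
        TransformData *dataPtr = (TransformData *) instanceData;

        /* ... handle your own options (for a full query, append them
         *     to dsPtr) ... */

        /* delegate unknown options and full queries downstream */
        return Tcl_GetChannelOption (interp, DownChannel (dataPtr),
                                     optionName, dsPtr);
    }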
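Pruning the fixed options from a delegated full query mostly means splitting a Tcl list held in a plain string. The sketch below assumes the downstream result is a well-formed list of option/value pairs whose values may be brace-quoted (as in '-translation {lf lf}') but contain no backslash escapes. 'NextWord', 'IsPruned' and 'PruneOptions' are hypothetical helpers; a real GetOptionProc would read from and append to the 'Tcl_DString' (e.g. with 'Tcl_DStringValue' and 'Tcl_DStringAppend') instead of using plain buffers.

```c
#include <stddef.h>
#include <string.h>

/* Returns the start of the next list word at or after p (skipping spaces)
 * and stores its end in *endPtr; returns NULL when no word is left.
 * Brace-quoted words such as {lf lf} are kept whole; backslash escapes
 * are not handled. */
static const char *NextWord(const char *p, const char **endPtr)
{
    const char *q;

    while (*p == ' ') {
        p++;
    }
    if (*p == '\0') {
        return NULL;
    }
    q = p;
    if (*q == '{') {
        int depth = 0;
        do {
            if (*q == '{') {
                depth++;
            } else if (*q == '}') {
                depth--;
            }
            q++;
        } while (*q != '\0' && depth > 0);
    } else {
        while (*q != '\0' && *q != ' ') {
            q++;
        }
    }
    *endPtr = q;
    return p;
}

/* Options whose values are fixed for a channel below a transformation. */
static int IsPruned(const char *name, size_t len)
{
    static const char *const pruned[] = {"-encoding", "-buffering", "-translation"};
    size_t i;

    for (i = 0; i < sizeof pruned / sizeof pruned[0]; i++) {
        if (strlen(pruned[i]) == len && strncmp(pruned[i], name, len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Copies the option/value pairs of 'in' to 'out', dropping pruned ones.
 * For well-formed input 'out' needs at most strlen(in) + 1 bytes. */
void PruneOptions(const char *in, char *out)
{
    const char *p = in, *name, *nameEnd, *valueEnd;
    char *o = out;

    while ((name = NextWord(p, &nameEnd)) != NULL) {
        if (NextWord(nameEnd, &valueEnd) == NULL) {
            valueEnd = nameEnd;       /* dangling name without a value */
        }
        if (!IsPruned(name, (size_t) (nameEnd - name))) {
            if (o != out) {
                *o++ = ' ';
            }
            memcpy(o, name, (size_t) (valueEnd - name));
            o += valueEnd - name;
        }
        p = valueEnd;
    }
    *o = '\0';
}
```

For example, '-blocking 1 -buffering full -encoding binary -eofchar {{} {}} -translation {lf lf}' is reduced to '-blocking 1 -eofchar {{} {}}'.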
The SeekProc is called by the generic I/O layer whenever the user asks the channel to move the 'file access point'.
This is a hard problem for transformations. Should they seek using their own notion of access point? Or should they use the notion of the underlying channel and then try to adapt their own state for fine-positioning?
Depending on the transformation both can be impossible. Compressors (like zlib) are a good example, with their completely non-linear and position-dependent relationship between the number of bytes coming in from the downstream channel and the number going out to the caller. Another reason could be that the transformation state is not reversible, i.e. cannot be rolled back in a simple way without hogging memory. An example of this would be an encryption transformation using a cryptographically strong hash function to go from the current state to the state for the encryption of the next byte (or block). This is not reversible: we can go forward from state to state, but not back to an old state, except by saving them all.
Because of this, my personal policy for seeking is to propagate the request unchanged to the underlying channel and to discard all information in the internal buffers of the transformation. Data waiting to be written is converted as if it were the last block (in other words, the special end-of-information processing is applied) and then flushed. The current state is abandoned too.
The next call to InputProc or OutputProc will be handled as if it were the first call to the transformation.
This is basically a 'the user knows best' policy: the user is assumed to be able to compute a position that makes sense and does not create garbage during recovery.
In the end this means that the user of a certain transformation has to understand its properties and whether a seek on it makes sense at all.
Oh, it is possible to deal even with non-reversible state, by recording all read/write calls and maintaining an exact image of the information read/written so far, but this is, ah, memory-intensive, to understate this a little.
SeekProc skeleton
    static int
    SeekProc (ClientData instanceData, long offset, int mode,
              int *errorCodePtr)
    {
        TransformData *dataPtr = (TransformData *) instanceData;
        int result;

        /* ... flush waiting output ...
         * ... flush waiting input, if possible (e.g. into a configured
         *     variable!) ... */

        /* chain the call */
        result = (int) Tcl_Seek (DownChannel (dataPtr), offset, mode);
        *errorCodePtr = (result == -1) ? Tcl_GetErrno () : 0;
        return result;
    }
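The order of the steps in this policy can be shown with a small model. It is a hypothetical sketch ('SeekModel', 'ModelSeekProc' and 'FlushFinalBlock' are illustrative names): the conversion is the identity, only absolute positioning is modelled, and a plain buffer stands in for the downstream channel.

```c
#include <string.h>

#define BUF_SIZE 64

/* Hypothetical transformation state for this model. */
typedef struct SeekModel {
    unsigned long state;            /* conversion state */
    char pendingOut[BUF_SIZE];      /* data not yet converted and written */
    size_t pendingOutLen;
    size_t pendingInLen;            /* converted input not yet read */
    char downstream[4 * BUF_SIZE];  /* what reached the channel below */
    size_t downstreamLen;
    long downstreamPos;             /* access point of the channel below */
} SeekModel;

/* End-of-information processing; the identity conversion in this model.
 * Data that does not fit the model's downstream buffer is dropped. */
static void FlushFinalBlock(SeekModel *m)
{
    size_t room = sizeof m->downstream - m->downstreamLen;
    size_t n = (m->pendingOutLen < room) ? m->pendingOutLen : room;

    memcpy(m->downstream + m->downstreamLen, m->pendingOut, n);
    m->downstreamLen += n;
    m->pendingOutLen = 0;
}

/* The seek policy described above, for absolute positioning only. */
long ModelSeekProc(SeekModel *m, long offset)
{
    FlushFinalBlock(m);        /* 1. finish and write pending output */
    m->pendingInLen = 0;       /* 2. discard buffered input */
    m->state = 0;              /* 3. abandon the state: next call acts as the first */
    m->downstreamPos = offset; /* 4. chain the request unchanged */
    return m->downstreamPos;
}
```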
The BlockModeProc is called by the generic I/O layer whenever the user (re)configures the value of option '-blocking'. For a normal channel this procedure has to take the necessary actions at OS level to switch the managed channel into (non-)blocking behaviour.
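For a transformation, however, the behaviour is pretty much fixed. It has to remember the new mode in its instance data, and it has to pass the request on to the downstream channel.
Neither step can be left out: the InputProc needs the remembered mode to handle EOF correctly (see InputProc), and only the channel at the bottom of the stack can actually switch the behaviour at OS level, so without propagation reading from the downstream channel would still block.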
BlockModeProc skeleton
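    static int
    BlockModeProc (ClientData instanceData, int mode)
    {
        TransformData *dataPtr = (TransformData *) instanceData;

        if (mode == TCL_MODE_NONBLOCKING) {
            dataPtr->flags |= ASYNC;
        } else {
            dataPtr->flags &= ~ASYNC;
        }
        /* propagate the mode; a driver returns 0 or a POSIX error code */
        if (Tcl_SetChannelOption (NULL /* ignore interp arg */,
                DownChannel (dataPtr), "-blocking",
                (mode == TCL_MODE_NONBLOCKING) ? "0" : "1") != TCL_OK) {
            return Tcl_GetErrno ();
        }
        return 0;
    }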
The CloseProc is called by the generic I/O layer to tell a channel that it is about to be destroyed. It is the responsibility of this procedure to clean up any data structures held by the channel.
A transformation has the additional responsibility to complete the conversion of all incomplete information sitting in its internal write buffers and to write the result into the downstream channel.
Another task is the removal of all event-related things, like channel handlers and timers, although this could be filed under 'clean up any data structures held by the channel' too.
CloseProc skeleton
    static int
    CloseProc (ClientData instanceData, Tcl_Interp *interp)
    {
        TransformData *dataPtr = (TransformData *) instanceData;

        /* ... delete the channel handler at the underlying channel, if any.
         * ... delete the timer, if any. See 'WatchProc' too.
         * ... do last minute conversions on the r/w buffers and try to
         *     flush their results to the underlying channel, i.e.
         *     Tcl_Write (dataPtr->channel, ...);               (x)
         * ... free the data structures on the heap. */
        return TCL_OK;
    }
Remark (x): At the time of the call the backlink to the transformation channel actually already refers to the downstream channel instead, which is now at the top.
The InputProc is called by the generic I/O layer whenever some input is required.
Instead of going to the OS for this, a transformation has to ask the channel downstream for data to convert. Use 'Tcl_Read' to do this, not 'Tcl_Gets'. The channel downstream does no EOL-translation (see Facilities in the core) and so basically has no notion of lines. 'Tcl_Gets' might trip on that, unnecessarily waiting for the end of a line, or EOF, although there is already data which could be converted.
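Why the transformation must remember its blocking mode shows up in one small decision an InputProc has to make after asking the downstream channel for data. The sketch below covers only that piece, not a full skeleton; 'DecideInput', 'InputAction' and 'TRANSFORM_ASYNC' are hypothetical, while the return conventions in the comments (0 for EOF, -1 with 'EAGAIN' when a non-blocking channel has no data yet) are the usual ones for channel drivers.

```c
/* Hypothetical flag bit, set by the BlockModeProc in non-blocking mode. */
#define TRANSFORM_ASYNC (1 << 0)

/* What an InputProc might do next after asking the downstream channel
 * for data (e.g. with Tcl_Read) and converting what arrived. */
typedef enum {
    DELIVER,      /* return the converted bytes, even if fewer than asked */
    REPORT_EOF,   /* return 0: downstream is at EOF and nothing is left */
    WOULD_BLOCK,  /* return -1 with *errorCodePtr = EAGAIN */
    READ_MORE     /* blocking mode: call Tcl_Read again */
} InputAction;

/* 'produced': converted bytes ready for the caller.
 * 'downstreamEof': what Tcl_Eof reports for the downstream channel.
 * 'flags': the remembered blocking mode. */
InputAction DecideInput(int produced, int downstreamEof, int flags)
{
    if (produced > 0) {
        return DELIVER;
    }
    if (downstreamEof) {
        return REPORT_EOF;    /* any final-block conversion is assumed done */
    }
    if (flags & TRANSFORM_ASYNC) {
        return WOULD_BLOCK;   /* "no data yet" is not the same as EOF */
    }
    return READ_MORE;         /* e.g. an incomplete block was read */
}
```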
Instead of a skeleton, which would be overwhelming even after trimming, I list the rules my input procedures are based upon. See 'generic/giot.c:TransformInputProc' for a real-life example.
Other things to consider:
The OutputProc is called by the generic I/O layer whenever something is written to the transformation and an I/O buffer is flushed.
The transformation has to convert as much as possible of the data it got, and the result must be written to the channel downstream (Well, not really, but not writing it does not make much sense).
Data which cannot be converted at once has to be buffered and converted later, together with the data written by the next calls to the transformation.
The API to use here is 'Tcl_Write'.
As with InputProc this procedure is free to read from the underlying channel too, or from some other channel, or ...
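Buffering of unconverted data can be illustrated with a hypothetical base64-encoding transformation, chosen because base64 converts its input in groups of 3 bytes. In this self-contained sketch 'B64Write' plays the OutputProc, 'B64Flush' the end-of-information processing done at close (or seek) time, and 'WriteDown' stands in for 'Tcl_Write' on the downstream channel; none of these names come from the core.

```c
#include <stddef.h>
#include <string.h>

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical state of a base64-encoding transformation. */
typedef struct B64Out {
    unsigned char pending[2]; /* bytes not yet forming a 3-byte group */
    size_t pendingLen;
    char sink[256];           /* stands in for the downstream channel */
    size_t sinkLen;
} B64Out;

/* Stand-in for Tcl_Write on the downstream channel; keeps 'sink'
 * NUL-terminated and silently truncates when it is full. */
static void WriteDown(B64Out *o, const char *buf, size_t len)
{
    size_t room = sizeof o->sink - 1 - o->sinkLen;

    if (len > room) {
        len = room;
    }
    memcpy(o->sink + o->sinkLen, buf, len);
    o->sinkLen += len;
    o->sink[o->sinkLen] = '\0';
}

/* Encode n (1..3) bytes of 'in' as 4 characters, padding with '='. */
static void EncodeGroup(const unsigned char in[3], size_t n, char out[4])
{
    unsigned long v = ((unsigned long) in[0] << 16)
                    | ((unsigned long) (n > 1 ? in[1] : 0) << 8)
                    | (unsigned long) (n > 2 ? in[2] : 0);

    out[0] = b64[(v >> 18) & 0x3F];
    out[1] = b64[(v >> 12) & 0x3F];
    out[2] = (n > 1) ? b64[(v >> 6) & 0x3F] : '=';
    out[3] = (n > 2) ? b64[v & 0x3F] : '=';
}

/* OutputProc model: convert every complete group at once and buffer the
 * remainder for the next call. All bytes count as accepted. */
size_t B64Write(B64Out *o, const unsigned char *buf, size_t len)
{
    unsigned char group[3];
    char out[4];
    size_t i;

    for (i = 0; i < len; i++) {
        if (o->pendingLen < 2) {
            o->pending[o->pendingLen++] = buf[i];
            continue;
        }
        group[0] = o->pending[0];
        group[1] = o->pending[1];
        group[2] = buf[i];
        EncodeGroup(group, 3, out);
        WriteDown(o, out, 4);
        o->pendingLen = 0;
    }
    return len;
}

/* End-of-information processing: encode what is left, with padding. */
void B64Flush(B64Out *o)
{
    unsigned char group[3] = {0, 0, 0};
    char out[4];

    if (o->pendingLen == 0) {
        return;
    }
    memcpy(group, o->pending, o->pendingLen);
    EncodeGroup(group, o->pendingLen, out);
    WriteDown(o, out, 4);
    o->pendingLen = 0;
}
```

Writing 'Ma' produces no output yet; writing 'nHe' afterwards emits 'TWFu' and keeps 'He' buffered, which the final flush turns into 'SGU='.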
The WatchProc is called by the generic I/O layer whenever the user (or the system) announces its (dis)interest in events on the channel.
The correct behaviour of the transformation here is essential for a smooth interoperation with the notifier and for working fileevents.
Whenever it is called, it has to add or delete a channel handler in the channel downstream, using the given mask. This propagates the interest in events down the stack until the channel at the bottom takes the necessary steps to actually generate these events.
From Facilities in the core we know that events generated by the channel at the bottom are delivered to all channels in the stack, with the topmost channel the first to be notified. This will call all of the installed ChannelHandlers.
The channel handler of a transformation has to check the internal read buffer of the transformation for pending information and set up a timer if there is some. The reason for this will be explained later in this section.
To complete the action, the event handler for the timer has to call 'Tcl_NotifyChannel' with the mask for 'READABLE'; in other words, it has to generate an artificial 'readable' event. This in turn will invoke any 'fileevent' script and all the ChannelHandlers, causing another check for the necessity of setting up more timers.
WatchProc skeleton
    static void
    WatchProc (ClientData instanceData, int mask)
    {
        TransformData *dataPtr = (TransformData *) instanceData;

        if (mask == dataPtr->mask) {
            return;                          /* no changes */
        }
        if (dataPtr->mask) {                 /* remove old channel handler */
            Tcl_DeleteChannelHandler (DownChannel (dataPtr),
                                      ChannelHandler, (ClientData) dataPtr);
        }
        dataPtr->mask = mask;
        if (dataPtr->mask) {                 /* recreate channel handler */
            Tcl_CreateChannelHandler (DownChannel (dataPtr), dataPtr->mask,
                                      ChannelHandler, (ClientData) dataPtr);
        }
    }
ChannelHandler skeleton
    static void
    ChannelHandler (ClientData clientData, int mask)
    {
        TransformData *dataPtr = (TransformData *) clientData;

        if (dataPtr->timer != (Tcl_TimerToken) NULL) {
            /* cancel outstanding timer */
            Tcl_DeleteTimerHandler (dataPtr->timer);
            dataPtr->timer = (Tcl_TimerToken) NULL;
        }
        if ((mask & TCL_READABLE) &&
            ReadBufferContainsDeliverableInformation (dataPtr)) {
            dataPtr->timer = Tcl_CreateTimerHandler (DELAY,
                                 ChannelHandlerTimer, (ClientData) dataPtr);
        }
    }
Timer handler skeleton
    static void
    ChannelHandlerTimer (ClientData clientData)
    {
        TransformData *dataPtr = (TransformData *) clientData;

        /* timer event, reset token */
        dataPtr->timer = (Tcl_TimerToken) NULL;
        /* handle like normal events */
        Tcl_NotifyChannel (dataPtr->channel, TCL_READABLE);
    }
Now the promised explanation about the necessity of timers.
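Consider this scenario: a 'fileevent' script reads from the transformation, which in turn reads a block from the downstream channel, converts it, and hands only part of the result to its caller. The rest stays in the internal buffer of the transformation, unknown to the generic I/O layer. If the downstream channel has no more data, the channel at the bottom generates no further 'readable' events, and the script is never called again to pick up the buffered data, although it is available.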
The solution is the same one used by the generic layer, but from the inside of the transformation this time:
The transformation has to check itself for data waiting to be read and then use a timer to generate the necessary 'readable' events. And that is what the construction of channel handlers and timers described above will do.
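The interplay of WatchProc, channel handler and timer can be traced with a hypothetical model in which flags and counters replace the real calls ('Tcl_CreateChannelHandler', 'Tcl_DeleteChannelHandler', 'Tcl_CreateTimerHandler', 'Tcl_DeleteTimerHandler', 'Tcl_NotifyChannel'). 'WatchModel' and the 'Model...' functions are illustrative names, and 'M_READABLE' merely plays the role of 'TCL_READABLE'.

```c
#include <stddef.h>

#define M_READABLE (1 << 1)   /* plays the role of TCL_READABLE */

/* Hypothetical model of the event-related parts of a transformation. */
typedef struct WatchModel {
    int mask;              /* current interest, as in the WatchProc */
    int handlerInstalled;  /* channel handler present downstream? */
    int handlerCreates;    /* counters for observing the model */
    int handlerDeletes;
    int timerPending;      /* stands in for a non-NULL Tcl_TimerToken */
    size_t buffered;       /* converted bytes unknown to the generic layer */
    int readableEvents;    /* artificial events sent to the top */
} WatchModel;

/* WatchProc: replace the downstream channel handler on mask changes. */
void ModelWatchProc(WatchModel *w, int mask)
{
    if (mask == w->mask) {
        return;                    /* no changes */
    }
    if (w->mask) {                 /* Tcl_DeleteChannelHandler */
        w->handlerInstalled = 0;
        w->handlerDeletes++;
    }
    w->mask = mask;
    if (w->mask) {                 /* Tcl_CreateChannelHandler */
        w->handlerInstalled = 1;
        w->handlerCreates++;
    }
}

/* ChannelHandler: cancel any pending timer, re-arm it if readable data is
 * still buffered inside the transformation. */
void ModelChannelHandler(WatchModel *w, int mask)
{
    w->timerPending = 0;           /* Tcl_DeleteTimerHandler */
    if ((mask & M_READABLE) && w->buffered > 0) {
        w->timerPending = 1;       /* Tcl_CreateTimerHandler */
    }
}

/* Timer handler: generate an artificial readable event. Here that only
 * reruns the channel handler; in Tcl, Tcl_NotifyChannel would also run
 * any fileevent script, which is expected to consume buffered data. */
void ModelTimerFires(WatchModel *w)
{
    w->timerPending = 0;
    w->readableEvents++;
    ModelChannelHandler(w, M_READABLE);
}
```

As long as data stays buffered, each artificial event re-arms the timer; once the 'fileevent' script has consumed everything, the chain stops by itself.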
Andreas Kupries
Andreas Computer Laboratories (Me, myself and I)
Kongress-Str. 23/15
Aachen, NRW 52070
DE
Phone: +49 241 514 998
A small glossary of terms used in this paper that have so far received little or no explanation.
- Tcl_Channel:
- An opaque token for channels, and used by all interfaces accessing channels. Internally it is a pointer to the relevant data structures (Channel*).
- stack:
- If one or more transformations are stacked upon an arbitrary other channel I use this word to refer to the whole group of channels.
- (un)cover:
- Placing a transformation on a channel C "covers" C, removing the transformation "uncovers" it again.
This HOWTO was written in XML using the DTD developed by Marshall T. Rose for writing RFCs and I-Ds (see RFC 2629), and converted to text and HTML with his tool, 'xml2rfc'.
The implementation of stacked transformations was heavily influenced by Jan Nijtmans, who suggested the basic idea of using a plain channel to encapsulate them instead of going deep into the buffering system of the generic I/O layer.
Matt Newman based his SSL/TLS binding on transformations, a valuable addition whose use by tclHTTPd made the inclusion into the core possible.
My thanks to Larry Virden for proof-reading this document.