TIP 219: Tcl Channel Reflection API

Author:         Andreas Kupries <[email protected]>
State:          Final
Type:           Project
Vote:           Done
Created:        09-Sep-2004
Post-History:   
Tcl-Version:    8.5
Tcl-Ticket:     1025294

Abstract

This document describes an API which reflects the Channel Driver API of the core I/O system up into the Tcl level, for the implementation of channel types in Tcl. It is built on top of [208] ('Add a chan command') and is an independent companion to [230] ('Tcl Channel Transformation Reflection API') and [228] ('Tcl Filesystem Reflection API'). Just as the latter TIPs bring the ability to write channel transformations and filesystems in Tcl itself into the core, this TIP provides the facilities for the implementation of new channel types in Tcl. This document specifies version 1 of the channel reflection API.

Motivation / Rationale

The purpose of this and the other reflection TIPs is to provide all the facilities required for the creation and usage of wrapped files (= virtual filesystems attached to executables and binary libraries) within the core.

While it is possible to implement all the proposed reflectivity in separate, external packages, doing so means that the core itself cannot make use of wrapping technology and virtual filesystems to encapsulate and attach its own data and library files to itself. Having that ability is desirable, as it can make the deployment and embedding of the core easier, due to having fewer files to deal with and a higher degree of self-containment.

One possible application of a completely self-contained core library would be, for example, the Tcl browser plugin.

While it is also possible to create a special purpose filesystem and channel driver in the core for this type of thing, it is however my belief that the general purpose framework specified here is a better solution as it will also give users of the core the freedom to experiment with their own ideas, instead of constraining them to what we managed to envision.

Another use for reflected channels was found when creating the reference implementation: as a helper for testing the generic I/O system of Tcl, by creating channels which forcibly return errors, bogus data, and the like.

Specification

Introduction

This specification has to address two questions to make the reflection work.

C Level API

Four functions are added to the public C API. See section "Error Handling" for their detailed specification.

Tcl Level API

The Tcl Level API consists of two new subcommands, create and postevent, added to the ensemble command chan specified by [208].

Command Handler API

The Tcl-level handler command for a reflected channel is an ensemble that has to support the following subcommands, as listed below. Note that the term ensemble is used to generically describe all command (prefixes) which are able to process subcommands. This TIP is not tied to the recently introduced 'namespace ensemble's.

Of the available methods the handler has to support initialize, finalize, and watch, always. The other methods are optional.
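A minimal sketch of such a handler may help here. It assumes the calling conventions documented in the refchan manual page of released Tcl versions: initialize returns the list of methods the handler supports, and write returns the number of bytes it consumed. The names ::nullchan and ::nullchan_bytes are hypothetical and chosen only for this illustration.

```tcl
# Hypothetical handler for a write-only channel that discards its data
# and merely counts the bytes it was given.
proc ::nullchan {cmd chan args} {
    switch -exact -- $cmd {
        initialize {
            # Announce the supported methods (the three mandatory ones
            # plus write, which a writable channel needs).
            return {initialize finalize watch write}
        }
        finalize - watch {
            # Nothing to clean up and no events to generate.
            return
        }
        write {
            lassign $args data
            incr ::nullchan_bytes [string length $data]
            # Report that all bytes were consumed.
            return [string length $data]
        }
        default {
            return -code error "unsupported method \"$cmd\""
        }
    }
}

set ::nullchan_bytes 0
set ch [chan create write ::nullchan]
puts -nonewline $ch "0123456789"
close $ch   ;# closing flushes the buffered output through 'write'
```

A single command that dispatches on its first argument is enough. Nothing in the sketch relies on 'namespace ensemble'.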

Notes:

Error handling

The current I/O core's ability to handle arbitrary Tcl error messages is very limited. Tcl_DriverGetOptionProc and Tcl_DriverSetOptionProc are the only driver functions for which this is possible directly. Everywhere else the API is restricted to returning POSIX error codes.

This limitation makes the debugging of problems in a channel command handler at least very difficult. As such it is considered not acceptable. It is proposed to solve this problem through the addition of four new functions to Tcl's public stub table.

void Tcl_SetChannelError(Tcl_Channel chan, Tcl_Obj* msg)

void Tcl_SetChannelErrorInterp(Tcl_Interp* ip, Tcl_Obj* msg)

These functions store error information in a channel or interpreter. Previously stored information will be discarded. They have to be used by channel drivers wishing to pass regular Tcl error information to the generic layer of the I/O core.

The refCount of msg is unchanged when the functions had to rewrite msg per the safety precautions explained below, as a properly modified copy of msg is stored, and not msg itself. Otherwise the refCount of msg is incremented by one.

void Tcl_GetChannelError(Tcl_Channel chan, Tcl_Obj** msg)

void Tcl_GetChannelErrorInterp(Tcl_Interp* ip, Tcl_Obj** msg)

These functions retrieve error information stored in a channel or interpreter O, and also reset O to have no information stored in it. They will return NULL if no information was stored to begin with.

I.e. after an invocation of Tcl_GetChannelError* for a channel/interpreter object O, all following invocations will return NULL for that object, until an intervening invocation of Tcl_SetChannelError* stores information in O again.

The msg argument is not allowed to be NULL. Nor are the chan and ip arguments.

The refCount of the returned information is not touched. The reference previously held by the channel or interpreter is now held by the caller of the function and it is its responsibility to release that reference when it is done with the object.

This solution is not very elegant, but anything else will require an incompatible redefinition of the whole channel driver structure and of the driver functions.

It should also be noted that usage of Tcl_Objects for the information storage binds the information to a single thread. I.e. a transfer across thread boundaries is not possible. This however is not required here and thus no limitation.

The four functions have been made public as I can imagine that even C level drivers might wish to use this facility to generate more explicit and readable error messages than is provided through POSIX error codes and the errno API.

The information talked about in the API specifications above is not a plain string, but has to be a list of odd length. The last element will be interpreted as the actual error message in question, and the preceding elements are considered as option/value pairs containing additional information about the error, like the errorCode, etc. I.e. they are an extensible dictionary containing the details of the error beyond the basic message.

As a safety precaution any -level specification submitted by the driver with a non-zero value will be rewritten to a value of 0, to prevent the driver from being able to force the user application into the execution of arbitrary multi-level returns, i.e. from arbitrarily changing the control-flow of the application itself. Analogously, any -code specification with a non-zero value other than error is rewritten to the value 1 (i.e. error).
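The message format and the two rewriting rules can be illustrated with a small Tcl sketch. This is only a model of the rules as stated in this section, not the C code of the core; the procedure name sanitize_channel_error is hypothetical, and the assumption that the symbolic names ok and error count as the values 0 and 1 is mine.

```tcl
# Illustrative model of the safety rewrite: takes a list of odd length
# (option/value pairs followed by the message) and returns the list
# with -level forced to 0 and any -code other than ok/error forced to 1.
proc sanitize_channel_error {msg} {
    if {[llength $msg] % 2 == 0} {
        return -code error "expected a list of odd length"
    }
    set message [lindex $msg end]
    set result {}
    foreach {option value} [lrange $msg 0 end-1] {
        switch -exact -- $option {
            -level {
                if {$value != 0} {set value 0}
            }
            -code {
                # Assumption: 0/ok and 1/error are left alone.
                if {$value ni {0 ok 1 error}} {set value 1}
            }
        }
        lappend result $option $value
    }
    lappend result $message
    return $result
}

set cleaned [sanitize_channel_error {-code break -level 2 -errorcode {MY ERR} boom}]
# cleaned is now: -code 1 -level 0 -errorcode {MY ERR} boom
```

Options other than -level and -code, such as -errorcode here, pass through unchanged, which matches the description of the pairs as an extensible dictionary.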

Below is a list of driver functions, and which of the Tcl_SetChannelError* functions they are allowed to use.

Given the information above the following public functions of the Tcl C API are affected by these changes. I.e. when these functions are called the channel may now contain a stored arbitrary error message requiring processing by the caller.

All other API functions are unchanged. Especially the functions below leave all their error information in the interpreter result.

A previous revision of this TIP specified only two functions, storing the data only in channels. This however proved to be inadequate. It allows the transfer of messages for most driver functions, but not for close: storing an error message in a channel structure which is about to be destroyed is not helpful. So we need the functions for storing data in interpreters.

Conversely, providing only two functions storing the information in an interpreter is inadequate as well. The circumstances under which this matters are actually very limited, but they can happen. First, most driver functions are not given an interpreter reference when called, and actually do not know which interpreter caused their invocation. The only remedy we have is that the channel structure has to have a reference to the interpreter of the command handler, for the calls into the Tcl level. This could be used in most circumstances, except when threads are enabled and the channel was transferred out of the thread containing that interpreter. We are not allowed to use this interpreter from the channel thread, and again have no other reference available. So in this case the code/message pair has to be stored in the channel, as the sole place available.

A previous revision of this TIP not only stored an error message, but also a result code in the channel or interpreter, and used it as the return code of the Tcl command which invoked the driver function returning the exception. This feature has been discarded as a possible security hazard. It would allow a malicious Tcl driver to cause break and continue exceptions at arbitrary locations in the overall application, controlling its behaviour as it sees fit.

I wish to thank Joe English and Vince Darley for their input with regard to the limitations of error propagation in the I/O core and possible ideas for solving it. Joe's discourse on the problems with the use of POSIX error codes in an earlier revision of this TIP made me realize that I should not use them anywhere in the API for reflected channels and rather concentrate on extending the I/O system to properly receive Tcl error messages. And while I rejected the TclSetPosixError function Vince proposed I hopefully kept the spirit of that proposal in my solution as well. The main reason against setting an arbitrary posix error string was that it invented another way of passing error information around, whereas the specification above is based on the existing Tcl_InterpState and attendant functionality.

Interaction with Threads and Other Interpreters

A channel created with the chan create command knows the interpreter it was created in and executes its handler command only in that interpreter, even if the channel is shared with and/or has been moved into a different interpreter. This is easy to accomplish, by evaluating the handler command only in the context of the original interpreter.

The channel also knows the thread it was created in and executes its handler command only in that thread, even if the channel has been moved into a different thread. This is not so easy to accomplish, but still possible and feasible. It is done by:

Note that this also allows the creation of a channel whose two endpoints live in two different threads and provide a stream-oriented bridge between these threads. In other words we can provide a way for regular stream communication between threads instead of having to send commands.

When a thread or interpreter is deleted all channels created with the chan create command using this thread/interpreter as their computing base will be deleted as well, in all interpreters they have been shared with or moved into, and in whatever thread they have been moved to. This pulls the rug out from under the other thread(s) and/or interpreter(s), but it cannot be avoided. Trying to use such a channel will cause the generation of the regular error about unknown channel handles.

Interaction with Safe Interpreters

The new subcommands create and postevent of chan are safe and therefore made accessible to safe interpreters.

While create arranges for the execution of code, this code is always executed within the safe interpreter, even if the channel was moved (see the previous section).

The subcommand postevent can trigger the execution of fileevent handlers. However, if such handlers are executed in trusted interpreters then they were registered by these interpreters as well. (Moving a channel between threads strips its fileevent handlers, whereas moving it just between interpreters keeps them and executes them where they were added.)
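How watch, chan postevent and fileevent handlers interact can be sketched as follows. The sketch assumes the conventions of the refchan manual page of released Tcl versions (watch receives the list of events the channel is interested in, read returns at most count bytes, and an empty result signals end of file). The ::ticker namespace and the on_readable procedure are hypothetical names used only here.

```tcl
namespace eval ::ticker {
    variable data    ;# array: channel id -> bytes still to deliver
    variable timer   ;# array: channel id -> id of the pending timer
}

proc ::ticker::handler {cmd chan args} {
    variable data
    variable timer
    switch -exact -- $cmd {
        initialize {
            set data($chan) "one line\n"
            return {initialize finalize watch read}
        }
        finalize {
            if {[info exists timer($chan)]} {after cancel $timer($chan)}
            unset -nocomplain data($chan) timer($chan)
            return
        }
        watch {
            lassign $args events
            # Drop any pending notification, then re-arm it only if
            # the channel is currently interested in 'read'.
            if {[info exists timer($chan)]} {
                after cancel $timer($chan)
                unset timer($chan)
            }
            if {"read" in $events} {
                set timer($chan) [after 0 [list ::ticker::post $chan]]
            }
            return
        }
        read {
            lassign $args count
            set chunk [string range $data($chan) 0 [expr {$count - 1}]]
            set data($chan) [string range $data($chan) $count end]
            return $chunk
        }
    }
}

# Runs from the event loop and tells the I/O core that data is readable.
proc ::ticker::post {chan} {
    variable timer
    unset -nocomplain timer($chan)
    chan postevent $chan read
}

# The fileevent handler registered by the user of the channel.
proc on_readable {chan} {
    set line [gets $chan]
    close $chan
    set ::got $line
}

set ch [chan create read ::ticker::handler]
fileevent $ch readable [list on_readable $ch]
vwait ::got
```

The event is posted from a timer rather than from inside watch, so the fileevent handler runs from the event loop. The timer is cancelled whenever interest in read is withdrawn or the channel is finalized, so that no event is posted for a channel that is no longer interested or no longer exists.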

Early versus Late Binding of the Handler Command

We have two principal methods for using the handler command. These are called early and late binding.

Early binding means that the command implementation to use is determined at the time of the creation of the channel, i.e. when chan create is executed, before any methods are called. Afterward it cannot change. The result of the command resolution is stored internally and used until the channel is destroyed. Renaming the handler command has no effect. In other words, the system will automatically call the command under the new name. The destruction of the handler command is intercepted and causes the channel to close as well.

Late binding means that the handler command is stored internally essentially as a string, and this string is mapped to the implementation to use for each and every call to a method of the handler. Renaming the command, or destroying it means that the next call of a handler method will fail, causing the higher level channel command to fail as well. Depending on the method the error message may not be able to explain the reason of that failure.

Another problem with this approach is that the context for the resolution of the command name has to be specified explicitly to avoid problems with relative names. Early binding resolves once, in the context of the chan create call. Late binding performs resolution wherever channel commands like puts, gets, etc. are called, i.e. in an arbitrary context. To prevent problems with different commands of the same name in several namespaces it becomes necessary to force the usage of a specific fixed context for the resolution. The only context suitable for this is the global context (per uplevel #0, not namespace eval ::).

Note that moving a different command into place after renaming the original handler allows the Tcl level to change the implementation dynamically at runtime. This however is not really an advantage over early binding as the early bound command can be written such that it delegates to the actual implementation, and that can then be changed dynamically as well.

However, despite all this late binding is so far the method of choice for the implementation of callbacks, be they in Tcl, or Tk; and has been chosen for the reflection as well.
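The effect of late binding can be sketched with a handler that is replaced while its channel is open. The sketch assumes the method conventions of the refchan manual page of released Tcl versions; the names ::sink, ::sink_orig and ::log are hypothetical.

```tcl
# Original handler: appends everything written to ::log.
proc ::sink {cmd chan args} {
    switch -exact -- $cmd {
        initialize {return {initialize finalize watch write}}
        finalize - watch {return}
        write {
            lassign $args data
            append ::log $data
            return [string length $data]
        }
    }
}

set ::log ""
set ch [chan create write ::sink]
chan configure $ch -buffering none   ;# every puts reaches the handler at once
puts -nonewline $ch "abc"

# Move a different command into place under the old name. The channel
# stored only the name, so the next method call reaches the new command.
rename ::sink ::sink_orig
proc ::sink {cmd chan args} {
    if {$cmd eq "write"} {
        lassign $args data
        append ::log [string toupper $data]
        return [string length $data]
    }
    # Everything else is delegated to the original implementation.
    return [::sink_orig $cmd $chan {*}$args]
}

puts -nonewline $ch "def"
close $ch
# ::log now holds "abcDEF"
```

The fully qualified handler name sidesteps the question of the resolution context discussed above.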

Miscellanea

The channel reflection API reserves the driver type "tclrchannel" for itself. Usage of this driver type by other channel types is not allowed.

Examples

Driver Implementations

A simple way of implementing new types of channels is to use any of the various object systems for Tcl. Create a class for the channel type. Create the new channel in the constructor for new objects and store the channel handle. Make the new object the command handler for the channel. This automatically translates the subcommands for the command handler into object methods. Implement the various methods required. When the object is deleted close the channel, and delete the object when the channel announces that it has been closed. This part is a bit tricky: flags have to be used to break the potential cycle.

Another possibility is to implement the command handler as a regular command, together with a creation command wrapping around chan create and a backend which keeps track of all handles created by it and their state, associated data, etc.
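A minimal sketch of this second approach is a read-only channel serving the contents of a string. It assumes the method conventions of the refchan manual page of released Tcl versions (read returns at most count bytes, an empty result signals end of file). The ::strchan namespace and its command names are hypothetical.

```tcl
namespace eval ::strchan {
    variable state   ;# backend: array mapping channel id -> unread bytes
}

# Creation command wrapping 'chan create'.
proc ::strchan::new {content} {
    variable state
    set chan [chan create read ::strchan::handler]
    set state($chan) [encoding convertto utf-8 $content]
    chan configure $chan -encoding utf-8
    return $chan
}

# Regular command acting as handler for all channels of this type.
proc ::strchan::handler {cmd chan args} {
    variable state
    switch -exact -- $cmd {
        initialize {return {initialize finalize watch read}}
        finalize {
            unset -nocomplain state($chan)
            return
        }
        watch {return}
        read {
            lassign $args count
            set chunk [string range $state($chan) 0 [expr {$count - 1}]]
            set state($chan) [string range $state($chan) $count end]
            return $chunk
        }
    }
}

set ch [::strchan::new "alpha\nbeta\n"]
set first [gets $ch]   ;# alpha
set rest  [read $ch]   ;# "beta\n"
close $ch
```

The state array is filled in by the creation command after chan create returns, because the channel handle is not known to the wrapper before that point.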

 object based example ...

  snit::type new_channel {
      constructor {mode args} {
          # Handle args ...
          set chan [chan create $mode $self]
      }
      destructor {
          # ... delete internal state ...
          if {$dead} return
          set dead 1
          close $chan
      }

      method handle {} {return $chan}
      variable chan
      variable dead 0

      method finalize {dummy} {
          if {$dead} return
          set dead 1
          $self destroy
      }
      method initialize {dummy mode} {}
      method read       {dummy count} {}
      method write      {dummy data} {}
      method seek       {dummy offset base} {}
      method configure  {dummy args} {}
      method watch      {dummy events} {}
      method blocking   {dummy isblocking} {}
  }

  proc newchannel_open {args} {
      return [[new_channel %AUTO% {*}$args] handle]
  }

Other Possible Drivers

Reference Implementation

A reference implementation is provided at SourceForge http://sourceforge.net/support/tracker.php?aid=1025294 .

Comments

[ Add comments on the document here ]

Copyright

This document has been placed in the public domain.