TIP: 189
Title: Tcl Modules
Version: $Revision: 1.13 $
Author: Andreas Kupries <[email protected]>
Author: Jean-Claude Wippler <[email protected]>
Author: Jeff Hobbs <[email protected]>
Author: Don Porter <[email protected]>
Author: Larry W. Virden <[email protected]>
Author: Daniel A. Steffen <[email protected]>
Author: Don Porter <[email protected]>
State: Final
Type: Project
Vote: Done
Created: 24-Mar-2004
Post-History:
Tcl-Version: 8.5
~ Abstract
This document describes a new mechanism for the handling of packages
by the Tcl Core which differs from the existing system in important
details and makes different trade-offs with regard to flexibility of
package declarations and to access to the filesystem. This mechanism
is called "Tcl Modules".
~ Background and Motivation
The current mechanism for locating and loading packages employed by
the Tcl core is very flexible, but suffers from a number of drawbacks
as well. These are at least partially the result of the flexibility,
and thus not easily solved without giving up something.
One problem with the current mechanism is that it extensively searches
the filesystem for packages, and that it has to actually read a file
(''pkgIndex.tcl'') to get the full information for a prospective
package. All of these operations take time. The fact that
"index scripts" are able to extend the list of paths searched tends to
heighten this cost as it forces rescans of the filesystem.
Installations where directories in the ''auto_path'' are large or
mounted from remote hosts are hit especially hard by this (network
delays). All of this together causes a slow startup of tclsh and
Tcl-based applications.
"'''Tcl Modules'''" on the other hand is designed with less
flexibility in mind and to allow implementations to glean as much
information as possible without having to perform lots of accesses to
the filesystem.
Additional benefits of the proposed design are a simplified deployment
of packages, akin to the way starkits made application deployment
simple, and from that an easier implementation and management of
repositories.
It does not come without penalties however.
* The simplified design has no "index scripts". While this does away
with extending the list of paths for searching, it also does away with
the ability of packages to check preconditions, like the version of
the currently executing Tcl interpreter. Dependencies of packages
(in module form) on particular versions of Tcl have to be managed
differently and outside of them.
* "Tcl Modules" is defined to be an extension of the existing package
mechanism and does ''not'' replace it. This means that any failure
to find a package as a module ''has to'' cause a fall back to the
regular package mechanism. It also sets a limit on how much of our
goals we can reach: searching for packages which are not installed
will stay relatively slow, and dominated by the filesystem scan of
the regular search. This implies that "Tcl Modules" will be best
suited in installations where the number of regular packages is
low, and contained in a small part of the overall filesystem.
> On the gripping hand, the only regular packages required will be
packages supporting the virtual filesystems employed by modules
(more on that later), so a transformation of a installation based on
a set of regular packages to the form above is quite feasible.
~ Specification
~~ Introduction
Modules are regular Tcl Packages, in a different guise. To ease
explanations, first a summary of the existing mechanism:
* Packages are identified through "''pkgIndex.tcl''" files and the "index
script" they contain. These files are read and define the "provide
script", which tells Tcl how to actually load the package. In
other words, the provide script tells whether to use the "'''source'''" or "'''load'''" command, which
file to specify as an argument to that command, etc. However as
"''pkgIndex.tcl''" contains a regular tcl script, it can do more than
that and actually influence the environment, i.e., the package
search itself, in several ways:
> * It may choose to not register the package if conditions for the
package are not met, like being run by a too old version of Tcl.
> * It may extend the list of paths used to search for packages.
This implies that a package is able to modify the behaviour of
the package search (usually extending the search) even before it is
loaded, and even if it will not be loaded at all.
The above is very flexible, but comes at a price. The filesystem is
not only searched, but files have to be read as well to build up the
in-memory index of packages. And this is iterated if index
files change/extend the list of paths to search.
Tcl Modules simplifies the above considerably, by cutting down on the
number of indirections involved. It only searches for module files
and records their location, but does not read them. The search is
only performed when required, on a limited part of the filesystem.
This makes locating and importing packages in module form easier and
faster. The price is that packages in module form cannot prevent
registration in an interpreter not of their choice, nor can they
influence the package search itself before they are actually used.
The remainder of this document will cover the following topics
* What constitutes a Tcl Module ?
* How are they found ?
* How are they indexed, i.e. entered into the package database ?
~~ Module Definition
A Tcl Module is a Tcl Package contained in a ''single'' file, and no
other files required by it. This file has to be '''source'''able. In other words, a Tcl Module is always imported via:
| source module_file
The "load" command is not directly used. This restriction is not an
actual limitation, as we may believe. Ever since 8.4 the Tcl
'''source''' command reads only until the first ^Z character. This
allows us to combine an arbitrary Tcl script with arbitrary binary
data into one file, where the script processes the attached data in
any it chooses to fully import and activate the package. Please read
[190] "Implementation Choices for Tcl Modules" for more explanations
of the various choices which are possible.
The name of a module file has to match the regular expression
| ([[:alpha:]_][:[:alnum:]_]*)-([[:digit:]].*)\.tm
The first capturing parentheses provides the name of the package, the
second clause its version. In addition to matching the pattern, the
extracted version number must not raise an error when used in the
command
| package vcompare $version 0
This additional check has several benefits. The regular expression pattern is a
bit simpler, and the full version check is based on the official
definition of version numbers used by the Tcl core itself.
~~ Finding Modules
Remember the check for a valid module in last section, and notice that
any filename matching this name pattern is going to be treated by the
TM system as if it's a Tcl module, whether it really is or not. This
means it's a bad idea for any non-Tcl module files that might match
that pattern to end up in a directory where TM will be scanning. This
suggests that the directory tree for storing Tcl modules ought to be
something separate from other parts of the filesystem. This further
implies that a new search path over just these separate storage areas
would be better than Yet Another Use of ''$::auto_path''.
Therefore: Modules are searched for in all directories listed in the
result of the command "'''::tcl::tm::path list'''" (See also section 'API to
"Tcl Modules"'). This is called the "Module path". Neither
"''auto_path''" nor "''tcl_pkgPath''" are used.
All directories on the module path have to obey one restriction:
* For any two directories, neither is an ancestor directory of the
other.
This is required to avoid ambiguities in package naming. If for
example the two directories
| foo/
| foo/cool
were on the path a package named 'cool::ice' could be found via the
names 'cool::ice' or 'ice', the latter potentially obscuring a package
named 'ice', unqualified.
Before the search is started, the name of the requested package is
translated into a partial path, using the following algorithm:
* All occurrences of '::' in the package name are replaced by the
appropriate directory separator character for the platform we are
on. On Unix, for example, this is '/'.
Example:
* The requested package is ''encoding::base64''. The generated
partial path is
| encoding/base64
After this translation the package is looked for in all module paths,
by combining them one-by-one, first to last with the partial path to
form a complete search pattern. The exact pattern and mechanism is
left unspecified, giving the implementation freedom of choice as to what
glob searches to perform, how much of them, and when.
Independent of that, the implemented algorithm has to reject all files
where the filename does not match the regular expression given in the
previous section. For the remaining files "provide scripts" are
generated and added to the '''package ifneeded''' database.
The algorithm has to fall back to the previous unknown handler when
none of the found module files satisfy the request. If the request
was satisfied no fall-back is required.
~~ Provide and Index Scripts
Packages in module form have no control over the "index" and "provide
script"s entered into the package database for them. For a module
file ''MF'' the "index script" is
| package ifneeded PNAME PVERSION [list source MF]
and the "provide script" embedded in the above is
| source MF
Both package name '''PNAME''' and package version '''PVERSION''' are
extracted from the filename '''MF''' according to the definition
below:
| MF = /module_path/PNAME'-PVERSION.tm
Where '''PNAME' '''is the partial path of the module as defined in
section 'Finding Modules' before, and translated into '''PNAME''' by
changing all directory separators to '::', and '''module_path''' is the
path (from the list of paths to search) that we found the module file under.
''Note'' that we are here creating a connection between package names
and paths. Tcl is case-sensitive when it comes to comparing package
names, but there are filesystems which are not, like NTFS. Luckily
these filesystems do store the case of the name, despite not using the
information when comparing.
Given the above we allow the names for packages in Tcl modules to have
mixed-case, but also require that there are no collisions when
comparing names in a case-insensitive manner. In other words, if a
package 'Foo' is deployed in the form of a Tcl Module, packages like
'foo', 'fOo', etc. are not allowed anymore.
Regular packages have no problem with the names of their files, as
their entry point has a standard name ("''pkgIndex.tcl''") and its
contents can be adjusted according to the filesystem they are stored
in.
~~ API to "Tcl Modules"
"Tcl Modules" is implemented in Tcl, as a new handler command for
'''package unknown'''. This command calls the previously installed
handler when its own search fails, thereby ensuring proper fall-back to
the regular package search.
All code and data structures implementing "Tcl Modules" reside in the
namespace "''::tcl::tm''".
A namespace variable holds the list of paths to search for modules,
but is not officially exported. All access to this variable is done
through the following public commands:
* '''::tcl::tm::path add''' ''PATH''
> The path is added at the head to the list of module paths.
> The command enforces the restriction that no path may be an
ancestor directory of any other path on the list. If the new path
violates this restriction an error will be raised.
> If the path is already present as is, no error will be raised and no
action will be taken.
> Paths are searched in the order of their appearance in the list.
As they are added to the front of the list they are searched in
reverse order of addition. In other words, the paths added last
are looked at first.
* '''::tcl::tm::path remove''' ''PATH''
> Removes the path from the list of module paths. The command is
silently ignored if the path is not on the list.
* '''::tcl::tm::path list'''
> Returns a list containing all registered module paths, in the order
that they are searched for modules.
* '''::tcl::tm::roots''' ''PATH_LIST''
> Similar to ''path add'', and layered on top of it. This command
takes a list of paths, extends each with ''tclX/site-tcl'', and
''tclX/X.y'', for major version X of the tcl interpreter and minor
version y less than or equal to the minor version of the
interpreter, and adds the resulting set of paths to the list of
paths to search.
> This command is used internally by the system to set up the
system-specific default paths. See section ''Defaults'' for their
definition, and that their structure matches what this command
does.
> The command has been exposed to allow a buildsystem to define
additional root paths beyond those defined by this document.
We do ''not'' provide APIs for rescanning directories, clearing
internal state and such. The official interface to this functionality
is "package forget" and special interfaces are neither required nor
desirable.
~ Discussion
~~ Restriction to "source"
This has already been discussed in the specification above.
For more discussion I again refer to [190] "Implementation Choices for
Tcl Modules" which explains the various implementation choices in much
more detail.
~~ Preconditions
It has already been mentioned in section 'Background and Motivation'
that preconditions in "index scripts" are lost, one of the penalties
of the simplified scheme specified here.
Their existence was most important to installations with multiple
versions of Tcl coexisting with each other as they could share the
directory hierarchy containing packages between the various Tcl cores.
This is not possible anymore, at least not in a simple manner.
For the majority of installations however, i.e. those without only one
version of Tcl installed, or controlled environments like the inside
of starkits and starpacks, this loss is irrelevant and of no
consequence.
For more discussion please see [191] "Managing Tcl Package and Modules
in a Multi-Version Environment" which explains the various choices a
sysadmin has in much more detail.
~~ Package Metadata
An area possibly made harder by Tcl Modules is the storage and query
of package metadata. [59] was one way of handling such information,
by storing them in the binary library of packages which have such.
Another approach was to store them in the package index script, using
a hypothetical '''package about''' command.
The latter approach has the definite advantage that it was possible to
query the database of metadata for a particular package without
having to actually load said package, as a load may fail if the Tcl
shell used to query the database does not fulfil the preconditions
for that package.
Both approaches listed above assume that it makes sense to query the
database of metadata for all installed packages from a plain Tcl
shell. In other words, to use the standard Tcl shell also as the tool
to directly manage an installation.
It is possible to extend the proposal made in this document to handle
metadata as well. We already reserved the namespace '''::tcl::tm'''
for use by us, so it is no problem to extend the public API with
commands to locate all installed packages, their metadata, and to
perform queries based on this. This will require an additional
specification as to how metadata is stored in/by Tcl Modules, and it will
have to be understood that these extended management operations can
take considerably more time than a '''package require''', as they will
have to scan all defined search paths and all their sub directories
for Tcl Modules, and have to extract the metadata itself as well.
~~ Deployment
The fact that a Tcl Module consists only of a single file makes its
deployment quite easy. We only have to ensure correct placement in
one of the searched directories when installing it locally, but
nothing more.
Regarding the usage of Tcl Modules in a wrapped application, please see
[190] "Implementation Choices for Tcl Modules". This is highly
dependent on the implementation chosen for a specific Tcl Module and
thus not discussed here, but in the referred document.
~~ Package Repositories
At a very basic level, the physical storage, any directory tree
containing properly placed files for a number of modules can serve as
a package repository for the modules in it. In other words, from that
point of view an installation is virtually indistinguishable from a
repository, and their creation and maintenance is very easy
Note however that the higher levels of a repository, like indexing
package metadata in general, or dependence tracking in particular,
licensing, documentation, etc. are not addressed here and by this.
This requires standards for package metadata, format and content,
topics with which this document will not deal.
~~ Defaults
The default list of paths on the module path is computed by a tclsh as
follows, where ''X'' is the major version of the Tcl interpreter and
''y'' is less than or equal to the minor version of the Tcl
interpreter.
* System specific paths
> * '''file normalize''' [['''info library''']]/../tcl''X''/''X''.''y''
> > In other words, the interpreter will look into a directory
specified by its major version and whose minor versions are less
than or equal to the minor version of the interpreter.
> > Example: For Tcl 8.4 the paths searched are
> > * [['''info library''']]/../tcl8/8.4
> > * [['''info library''']]/../tcl8/8.3
> > * [['''info library''']]/../tcl8/8.2
> > * [['''info library''']]/../tcl8/8.1
> > * [['''info library''']]/../tcl8/8.0
> > This definition assumes that a package defined for Tcl ''X.y'' can
also be used by all interpreters which have the same major number
''X'' and a minor number greater than ''y''.
> * '''file normalize''' ''EXEC''/tcl''X''/''X''.''y''
> > Where ''EXEC'' is [['''file normalize''' [['''info
nameofexecutable''']]/../lib]] or [['''file normalize'''
[['''::tcl::pkgconfig get''' libdir,runtime]]]]
> > This sets of paths is handled equivalently to the set coming
before, except that it is anchored in ''EXEC_PREFIX''. For a
build with ''PREFIX'' = ''EXEC_PREFIX'' the two sets are
identical.
* Site specific paths.
> * '''file normalize''' [['''info library''']]/../tcl''X''/site-tcl
* User specific paths.
> * '''$::env'''(TCL''X''.''y''_TM_PATH)
> > A list of paths, separated by either ''':''' (Unix) or ''';'''
(Windows). This is user and site specific as this environment
variable can be set not only by the user's profile, but by system
configuration scripts as well.
> > These paths are seen and therefore shared by all Tcl shells in
the '''$::env'''(PATH) of the user.
> > Note that ''X'' and ''y'' follow the general rules set out above.
In other words, Tcl 8.4, for example, will look at these 5
environment variables
> > * '''$::env'''(TCL8.4_TM_PATH)
> > * '''$::env'''(TCL8.3_TM_PATH)
> > * '''$::env'''(TCL8.2_TM_PATH)
> > * '''$::env'''(TCL8.1_TM_PATH)
> > * '''$::env'''(TCL8.0_TM_PATH)
''All'' the default paths are added to the module path, even those
paths which do not exist. Non-existent paths are filtered out during
actual searches. This enables a user to create one of the paths
searched when needed and all running applications will automatically
pick up any modules placed in them.
The paths are added in the order as they are listed above, and for
lists of paths defined by an environment variable in the order they
are found in the variable.
~~ Installation
The installation of a Tcl module for a particular interpreter is
basically done like this:
| #! /path/to/chosen/tclsh
| # First argument is the name of the module.
| # Second argument is the base filename
| set mpaths [::tcl::tm::path list]
| ... remove all paths the user has no write permissions for.
| ... throw an error if there are no paths left.
| ... provide the user with some UI if more than one path is left
| ... so that she can select the path to use.
| set selmpath [ui_select $mpaths]
| file copy [lindex $argv 1] \
| [file join $selmpath \
| [file dirname [string map {:: /} \
| [lindex $argv 0]]]]
~ Glossary
The following terms and definitions are used throughout the document
* ''index script''
> A script used to index a package, or not. Usually contained in a
file named "''pkgIndex.tcl''". Can check preconditions for a
package and contains package specific code for setting up the
package specific ''provide script''.
* ''provide script''
> This is a package specific script and tells Tcl exactly how to
import it. In the existing package system it is generated and
registered by the ''index script''. Tcl Modules on the other hand
generates it based on information gleaned from filenames.
~ Reference Implementation
A reference implementation is available in Patch 942881
[http://sf.net/tracker/?func=detail&aid=942881&group_id=10894&atid=310894]
~ Questions
~ Comments
[[ Add comments on the document here ]]
A feature asked for during discussion is to allow a directory as
a Tcl Module. I am opposed to this, because behind Tcl Modules is
the same idea/vision as for starkit and starpacks, namely that of
deploying something in the simplest possible manner, without any
overhead. Sometimes I call Tcl Modules package kits, short pakits
(and then twist that then spoken into 'packet' :).
[http://groups.google.ca/groups?hl=en&lr=&ie=UTF-8&frame=right&th=78764d499cc4e4a&seekm=c6tshf030c6%40news4.newsguy.com#link19]
~ Copyright
This document has been placed in the public domain.