Tcl Improvement Proposals: 40.tip at [074c62de54]

File tip/40.tip artifact 5171d0c5cc part of check-in 074c62de54

TIP:            40
Title:          Documentation Generator for Tcl Scripts
Version:        $Revision: 1.4 $
Author:         Arjen Markus <[email protected]>
Author:         Donal K. Fellows <[email protected]>
State:          Withdrawn
Type:           Project
Vote:           Pending
Created:        04-Jul-2001
Post-History:   
Keywords:       documentation,automatic generation,HTML,reference
Tcl-Version:    8.0

~ Abstract

This TIP proposes the adoption of a standard documentation format for
Tcl scripts and the implementation of a simple tool that will extract
this documentation from the source code so that it may be turned into
a programmer's guide.  This is in essence akin to documentation tools
like the well-known ''javadoc'' utility for Java programs and Eiffel's
short utility.

~ Introduction

The style guide by Ray Johnson presents a documentation standard that is
easy to use and is in fact adopted in the Tcl/Tk distribution.
Other than this, the standard has not been enforced or encouraged. It is
also not backed up by some tool (as far as I know) that can generate
pretty looking documents from this.  As a consequence, styles of
documentation may vary widely and at times it is necessary to read the
source code (looking for descriptions) to understand how to use the
script.

The availability of such a tool may encourage people to use the standard,
as the costs of generating the documentation are relatively low.
The tool must accommodate for variations and therefore be flexible -
for instance by providing customisable procedures to support the user's
preferred header format.

The tool also needs to distinguish the types of output: in many cases HTML
output is desirable to make it look pretty and provide hypertext facilities,
in other cases it should provide plain text, formatted so that it can be
read in any ordinary text editor or printed quickly.

Parallel to the development of such a tool, a standard or checklist should be
assembled of what information programmer ought to provide, the version of
Tcl/Tk, extensions that need to be present, what functionality is offered
and so on.

~ Rationale

Automatic documentation generation has two goals: improving usability and
improving maintainability. The first means: pleasant looking documentation,
at low cost for the author, is easy to use. One can also avoid reading the
source code. Further, it ensures homegeneously looking documentation.

Improving the maintainability is achieved by having more or less
technical documentation near the code. There is no need for separate
documents, something which enhances the risk of discrepancies. Remember
the DRY principle: Don't Repeat Yourself.

~ What information

A user clearly needs different information than a maintainer. For the
user it is important to know what functionality is provided, what other
packages or extensions are needed, which (public) procedures are available
and how to use them.

For the maintainer: having an overview of the source files helps finding
the procedures. Part of this information can be extracted directly from
the source (such as via inspection of the proc, package and
namespace statements).

~ Formats

Use the format proposed by the style guide as a guideline (certainly
for the reference implementation):

| # pkg_compareExtension --
| #
| #  Used internally by pkg_mkIndex to compare the extension of a file to
| #  a given extension. On Windows, it uses a case-insensitive comparison
| #  because the file system can be file insensitive.
| #
| # Arguments:
| #  fileName     name of a file whose extension is compared
| #  ext          (optional) The extension to compare against; you must
| #               provide the starting dot.
| #               Defaults to [info sharedlibextension]
| #
| # Results:
| #  Returns 1 if the extension matches, 0 otherwise

(This comes from the "package.tcl" script file that came with Tcl 8.3.1,
it is consistent with the Tcl style guide by Ray Johnson)

~ Requirements

The requirements are simple to describe:

 * Implementation shall be in Tcl, to guarantee availability and
   portability.

 * The system shall be easy to extend to new documentation formats
   (Implementation note:  small procedures that register the
   information piece by piece)

 * It shall be easy to extend to new output formats
   (Implementation note: small procedures that format the registered
   information)

 * It shall properly deal with organisational aspects:  source file,
   package, namespace.

~ Summary of reactions

The replies on the first version of this TIP were quite positive: both
Donal Porter and Cameron Laird think it is a good idea. Juan Gil gave
a very extensive reaction, describing a more general framework that
would eventually result in a system for generating all kinds of output
from Tcl scripts, TIPs and so on.

To do him more justice, without repeating the entire document, he
proposes the use of XML as an intermediate format holding the structure
of the information. The advantage is the possibility to reuse all
existing tools and (de facto) standards, notably DocBook, in this
context.

Even though I share some of the enthousiasm of Juan, I am a bit awed
by it: the original idea of this TIP is not so much creating a
publication system, but rather an easy-to-use tool for
automatically extracting useful information in a nice shape.
Eventually it could develop into something of the kind Juan describes,
but that should not be the first goal.

The technique for representing the information structure he proposes, is
quite useable (and akin to the rendering process of TIPs):

 * Parse the script and store the pieces in "qualified lists".

 * Such qualified lists are an intermediate format, either in memory or
   stored in a suitable format on disk.

 * These lists and lists of lists are then passed to the output
   renderers.

The first problem to solve is then finding a suitable structure for the
information we need to extract. This is the subject of the next section.

Will Duquette and Andreas Kupries mentioned the frequent use of
specialised commands that introduce new commands (rather than a
straightforward call to "proc"). This feature will have to be looked
into, because if you only look for lines like
"proc some-command { ... } {", you might well miss the essentials of
such applications.

~ Information in a Tcl script

Tcl scripts are organised in three essentially unrelated ways:

 * Individual files contain the code

 * Programs consist of one or several files of code

 * Packages are used to identify code that belongs together

In practice these methods will be used in accord with each other, but
there is no guarantee for instance, that a source file contains only one
package and programs will probably quite often use more than one
package.

On a smaller scale, the following items are of importance:

 * Procedures or commands defined inside some namespace (exported or
   not)

 * Variables local to the enclosing namespace or global to the
   application

For a user it will be important to know what a program or package has to
offer and how to get this functionality:

 * A description of the program (with its command-line arguments,
   if any, the packages it uses, if they come separately)

 * A description of the package or packages (assuming it is properly
   installed, its name and version should be enough to load the
   scripts and binaries)

 * A description of each public procedure and a description of the
   arguments and the result, if any

 * A list of all other requirements (other packages or the Tcl version?)

For a maintainer of the code, additional information would include:

 * A list of all variables (local to the namespace and global) that are
   used in the various procedures

 * The contents of each source file (so where does each procedure live?)

 * Detailed maintenance issues that have been written down in comments
   in the code

A possible structure in which all this information can be stored and
retrieved is sketched below:

| program:
|    version
|    source files (list of)
|    packages (list of)
|    procedures (list of)
|    command-line arguments:
|       description of each option and the values (if any)
|       associated with it
|       description of other arguments (such as file names)
| source files:
|    packages (list of)
|    procedures (list of)
| packages:
|    package A:
|       description
|       version
|       requirements
|          Tcl-version
|          packages that are required
|       local and global variables:
|          variable D:
|             used by which procedures?
|          variable E:
|             used by which procedures?
|          ...
|       procedures:
|          procedure F:
|             description
|             exported or not
|             arguments:
|                argument G:
|                   description
|                argument H:
|                   description
|                ...
|             result:
|                description
|          procedure I:
|             ...

Except for the descriptions, all these items can - at least in principle
be extracted automatically. So, even though the programmer has been too
lazy to describe his/her procedures in detail, some information can be
retrieved about the use of global data for instance and the complexity
of the argument lists.

Note that as far as the three methods of organisation is concerned,
there is no attempt to define a practical relationship between them.

To refer to the example above:

| procedure: pkg_compareExtension
|    description:
|       "Used internally by pkg_mkIndex to compare the extension of a file to
|       a given extension. On Windows, it uses a case-insensitive comparison
|       because the file system can be file insensitive."
|    arguments:
|       argument fileName:
|          description:
|             "name of a file whose extension is compared"
|       argument ext:
|          description:
|             "(optional) The extension to compare against; you must
|             provide the starting dot.
|             Defaults to [info sharedlibextension]"
|    result:
|       description:
|          "Returns 1 if the extension matches, 0 otherwise"

By adopting a tree structure to represent the information extracted from
the source code, one can be as flexible as probably needed. For instance,
suppose one would like to extract certain metrics, like the number of
lines or the cyclometric complexity. This could then be an additional
node in the subtree for the command procedure, besides the list of arguments,
the result and the description.

~ Copyright

This document is placed in the public domain.