TIP 700: Use Markdown instead of nroff for the Tcl/Tk manual pages

Author:		Torsten Berg <[email protected]>
State:		Draft
Type:		Project
Vote:		Pending
Created:	10-Jun-2024
Tcl-Version:	9.0
Tcl-Branch:	documentation-cleanup-for-transition
Keywords:	manual pages, nroff, Markdown, documentation

Abstract

This TIP proposes to change the markup language of the manual pages in the Tcl and Tk source trees from nroff to Markdown. This will be done by a conversion script that preserves as much of the original formatting as necessary and possible. The conversion is intended to ease maintenance of the manual pages, to encourage more people to help with proper documentation of the Tcl language and the Tk toolkit, and to enable a modern visual presentation of the HTML version of the manual pages. A further goal is to improve the overall public perception and image of Tcl/Tk.

Rationale

The current manual pages are written in nroff. This markup language is no longer among the languages typically used for documenting software. It was designed for terminal output and line printers, neither of which is a typical way of reading documentation anymore. As few people actively use nroff these days, it is becoming increasingly difficult to find people willing to write and maintain documentation in it.

The actual text of the manual pages is often hidden behind macros, making it hard to simply read and comprehend the content. One example (from re_syntax.n):

A word character is an \fIalnum\fR character or an underscore
.PQ \fB_\fR "" .
These special bracket expressions ...

Here, it is not easy to see that this will be rendered as "... or an underscore (_). These special ...".

The markup is complex: Tcl/Tk uses 31 different nroff macros, some of which are custom macros defined only for the manual pages (see the file man.macros). The documentation of these macros in that file is sometimes quite sparse and leaves room for misuse or misunderstanding. Some macros are not documented at all (".QR"), and the documentation of others is wrong (".PQ"). Together with the obscure syntax of the macro definitions, this can easily lead to erroneous use of the markup. nroff also mixes semantic markup (e.g. ".SH" and ".SS" for sections and subsections) with purely visual markup (e.g. ".RS"/".RE" pairs for relative indentation).

Another limitation of nroff is that only two section levels are available (".SH" and ".SS"). Whenever a third level is needed, the manual pages typically emulate it with a tagged paragraph (".TP"). This is visually indistinguishable from any other tagged paragraph, and tagged paragraphs are used extensively for command options. Any further subdivision has to be emulated with e.g. relative insets and bold font (i.e. without semantic markup).

Further, there is no documentation of how and where exactly the individual macros should be used in the manual pages, nor of best practices (or things to avoid) when writing a manual page.

Therefore, this TIP proposes to convert the nroff manual pages into Markdown. This markup language is designed to be easy for humans to read and write. Many editors support the Markdown syntax and highlight the text, making writing and editing visually pleasing. Markdown is widespread, easy to learn and use (even for non-programmers) and largely uses semantic markup only. It is flexible enough to support all formatting and layout needs of the Tcl/Tk manual pages.

Markdown is already used and supported in Fossil, the SCM hosting Tcl and Tk. This means that the manual pages can also be viewed within Fossil (they are automatically rendered as HTML in Fossil's browser-based interface), although Fossil supports many, but not all, Markdown features. The Tcl TIP system now also uses Markdown, and the changelog for Tcl releases has been changed from plain text to Markdown. Finally, Markdown opens the door to easily converting the manual pages into many other formats, since converters from Markdown to a wealth of formats exist.

Specification

The subject of this TIP is all manual pages contained in the doc folders of both Tcl and Tk. The current main branches of the source trees will be the starting point. A new branch called "documentation-cleanup-for-transition" has been created off the main branch to prepare the nroff files for conversion. This is necessary because some constructs are not semantic or lead to deeply nested structures that make the content difficult to follow. These constructs will be changed before the conversion to Markdown.

An example of this is the use of proper Markdown subsections instead of tagged paragraphs (".TP") in nroff, where vertically spaced bold text is additionally used to emulate a subsubsection. Typos and other obviously incorrect content found along the way will also be corrected. Error corrections and structural changes will be kept in separate commits, so that they can easily be merged into other branches later.

Further, at least one construct cannot be represented in Markdown: multiple consecutive terms in a definition list (nroff uses ".TP" and ".IP" for this) that have no definition of their own. The conversion script will handle this by inserting placeholder texts, without any change to the nroff files. Thus, it can still be decided later whether another approach should be taken for this issue.
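A minimal sketch of the placeholder approach, assuming the conversion script has already collected each definition list as (term, definition) pairs, where an empty definition marks a term that shares the description of the term following it. The placeholder wording and the function name are hypothetical; only the layout (term on one line, definition on the next line introduced by a colon) follows Pandoc's definition-list syntax.

```python
# Hypothetical placeholder wording; the TIP does not specify the actual text.
PLACEHOLDER = "(Described together with the following term.)"


def to_definition_list(items):
    """Render (term, definition) pairs as a Pandoc Markdown definition list.

    An empty definition marks a term that nroff listed directly before
    another term (e.g. two consecutive .TP tags). Pandoc requires every
    term to have a definition, so a placeholder is inserted instead.
    """
    blocks = []
    for term, definition in items:
        body = definition if definition else PLACEHOLDER
        # Pandoc syntax: the term on one line, the definition on the
        # next line introduced by a colon.
        blocks.append(f"{term}\n:   {body}")
    # Blank lines separate the entries.
    return "\n\n".join(blocks)


print(to_definition_list([
    ("**-a**", None),
    ("**-b**", "Shared description of both options."),
]))
```

Because the placeholder is a fixed string, a later decision to treat these terms differently only requires searching the Markdown files for that string.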

The conversion script converts the nroff markup into Pandoc's Markdown. This is probably the most flexible and versatile Markdown variant available and is actively developed. Custom Markdown blocks and inline elements can be defined in Pandoc's Markdown by using fenced divs and bracketed spans. This may be used together with filters to create special output during the conversion to e.g. HTML.
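To illustrate one core part of such a conversion, the following sketch turns the inline font escapes \fB (bold), \fI (italic) and \fR/\fP (back to roman) into Markdown emphasis, moving surrounding spaces outside the emphasis markers so that the result is valid Markdown. It is a simplified, hypothetical stand-in, not the actual conversion script: it treats \fP as a return to roman (in nroff it restores the previous font) and ignores macros such as .PQ.

```python
import re

# Markdown markers for the nroff fonts handled here; roman has no marker.
_MARKERS = {"B": "**", "I": "*"}
_FONT_ESCAPE = re.compile(r"\\f([BIRP])")


def convert_fonts(line):
    """Convert nroff inline font escapes in one line to Markdown emphasis."""
    line = line.replace("\\-", "-")      # nroff \- is a plain minus sign
    runs, font, pos = [], None, 0
    # Split the line into (font, text) runs; None stands for roman.
    for m in _FONT_ESCAPE.finditer(line):
        runs.append((font, line[pos:m.start()]))
        # Simplification: \fR and \fP both switch back to roman.
        font = m.group(1) if m.group(1) in _MARKERS else None
        pos = m.end()
    runs.append((font, line[pos:]))

    out = []
    for font, text in runs:
        core = text.strip()
        if font is None or not core:
            out.append(text)
            continue
        # Keep leading/trailing spaces outside the markers, otherwise
        # "**array **" would not be recognised as emphasis.
        lead = text[: len(text) - len(text.lstrip())]
        trail = text[len(text.rstrip()):]
        mark = _MARKERS[font]
        out.append(f"{lead}{mark}{core}{mark}{trail}")
    return "".join(out)


print(convert_fonts(r"\fBreturn \fR?\fB\-code \fIcode\fR? ?\fIresult\fR?"))
```

Applied to the nroff synopsis lines in the examples under Semantic markup, this sketch produces the corresponding "simple Markdown" lines.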

Certain information of the original nroff sources will be stored in Markdown metadata (in a YAML block) instead of the main body of the text. This enables us to flexibly decide which metadata to put into the visual part of e.g. HTML and where to put it. One example is the sometimes very long list of copyright statements. Another example is the manual section ("n" or "3", "Tcl" or "Tk"). This information will be used again when reproducing the manual pages as nroff files.
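A minimal sketch of collecting such metadata, assuming the usual layout of the Tcl/Tk manual page headers: copyright notices in '\" comment lines and a header line of the form .TH name section version package "manual title". The YAML key names and function names are hypothetical, since the TIP does not fix a schema. Values are quoted with json.dumps because a JSON string is also a valid YAML double-quoted scalar.

```python
import json
import shlex

# Hypothetical key names for ".TH name section version package manual".
TH_FIELDS = ("title", "section", "version", "package", "manual")
COPYRIGHT_PREFIX = "'\\\" Copyright"     # lines starting with '\" Copyright


def extract_metadata(nroff_text):
    """Collect copyright notices and .TH fields from an nroff manual page."""
    meta = {"copyright": []}
    for line in nroff_text.splitlines():
        if line.startswith(COPYRIGHT_PREFIX):
            # Drop the 3-character '\" comment marker, keep the notice.
            meta["copyright"].append(line[3:].strip())
        elif line.startswith(".TH "):
            # shlex handles the double-quoted manual title.
            meta.update(zip(TH_FIELDS, shlex.split(line)[1:]))
    return meta


def to_yaml_block(meta):
    """Serialise flat metadata as a Pandoc YAML metadata block."""
    lines = ["---"]
    for key, value in meta.items():
        if isinstance(value, list):
            if not value:
                lines.append(f"{key}: []")
            else:
                lines.append(f"{key}:")
                lines.extend(f"  - {json.dumps(item)}" for item in value)
        else:
            lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines)


# Illustrative header, not copied from a specific page.
page = ("'\\\" Copyright (c) 1993-1994 The Regents of the University of California.\n"
        '.TH array n 8.3 Tcl "Tcl Built-In Commands"\n')
print(to_yaml_block(extract_metadata(page)))
```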

To accommodate people who still want to read the manual pages in a terminal, the Markdown manual pages will also be used to produce an nroff version that is close to the original formatting.

The external Pandoc program will initially be used to produce both the HTML and the nroff representations from Markdown. As of now, the plan is to distribute the manual pages in all three formats (Markdown, HTML, nroff) with the source distribution, so that people do not need to install Pandoc themselves.

The individual steps are thus:

  1. Write the conversion script (roughly 95 % done already)
  2. Do the conversion, possibly followed by some manual adjustments
  3. Remove the nroff manual page files from the Tcl/Tk source trees and replace them with the converted Markdown files (also remove tcltk-man2html.tcl, the connected utility scripts and the corresponding build target)
  4. Produce HTML from Markdown using Pandoc and include the HTML files in the Tcl/Tk source distributions (for convenience, so users do not need to install Pandoc as an external tool just to produce HTML versions of the manual pages)
  5. Produce nroff manual pages from the Markdown files (for people still wanting the traditional manual pages) and include them in the Tcl/Tk source distributions (also for convenience)
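Steps 4 and 5 amount to running Pandoc twice per page. A minimal sketch of a driver, assuming one Markdown file per manual page with the outputs written next to it (the file layout, the default .n suffix and the function names are hypothetical). The options used (--standalone, --from, --to, --output) and the output formats html and man are standard Pandoc features; Pandoc's default man template fills its .TH line from metadata variables such as title and section.

```python
import subprocess
from pathlib import Path


def pandoc_commands(md_file, man_suffix=".n"):
    """Return the pandoc invocations producing HTML and nroff for one page."""
    src = Path(md_file)
    common = ["pandoc", "--standalone", "--from", "markdown"]
    return [
        common + ["--to", "html", "--output", str(src.with_suffix(".html")), str(src)],
        common + ["--to", "man", "--output", str(src.with_suffix(man_suffix)), str(src)],
    ]


def build(md_files, dry_run=True):
    """Print (dry run) or execute the pandoc commands for all given pages."""
    for md_file in md_files:
        for cmd in pandoc_commands(md_file):
            if dry_run:
                print(" ".join(cmd))
            else:
                subprocess.run(cmd, check=True)   # raises if pandoc fails


build(["doc/array.md", "doc/puts.md"])
```

Since the Markdown files remain the single source, the HTML and nroff files in a distribution can always be regenerated by re-running this step.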

The conversion script and process

Apart from directly converting the manual pages from nroff markup to Markdown, the conversion script will also try to detect certain constructs in order to enhance the text. For example, there are no links in the text of the nroff version, while the HTML rendered via the tcltk-man2html.tcl script does have links. The conversion script will try to detect good candidates for links and place them directly into the Markdown version. Another example is the conversion of section and subsection headers from upper case to title case, as the formatting of headers should not be hard-coded into the Markdown text but left to the rendering process. Changing the way subcommands and options are represented is also being considered. Currently they are presented as 'definition lists' (nroff .TP or .IP macros) or with custom macros. It would be beneficial to represent them as individual subsections, so that every subcommand and option has its own place in the outline of the manual page and can easily be added to a table of contents later.
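The header conversion could look roughly like the following sketch, which maps .SH and .SS lines to Markdown headings of level 1 and 2 and converts the upper-case text to title case. The heading levels, the list of words kept in lower case and the function names are assumptions for illustration; a .SH without an argument (heading text on the next line) is not handled.

```python
import re

# Hypothetical list of words kept in lower case unless they start the title.
SMALL_WORDS = {"a", "an", "and", "as", "for", "in", "of", "on", "or", "the", "to"}
_SECTION = re.compile(r'^\.(SH|SS)\s+"?(.*?)"?\s*$')


def title_case(text):
    """Turn e.g. 'HANDLING OF ERRORS' into 'Handling of Errors'."""
    words = text.lower().split()
    return " ".join(
        w if i > 0 and w in SMALL_WORDS else w[:1].upper() + w[1:]
        for i, w in enumerate(words)
    )


def convert_heading(line):
    """Map an nroff .SH/.SS line to a Markdown ATX heading, else return None."""
    m = _SECTION.match(line)
    if m is None:
        return None
    level = 1 if m.group(1) == "SH" else 2
    return "#" * level + " " + title_case(m.group(2))


for line in [".SH DESCRIPTION", '.SH "SEE ALSO"', '.SS "HANDLING OF ERRORS"', ".TP"]:
    print(repr(convert_heading(line)))
```

Re-capitalising is lossy for words that must keep their case (for example a C identifier in a header), so a real conversion would need an exception list.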

Semantic markup

In order to use semantic markup as much as possible, and thus gain flexibility both in the visual rendering and in gathering proper indexing information from the manual pages, the synopsis and the other command syntax lines will use a special construct of Pandoc's Markdown called 'bracketed_spans'. A few examples:

# nroff markup:
\fBarray \fIoption arrayName\fR ?\fIarg arg ...\fR?
\fBarray get\fI arrayName \fR?\fIpattern\fR?
\fBreturn \fR?\fB\-code \fIcode\fR? ?\fIresult\fR?
\fBputs \fR?\fB\-nonewline\fR? ?\fIchannel\fR? \fIstring\fR
\fIpathName \fBaddtag \fItag searchSpec \fR?\fIarg ...\fR?

# simple Markdown:
**array** *option arrayName* ?*arg arg ...*?
**array get** *arrayName* ?*pattern*?
**return** ?**-code** *code*? ?*result*?
**puts** ?**-nonewline**? ?*channel*? *string*
*pathName* **addtag** *tag searchSpec* ?*arg ...*?

# semantic Markdown using 'bracketed_spans':
[array]{.cmd} [option]{.arg} [arrayName]{.arg} [arg arg]{.optdot}
[array]{.cmd} [get]{.sub} [arrayName]{.arg} [pattern]{.optarg}
[return]{.cmd} [[-code]{.lit} [code]{.arg}]{.optarg} [result]{.optarg}
[puts]{.cmd} [-nonewline]{.optlit} [channel]{.optarg} [string]{.arg}
[pathName]{.ins} [addtag]{.sub} [tag]{.arg} [searchSpec]{.arg} [arg]{.optdot}

So, the different elements of a command call are given Markdown class attributes (starting with a dot). All words are normally marked up separately (unlike in the simple Markdown syntax); the only exception is a group of words belonging to one optional group (such as [[-code]{.lit} [code]{.arg}]{.optarg} in the return command).

The class names indicate what the respective text represents within the syntax. Optional spans such as {.optarg} and {.optdot} may contain other syntax elements. This information can subsequently be used, e.g., to specify CSS for rendering these syntax specifications. Literal words are typically rendered in bold, other parts of the syntax in italics. (The whole line could optionally be nested in another bracketed span to mark it as a command syntax line, or it could be put into a fenced_div. However, this would decrease readability further.)
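Because the class names carry the meaning, tools can read the syntax lines back. A minimal sketch, assuming each span carries exactly one class attribute of the form {.name} and that the span text contains no literal square brackets: it parses the bracketed spans, including nested ones such as the optional group in the return example, and derives an index entry from the {.cmd} and {.sub} words. Real Pandoc attributes may also contain identifiers and key=value pairs, which this sketch does not handle; all function names are hypothetical.

```python
import re

# One class attribute such as {.cmd}; ids and key=value pairs are not supported.
_ATTR = re.compile(r"\{\.([A-Za-z][\w-]*)\}")


def parse_spans(s, pos=0, depth=0):
    """Parse '[text]{.class}' spans, allowing nesting.

    Returns (items, end): each item is a plain string or a tuple
    (class_name, children) where children is again a list of items.
    """
    items, text = [], []

    def flush():
        if text:
            items.append("".join(text))
            text.clear()

    while pos < len(s):
        ch = s[pos]
        if ch == "[":
            flush()
            children, pos = parse_spans(s, pos + 1, depth + 1)
            m = _ATTR.match(s, pos)
            if m is None:
                raise ValueError(f"span ending at offset {pos} has no {{.class}}")
            items.append((m.group(1), children))
            pos = m.end()
        elif ch == "]" and depth > 0:
            flush()
            return items, pos + 1          # position just after ']'
        else:
            text.append(ch)
            pos += 1
    if depth > 0:
        raise ValueError("unbalanced '[' in syntax line")
    flush()
    return items, pos


def leaf_spans(items):
    """Yield (class_name, text) for spans that contain only plain text."""
    for item in items:
        if isinstance(item, tuple):
            cls, children = item
            if all(isinstance(c, str) for c in children):
                yield cls, "".join(children)
            else:
                yield from leaf_spans(children)


def index_entry(syntax_line):
    """Join the command and subcommand words, e.g. 'array get'."""
    items, _ = parse_spans(syntax_line)
    return " ".join(t for cls, t in leaf_spans(items) if cls in ("cmd", "sub"))


print(index_entry("[array]{.cmd} [get]{.sub} [arrayName]{.arg} [pattern]{.optarg}"))
```

The parser also rejects a span whose attribute lacks the leading dot, which makes it usable as a cheap consistency check over all converted pages.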

A similar markup will be used for Tk and ttk widget option specifications and for the C API specifications in the manual section 3 pages.

Code blocks

Code blocks in manual pages are written inconsistently. Both command lines and result lines are written in different ways. Commands are often on lines starting with "%" to explicitly mark them as command lines and distinguish them from the resulting output, but sometimes the "%" is missing. The "%" sign should be put in front of all command lines. This is how Tcl looks in a terminal anyway, and it also makes it possible to display commands with a different visual appearance (colour, font etc.) when rendered as HTML.

The results of command execution are also presented inconsistently in the manual pages' code blocks. Sometimes they appear below the code block as plain text, sometimes in Tcl comments within the code block (append and lassign), sometimes within the code block marked with "->" (lset or lsearch) or just verbatim (lrange). Sometimes results even appear in code blocks of their own (binary). This will be unified so that commands are clearly distinguished from their results, in a way that enables different rendering in HTML.

To add more value to the code blocks, they will be supplemented with buttons when rendered as HTML. One button will allow the code to be copied (just as in the Tcl wiki) and another button will take the code from the block and open it in a "Tcl playground" so it can be tested live there.

Considerations

The alternative format 'doctools' (widely used for Tcllib) is not considered because it does not support links and tables, and its syntax is more verbose and thus less readable. 'AsciiDoc' is not considered either, as it is not as widespread as Markdown, its syntax is not as easy to read, and it offers nothing that is not already possible with Pandoc's Markdown (with regard to the features needed for the manual pages).

Markdown is widely used, there are lots of editors for it, and the syntax is easy and visually pleasing. This gives Markdown an advantage over doctools in getting people to write good documentation and keeping the hurdle low. Pandoc's Markdown is very comprehensive (and extensible via fenced_divs, bracketed_spans and filters) and serves as a very good generic format from which other formats can be generated.

Reference implementation

Work on the conversion script (nroff to Markdown) and on the HTML generation is in progress. The implementation is currently hosted on chiselapp.

Next steps

The following next steps are logical follow-up actions. They will be the subject of subsequent TIPs when needed.

After the initial conversion, the plan is to manually unify and enhance the Markdown files, both in content and in structure/outline. For example, the nroff manual pages contain meta information such as Tcl version numbers (whose exact meaning is not clear) and references to TIPs. This can be used in the Markdown files to state when a certain command, subcommand or command option was introduced into the language (that information can be extracted from the changelog). The TIP references can be used to provide links to the actual TIPs for people who want to read further details on a specific feature of Tcl/Tk. See the Fossil project on chiselapp where some content change proposals are collected.
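A minimal sketch of turning TIP references into links, assuming references appear in the text as "TIP 123" or "TIP #123" and that TIP documents are reachable under the URL pattern below. Both the reference format and the URL pattern are assumptions to be checked against the actual pages and the TIP site; the function name is hypothetical.

```python
import re

# Assumed URL pattern of the TIP documents; verify before use.
TIP_URL = "https://core.tcl-lang.org/tips/doc/trunk/tip/{}.md"
# Matches "TIP 123" or "TIP #123" as whole words.
_TIP_REF = re.compile(r"\bTIP\s*#?(\d+)\b")


def link_tips(text):
    """Replace TIP references in Markdown text by Markdown links."""
    return _TIP_REF.sub(
        lambda m: f"[{m.group(0)}]({TIP_URL.format(m.group(1))})", text
    )


print(link_tips("This option was added by TIP #700."))
```

Text that already contains such links would be linked twice, so a real pass would have to skip existing Markdown links.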

After the work on the manual pages, the way is open to continue along this path and make an effort to also update other Tcl/Tk resources. The goal is to address all four quadrants of documentation:

  1. Reference (information-oriented) → manual pages (this TIP)
  2. Explanation (understanding-oriented) → e.g. add links to TIPs from within the manual pages, make sure every page has at least one piece of example code (this TIP)
  3. How-to guides (problem-oriented) → to be written?
  4. Tutorials (learning-oriented) → Tcl Wiki tutorial

With this in mind (and wanting to keep the toolchain to Tcl tools shipped with the core), the plan is to

Copyright

This document has been placed in the public domain.