Author: Torsten Berg <[email protected]>
State: Draft
Type: Project
Vote: Pending
Created: 10-Jun-2024
Tcl-Version: 9.0
Tcl-Branch: documentation-cleanup-for-transition
Keywords: manual pages, nroff, Markdown, documentation
Abstract
This TIP proposes to change the markup language of the manual pages within the Tcl and Tk source trees from nroff to Markdown. This will be done by a conversion script that captures as much of the original formatting as is necessary and possible. The conversion is intended to ease maintenance of the manual pages, to get more people involved in documenting the Tcl language and the Tk toolkit, and to enable a modern visual presentation of the HTML version of the manual pages. A further goal is to improve the public perception and image of Tcl/Tk.
Rationale
The current manual pages are written in nroff. This markup language is no longer among the languages typically used for documenting software. It was designed for terminal output and line printers, neither of which is commonly used for reading documentation anymore. As few people actively use nroff these days, it is becoming increasingly difficult to find people willing to write and maintain documentation in nroff.
The actual text of the manual pages is often hidden behind macros, making it hard to simply read and comprehend the content. One example (from re_syntax.n):
A word character is an \fIalnum\fR character or an underscore
.PQ \fB_\fR "" .
These special bracket expressions ...
Here, it is not easy to see that this will be rendered as "... or an underscore (_). These special ...".
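To make the example concrete, here is a minimal Python sketch of how a converter might handle these three lines. It is not the conversion script of this TIP. The function names are hypothetical, and the reading of ".PQ" (first argument inside the parentheses, second argument before the closing parenthesis, third argument after it) is an assumption chosen to reproduce the rendering quoted above. Escaping of Markdown special characters is deliberately left out.

```python
import re

def split_macro_args(rest):
    """Split nroff macro arguments; double quotes group (possibly empty) arguments."""
    return [quoted or bare for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', rest)]

def convert_inline(text):
    """Map nroff bold and italic font escapes to Markdown emphasis."""
    text = re.sub(r"\\fB(.*?)\\f[RP]", r"**\1**", text)
    return re.sub(r"\\fI(.*?)\\f[RP]", r"*\1*", text)

def convert_paragraph(lines):
    """Convert the lines of one nroff paragraph into a single Markdown line."""
    parts = []
    for line in lines:
        if line.startswith(".PQ "):
            # Assumed meaning of ".PQ arg inner outer": "(" arg inner ")" outer
            args = (split_macro_args(line[4:]) + ["", "", ""])[:3]
            parts.append("(" + convert_inline(args[0]) + args[1] + ")" + args[2])
        else:
            parts.append(convert_inline(line))
    return " ".join(parts)

sample = [
    r"A word character is an \fIalnum\fR character or an underscore",
    r'.PQ \fB_\fR "" .',
    "These special bracket expressions ...",
]
print(convert_paragraph(sample))
```

With the sample input this prints the sentence as one readable line, with "alnum" in Markdown italics and the underscore in bold inside the parentheses.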
The markup is complex, and Tcl/Tk uses 31 different nroff macros. Some of them are custom macros defined only for the manual pages (see file man.macros). The documentation of these macros in that file is sometimes quite sparse and leaves room for misuse or misunderstanding. Some macros are not documented at all (".QR"), and the documentation of others is wrong (".PQ"). Together with the obscure syntax of the macro definitions, this can easily lead to erroneous usage of the markup. nroff also mixes semantic markup (e.g. ".SH" and ".SS" for sections and subsections) with directly visual markup (e.g. ".RS"/".RE" pairs for relative indentation).
Further, there is no documentation of how and where exactly the individual macros should be used in the manual pages and what the best practices are (or what to avoid) when writing a manual page.
Therefore, this TIP proposes to take the nroff manual pages and convert them into Markdown. This markup language is meant to be easy for humans to read and write. Many editors support the Markdown syntax and highlight the text, which makes writing and editing visually pleasing. Markdown is widespread, is easy to learn and use (even for non-programmers) and uses largely semantic markup. It is flexible enough to support all formatting and layout needs of the Tcl/Tk manual pages.
Markdown is already used and supported in Fossil, the SCM where Tcl and Tk are hosted. This means that the manual pages can be viewed within Fossil (they are automatically rendered as HTML in Fossil's browser-based interface), with many but not all Markdown features supported by Fossil. The Tcl TIP system now also uses Markdown, and the changelog for Tcl releases has recently been changed from plain text to Markdown. The use of Markdown also opens the door to converting the manual pages into many other formats, since converters from Markdown to a wealth of other formats exist.
Specification
This TIP covers all manual pages contained in the doc folders of both Tcl and Tk. The current main branches of the source trees will be the starting point. A new branch called "documentation-cleanup-for-transition" has been created off the main branch to prepare the nroff files for conversion. This is necessary because some constructs are not semantic or lead to deeply nested structures that make the content difficult to follow. These constructs will be changed before conversion to Markdown.
An example of this is the use of proper Markdown subsections instead of nroff tagged paragraphs (".TP") that use vertically spaced bold text to emulate a subsubsection. Typos and other obviously incorrect content found along the way will also be corrected. Error corrections and structural changes will be kept in separate commits on the branch so that they can easily be merged into other branches later.
Further, at least one construct cannot be represented in Markdown: multiple consecutive terms in a definition list (nroff uses ".TP" and ".IP" for this) without term definitions. The conversion script will take care of this by inserting placeholder texts, so no change to the nroff files is involved. Thus, it can still be decided later whether another approach should be taken.
The conversion script converts the nroff markup into Pandoc's Markdown. This is probably the most flexible and versatile Markdown variant available and is actively developed. Custom Markdown blocks and inline elements can be defined in Pandoc's Markdown by using fenced divs and bracketed spans. This may be used together with filters to create special output during the conversion to e.g. HTML.
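As an illustration of the two Pandoc constructs mentioned above, the following Python sketch builds a bracketed span and a fenced div as plain strings. The class names ("arg", "version-note") and the sample texts are hypothetical and only stand in for whatever custom elements the conversion might define.

```python
def bracketed_span(text, css_class):
    """Wrap inline text in a Pandoc bracketed span: [text]{.class}."""
    return "[" + text + "]{." + css_class + "}"

def fenced_div(body, css_class):
    """Wrap block text in a Pandoc fenced div: ::: {.class} ... :::"""
    return "::: {." + css_class + "}\n" + body + "\n:::"

# Hypothetical class names; a Pandoc filter could later map them to special output.
print(bracketed_span("fileName", "arg"))
print(fenced_div("Text of a custom block.", "version-note"))
```

A filter or a stylesheet can then select these elements by class when the Markdown is converted to e.g. HTML.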
Certain information from the original nroff sources will be stored in Markdown metadata (in a YAML block) instead of the main body of the text. This makes it possible to decide flexibly which metadata to put into the visual part of e.g. the HTML output and where to put it. One example is the sometimes very long list of copyright statements. Another example is the manual section ("n" or "3", "Tcl" or "Tk"). This information will be used again when reproducing the manual pages as nroff files.
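A minimal Python sketch of such a metadata block could look as follows. The YAML key names, the sample header lines, and the assumed argument order of the ".TH" line (name, section, version, part, title) are illustrative assumptions, not the layout chosen by the conversion script.

```python
import json
import re

COMMENT = "'\\\" "  # nroff comment prefix: apostrophe, backslash, double quote, space

def split_macro_args(rest):
    """Split nroff macro arguments; double quotes group arguments."""
    return [quoted or bare for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', rest)]

def yaml_metadata(nroff_lines):
    """Collect copyright comments and the .TH fields into a YAML metadata block."""
    meta = {}
    copyrights = []
    for line in nroff_lines:
        if line.startswith(COMMENT + "Copyright"):
            copyrights.append(line[len(COMMENT):].strip())
        elif line.startswith(".TH "):
            # Hypothetical key names for the assumed .TH argument order.
            keys = ["command", "section", "version", "part", "title"]
            meta = dict(zip(keys, split_macro_args(line[4:])))
    out = ["---"]
    for key, value in meta.items():
        # JSON strings are valid YAML double-quoted scalars.
        out.append(key + ": " + json.dumps(value))
    if copyrights:
        out.append("copyright:")
        out.extend("  - " + json.dumps(c) for c in copyrights)
    out.append("---")
    return "\n".join(out)

sample = [
    COMMENT + "Copyright (c) 2024 Example Author",
    '.TH example n 9.0 Tcl "Tcl Built-In Commands"',
]
print(yaml_metadata(sample))
```

Keeping the section and part as separate keys means a later Markdown-to-nroff step could rebuild the ".TH" line from the metadata alone.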
In order to keep people happy who still want to read the manual pages on terminals, the Markdown manual pages will also be used to produce an nroff version that is close to the original formatting of the manual pages.
The (external) pandoc program will be used initially to produce both the HTML and nroff representations from Markdown. As of now, the plan is to distribute the manual pages in all three formats (Markdown, HTML, nroff) together with the source distribution so people do not need to also install Pandoc.
The individual steps are thus:
- Write the conversion script (roughly 95 % complete)
- Do the conversion and possibly do some manual adjustments afterwards
- Remove the nroff man page files from the Tcl/Tk source trees and replace them with the converted Markdown files (also remove tcltk-man2html.tcl and connected utilities)
- Produce HTML from Markdown using Pandoc and include it in the Tcl/Tk source distributions (and in the source trees as well?)
- Produce nroff manual pages from the Markdown files (for people still wanting the usual manual pages) and include them in the Tcl/Tk source distributions (and in the source trees as well?)
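The last two steps can be sketched in Python as follows. The sketch only builds the Pandoc command lines; the file layout ("doc/after.md") and the output naming are assumptions, and the options actually used for the Tcl/Tk pages (templates, filters, stylesheets) are not specified by this TIP.

```python
from pathlib import PurePosixPath

def pandoc_commands(markdown_file, section="n"):
    """Return the Pandoc command lines for the HTML and nroff (man) outputs."""
    src = PurePosixPath(markdown_file)
    html = src.with_suffix(".html")
    man = src.with_suffix("." + section)  # e.g. after.n or Tcl_Eval.3
    return [
        ["pandoc", "--from", "markdown", "--to", "html", "--standalone",
         "--output", str(html), str(src)],
        ["pandoc", "--from", "markdown", "--to", "man", "--standalone",
         "--output", str(man), str(src)],
    ]

for cmd in pandoc_commands("doc/after.md"):
    print(" ".join(cmd))
    # To actually run it where Pandoc is installed:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Generating both outputs from the same Markdown source at distribution time is what allows the three formats to be shipped together without requiring Pandoc on the user's machine.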
Considerations
- The alternative format 'doctools' (widely used for Tcllib) is currently not considered because it does not support links and tables (see also markdown and doctools).
- Markdown is widely used, there are many editors for it, and the syntax is easy and visually pleasing. This is an advantage of Markdown over doctools: it keeps the hurdle low and helps people write good documentation.
- Pandoc's Markdown is very comprehensive (and extensible via fenced_divs and bracketed_spans) and serves as a very good generic format from which other formats can be generated.
- Another alternative is AsciiDoc, but its syntax is not as easy to read as Markdown and it does not offer anything that is not possible with Pandoc's Markdown.
Reference implementation
Work on the conversion script (nroff to Markdown) and the HTML generation is in progress.
Next steps
The following next steps are logical follow-up actions. They will be the subject of subsequent TIPs when needed.
After the initial conversion, the plan is to manually unify and enhance the Markdown files with respect to both content and structure/outline. For example, the nroff manual pages contain meta information such as Tcl version numbers (whose exact meaning is not clear) and references to TIPs. In the Markdown files, this can be used to state when a certain command, subcommand or command option was introduced into the language (that information can be extracted from the changelog). The information on TIPs can be used to provide links to the actual TIPs for people wanting to read further details on a specific feature of Tcl/Tk. See the Fossil project on chiselapp, where some content change proposals are collected.
After the work on the manual pages, the way is open to continue along this path and also update other Tcl/Tk resources. The goal is to address all four quadrants of documentation:
- Reference (information-oriented) → manual pages (this TIP)
- Explanation (understanding-oriented) → e.g. add links to TIPs from within the manual pages (this TIP)
- How-to guides (problem-oriented) → ??
- Tutorials (learning-oriented) → Tcl Wiki tutorial
With this in mind, the plan is to
- convert Markdown to nroff and HTML without needing Pandoc
- update www.tcl.tk (= www.tcl-lang.org) to a modern look similar to that of the HTML manual pages
- update and republish the "Tcl/Tk Pocket Reference" by Paul Raines, once published by O'Reilly
Copyright
This document has been placed in the public domain.