Tcl Library Source Code

Artifact e6b3069babd499d9eb15aaf8c791e0ef337bc527:

Wiki page [Saving NetNews with Tcl(lib)] by aku 2015-09-10 05:00:37.
D 2015-09-10T05:00:37.367
L Saving\sNetNews\swith\sTcl(lib)
N text/x-markdown
P 8c22026d4d7cd234d6b530cfcc992d6ceed00b42
U aku
W 4923
Originally posted at
[http://core.tcl.tk/akupries/blog/saving-news.html](http://core.tcl.tk/akupries/blog/saving-news.html)

----

Given my various interests I am following several groups like
<news:comp.lang.tcl> and <news:comp.risks> on
[NetNews](http://en.wikipedia.org/wiki/Netnews), a global bulletin
board system which was started shortly after the
[internet](http://en.wikipedia.org/wiki/Internet) itself.

Due to the ephemeral nature of the various boards' contents, with most
servers keeping messages for only a week or two, any access to older
messages means that I have to either go to some website which backs
them up, like [Google Groups](http://en.wikipedia.org/wiki/Google_Groups),
or save them on my own.

Here I describe how to do the latter, using
[Tcl](http://core.tcl.tk/tcl) and [Tcllib](http://core.tcl.tk/tcllib).

We will need access to [Tcllib](http://core.tcl.tk/tcllib)'s sources
even if it is already installed from your favorite distribution's
repositories. This is because we will be using the two scripts
```pullnews``` and ```dirstore``` found under
[examples/nntp](http://core.tcl.tk/tcllib/dir?ci=trunk&name=examples/nntp) to
accomplish our task, and I know of no distribution that installs the
Tcllib examples.

__Edit__: [Stuart Cassoff](http://wiki.tcl.tk/9994) tells me that
OpenBSD does install the examples, since 2008.

Next, we need an account, i.e., a user name and a password, with a
host serving [NetNews](http://en.wikipedia.org/wiki/Netnews) via
[NNTP](http://wiki.tcl.tk/nntp). If your ISP does not provide one then
you have to use one of several specialized providers, like [Eternal
September](http://www.eternal-september.org/).

With that done, below are my script

    #!/bin/sh
    #--
    GROUP=comp.lang.tcl
    #--
    BASE=$HOME/Projects/Backups/News
    ACCOUNT=$BASE/etc/eternal-september.org
    SERVER=news.eternal-september.org
    SAVETO=$BASE/archive/$GROUP
    BINDIR=$BASE/bin

    $BINDIR/pullnews -via $ACCOUNT $SERVER $GROUP \
        $BINDIR/dirstore $SAVETO

and its account file:

    the-user-name
    the-user-password
    (additional optional lines ignored by pullnews)

Well, not quite. My actual paths are slightly different, I am not
telling anybody my account information, and the group name is an
argument. Making the equivalent changes is left as an exercise for the
reader.
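
For the impatient, here is a rough sketch of the "group name as an
argument" part of that exercise. It is not my actual script: the
function name ```pull_group``` and the ```NEWS_BASE``` override are
made up for this illustration, everything else is taken from the
script above.

```shell
#!/bin/sh
# Layout as in the script above. NEWS_BASE is a hypothetical override
# so that the base directory is not hard-coded.
BASE=${NEWS_BASE:-$HOME/Projects/Backups/News}
ACCOUNT=$BASE/etc/eternal-september.org
SERVER=news.eternal-september.org
BINDIR=$BASE/bin

# pull_group GROUP
# Back up one newsgroup into its own directory under $BASE/archive.
pull_group() {
    group=$1
    if [ -z "$group" ]; then
        echo "usage: pull_group GROUP" >&2
        return 1
    fi
    "$BINDIR/pullnews" -via "$ACCOUNT" "$SERVER" "$group" \
        "$BINDIR/dirstore" "$BASE/archive/$group"
}
```

Calling ```pull_group comp.lang.tcl``` then behaves like the fixed
script, and ```pull_group comp.risks``` saves a second group next to
it.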

Some explanations and notes are now likely in order:

  * As written, the script assumes that both ```pullnews``` and
    ```dirstore``` were copied into the directory named by BINDIR.

    They could also be copied into a directory listed in PATH, or
    PATH could be extended to include the directory they reside in.
    Either way would allow their use in the script without needing an
    absolute path.

  * My host (Shaw, Eternal September as well) requires an account, and
    thus the

        -via $ACCOUNT

    in the script. If the actual host is fully open, without the need
    for any account, then this part of the script has to be removed.

  * I have added an entry to my crontab which runs the script several
    times a day (actually several times per hour). This ensures that
    all new(s) articles of the group incrementally accumulate in the
    backup directory as they arrive.

  * It is, however, a good idea to make an initial manual run of the
    script to pull in the saved backlog from the host as that may take
    a long time (depending on how much it keeps). Eternal September,
    for example, has a backlog spanning several years.

  * Do not forget to create the directory mentioned in SAVETO
    __before__ the first run. The ```dirstore``` script will not
    create it and will bail out with an error if the directory is
    missing.

Now we have a functioning backup, although our storage system is quite
simple - just a directory.

If we want to use a storage system that supports more features, like
an index, searching, etc., we have to look under the hood of
```pullnews``` a bit to see how it talks to ```dirstore```.

The relevant procedure is ```store_cmd```, which wraps the builtin
```exec```. It is called in two ways:

 1. ```store_cmd {} last```

    This call queries the store for the sequence number of the last
    stored article, expecting it on stdout. If the result is empty
    ```pullnews``` will use the sequence number of the oldest article
    known to the host instead.

    This is how it pulls the entire backlog on its first run and only
    the new articles on all subsequent runs.

 1. ```store_cmd $data save $lasthandled```

    This saves a retrieved article into the store, with the specified
    sequence number. The article data is presented to the store on
    stdin.

Not very complicated. Any storage command which follows this simple
API can be used as a backend of ```pullnews```.

Happy Tcling.

Z ca35434cd34df290272685a4c1ea7dcf