Mirror: Mirror Management

Intro

MM is a tool to ease backing up a large number of repositories.

It currently supports fossil, git(hub), mercurial (hg), and subversion (svn) repositories.

Its operation is made easier by these DVCS all having replication and synchronization protocols baked into them and their clients. Because of that MM did not have to invent anything new, it gets by just by invoking the existing tools (fossil, hg, git, git hub, svn).

All management is done through a command line application with integrated help, called mirror.

Beyond management MM is also capable of exposing the pool of backups to the web. This is done by mirror generating a static website which can then be served by a web engine of the user's choice. This functionality requires TclSSG to be installed, expecting its main application to be accessible under the name ssg.

It should be noted MM does not automatically perform repository updates on its own. It expects to be invoked by some external scheduler, for example cron, for this.

Examples

Currently the only example of MM in use can be found at https://akupries.tclers.tk/r/tcl. This location collects and mirrors as many Tcl-related repositories as it can find.

Basics of operation

Adding repositories

The basic command to add a single repository to the system is

mirror add <url>

MM will try to figure out the type of repository to mirror from the specified url, and further derive a name for the mirror set to hold the repository as well.

If it guesses wrong the options --vcs and --name can be used to explicitly specify the correct values.

mirror add <url> --name <name> --vcs <vcs>

The set of version control systems supported by the installed mirror can be queried with

mirror vcs

Going back to add and its auto-detection of vcs type, the currently employed heuristics (.i.e url patterns), are, in order:

Pattern	VCS	Notes
`github`	github	Requires `git hub` & `git`
`git`	git	Requires `git`
`hg.code.sf.net`	hg	Requires `hg`
`hg.code.sourceforge.net`	hg	S.a.
`svn.code.sf.net`	svn	Requires `svn`
`svn.code.sourceforge.net`	svn	S.a.
`*`	fossil	Requires `fossil`

As can be seen, the order does matter, and fossil is the catch-all fallback.

The two main concepts here are the repository, identified by its url, and the mirror set, identified by its name.

While mirror add always places the specified repository into its own mirror set the latter can contain more than one repository, while each repository always belongs to only one mirror set.

Mirror sets are there to group related repositories together. The command to coalesce mirror sets into one after adding repositories is mirror merge.

The action comes at a price, and with restrictions. All repositories in a mirror set for the same type VCS will share the local backing store.

For fossil repositories MM can and does use the asociated projectcode to detect attempts at merging unrelated repositories, and rejects such. For git(hub) no such information exists, and the only warning will be the message no common ancestors found when updating such a mirror set. For mercurial the situation is similar.

On the positive side placing related repositories together reduces the amount of disk space required.

Quick access to content

An important structure maintained by MM is the rolodex.

It is a stack which is updated whenever repositories are added or removed, and mirror sets merged and split. This makes it easy to quickly reference repositories which were recently worked on.

The last and previously used repositories are accessible through the @c and @p short hands. The repositories further down the history are accessible via @num.

The new rolodex's contents are always shown after a command changing it completes, and can be explicitly queried with mirror current.

Search operations like mirror list <substring> write their results to the rolodex as well.

Note, the rolodex is of limited size. The initial default is 20 entries. This configuration can be queried and changed with the mirror limit command.

The same limit L also applies to the output of the mirror list command when not used to search for content. In that case it shows only L entries per invokation, and a series of invokations cycles through the entire list of repositories.

Updating the mirror

The command to update the mirror is mirror update (sic!).

To prevent overloading both the local machine and the remote locations each invokation of this command will only update a subset of the known mirror sets. To this end MM manages an internal queue new mirror sets are added to, and mirror sets to update are taken from from the front. When the queue runs empty it is simply refilled again with all the mirror sets known at that time.

The current state of the queue is accessible via mirror pending, with the mirror sets to be taken by the next invokation of mirror update at the top and marked.

The default is to update 5 mirror sets per invokation. This configuration can be queried and changed with the mirror take command.

Together with being driven by the liks of cron this keeps the local load low, and distributes the remote load over a larger time interval as well, with cron interval and number of sets taken per cycle the main knobs to regulate this.

Bulk operations

While add and merge are the only operations needed to add new repositories, and manage their mirror sets, using them still will be tedious when having to add a large batch.

To simply this case we have the command mirror import. It takes a simple text specifying repositories and their mirror sets in simple markup and imports them all in one batch, performing all the necessary add and merge operations.

The file format is line-oriented, with each non-comment line specifying either a repository, or a mirror set. Comments start with a hash-character (#, U+0023) and run to the end of their line. Empty lines are ignored.

The simplest possible import file looks like

R <vcs> <url>
M <name>

and is equivalent to

mirror add <url> --name <name> --vcs <vcs>

To place more repositories into the mirror set simply place more repository specification before it, like

# A comment
R <vcs1> <url1>
R <vcs2> <url2>
...
M <name>

Any number of repositories and mirror sets can be specified.

On the converse side of the above is mirror export, which writes the current state of repositories and mirror sets to stdout, in a format directly usable to mirror import.

404 - No contact at this number

Given that MM is for the backup of remote repositories to protect against their loss, it is only right to handle the possibility of remote locations vanishing.

Such a sitation will actually not disturb the operation of mirror update, and if the loss is temporary the situation will resolve itself when update comes back to the repository in question, and simply pull more data from the other side.

However when the situation appears to be permanent then the manager might not wish to spend cycles and bandwidth on querying a repository which is gone. Yet the local backup should not be deleted either.

Thus we have mirror disable and mirror enable with which we can take a repository out of the update rotation, or put it back in. A repository in the rotation is called active, and inactive otherwise.

Web site

... TODO ...