Apart from using Mercurial for version control, I have found a few cases where it is fairly suitable as a ... backup tool.
I maintain a MoinMoin wiki. MoinMoin saves its data as plain disk files. I won't delve into all the details now, but my wiki instance is kept in a directory with subdirs for configuration data, runtime scripts, plugins and - finally - the wiki data: pages written by users, user preferences, and such.
My initial backup procedure was simple - just tar the directory, compress it, and copy it to the remote machine. Slightly troublesome, and ... it gave only the latest version for a possible restore. Plus a noticeable transfer every time a new base backup was made.
Mercurial provided a very nice alternative:
I just created a Mercurial repository inside my wiki directory, added the whole wiki data to it, committed, and configured a cron job to regularly addremove all files and commit. A cron job running on the backup machine pulls from this repository to grab the updates.
Not only am I keeping an up-to-date, trivially restorable remote backup without hurting my traffic quota (only new changes are copied, and they are tiny compared to the whole content), but also:
- I can revert to the state at any date (since I started to apply this procedure),
- I can easily see what has changed without using the wiki interface,
- I feel safe with respect to MoinMoin upgrades - I can branch to test the upgrade in a separate location, and switch users to the migrated version once I am happy with it (and I can always revert if I need to).
All that at minimal disk cost: the .hg subdirectory adds only about 80% to the previous directory size (and on the backup machine it takes just this 80%, as I do not need to check out the actual files).
Sweet, isn't it? And this technique works great in many situations (including versioning and backing up /etc). Below are some technical details for the MoinMoin case.
Initial import
Install Mercurial (from a package, or by easy_install Mercurial) - if there are symlinks in the directory, 0.9.4 or newer is recommended (on both the production and the backup machine).
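For example (assuming a Debian-like system - use whatever your distribution provides), installation could be as simple as:
master$ sudo apt-get install mercurial
or, using the Python installer mentioned above:
master$ sudo easy_install Mercurial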
Go to the wiki data directory and execute
master$ hg init
It will create the .hg subdir there. In the (unlikely) case that parts of the wiki data dir are accessible via the web (served by Apache, for example), make sure this dir is not.
Edit the .hgignore file, listing there the files which do not need to be backed up. Something like:
syntax:regexp
^(data|underlay)/pages/[^/]+/cache/
^data/(event-log|expaterror.log|error.log)$
\.py[co]$
Of course, tune it according to your directory structure. Above I omit cache directories, less useful logs, and compiled Python files (from plugin code). Version-controlling them does no harm, but they are not needed and sometimes huge (the logs). If you hate regular expressions, use syntax: glob and shell-like wildcards (data/pages/*/cache and so on; an equivalent glob version is sketched below). Test this file with hg status (files to be ignored should not be reported).
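For reference, a rough glob equivalent of the file above (assuming the same directory layout) would be:
syntax: glob
data/pages/*/cache
underlay/pages/*/cache
data/event-log
data/expaterror.log
data/error.log
*.pyc
*.pyo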
Now let's add files to the repository:
master$ hg add
master$ hg commit -m "Initial import"
Some status-checking commands for the curious:
master$ hg status
master$ hg log
master$ du -sm .hg
Make some wiki edit and try hg status
to see the modifications.
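hg status only lists which files changed; to see the actual content changes, one can also run:
master$ hg diff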
Initial copy (clone)
OK, let's start copying the data to the remote machine. We need some ssh access, either from the production machine to the backup, or from the backup to the production. Whichever direction you pick, configure authorized_keys so that something like ssh another.machine ls works without a password prompt.
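For example, assuming the backup machine will be the one connecting to the master (and that ssh-copy-id is available - otherwise append the public key to ~/.ssh/authorized_keys by hand), the setup could look like:
backup$ ssh-keygen -t rsa
backup$ ssh-copy-id [email protected]
backup$ ssh [email protected] ls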
Now let's make the initial clone. If we can ssh from the backup to the master, it will be:
backup$ hg clone --noupdate ssh://[email protected]//var/lib/wikidir
(note the two slashes after the machine name - this means an absolute path; a single slash would treat the rest as a path relative to the wiki user's home directory). It will create a directory named wikidir in the current directory (it can be renamed or moved if needed). Let's also immediately create the file .hg/hgrc containing something like:
[paths]
master = ssh://[email protected]//var/lib/wikidir
This is just an alias; thanks to it we will later be able to use hg pull master instead of hg pull ssh://[email protected]//var/lib/wikidir.
If we can ssh from the master to the backup, we proceed similarly:
master$ hg clone /var/lib/wikidir ssh://[email protected]//backup/wiki
Adapt it to your needs; this example creates a new directory named /backup/wiki. As above, note the two slashes - if you replace them with one, $HOME/backup/wiki will be created. And again, make an alias in .hg/hgrc:
[paths]
backup = ssh://[email protected]//backup/wiki
Whichever command you used, /backup/wiki should now contain only the .hg directory.
This is probably preferable for a backup, but should you want to unpack the wiki files, just run
backup$ cd /backup/wiki
backup$ hg update
(you can safely remove them later with a brutal rm -rf *, just leave .hg). Whether they are unpacked or not, you can try commands like
backup$ cd /backup/wiki
backup$ hg log
backup$ hg status
Committing updates
So far we have just imported and copied a single version. Let's now configure our system to commit the changes. This is just a matter of writing the following shell script:
#!/bin/bash
# Commit all wiki changes; run from cron on the master machine.
cd /var/lib/wikidir || exit 1
hg addremove
hg commit -m "Automatic backup"
and configuring it to be run from cron at regular intervals (I do it every early morning). Remember to configure it so that it runs under the account which owns the files (and the repository). Preferences vary here (anacron, cron.d, ...), but here is a simple and safe solution:
$ sudo -u wikiuser crontab -e
and write:
4 5 * * * /path/to/wikibackupscript
The Mercurial commands used above have the following meaning: addremove marks new files to be added by the next commit and marks files deleted from disk for removal; commit performs the actual commit to the repository.
Copying updates
The final step is to ensure updates are copied between machines.
If we are ssh
-ing from the backup to the master, we just need to run
hg --cwd /backup/wiki pull master
at regular intervals.
The obvious solution is to run it from cron (on the backup machine, using the account which owns /backup/wiki and is able to ssh to the master). One would usually configure it to run an hour or two after the committing script runs.
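For example (the hour is arbitrary, just a bit later than the commit job on the master):
backup$ crontab -e
and add a line like:
4 7 * * * hg --cwd /backup/wiki pull master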
The pull command grabs all the new changesets from the remote repository (master) and saves them to the local .hg. Use pull --update if you also want to update the checked-out copy to the newest version.
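In other words, with the paths used above, the two variants are:
backup$ hg --cwd /backup/wiki pull master
backup$ hg --cwd /backup/wiki pull --update master
where the first only stores the changesets, while the second also refreshes the files in the working directory.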
If we are ssh-ing from the master to the backup, the simplest method is to extend the already written wikibackupscript (the one which addremoves and commits) with
hg push --force backup
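So the extended script would look roughly like this:
#!/bin/bash
# Commit all wiki changes and push them to the backup machine.
cd /var/lib/wikidir || exit 1
hg addremove
hg commit -m "Automatic backup"
hg push --force backup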
Alternatively, spawn from cron
hg --cwd /var/lib/wikidir push --force backup
Final words
Nothing forces you to stick with just two repositories. Should you like to, you can create more clones (cloning either from the production or from the backup machine, as you like, pulling automatically or manually) - for example, to test the MoinMoin upgrade procedure, or to run a development copy for plugin testing.
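For instance, a throwaway copy for testing an upgrade could be created with something like (the target path is just an example):
master$ hg clone /var/lib/wikidir /var/lib/wikidir-test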
Having the wiki data version-controlled makes one feel far more comfortable while reconfiguring, rearranging, or upgrading - should anything go wrong, it is always possible to go back.
The same procedure is probably implementable with Git, Bazaar, or Darcs. I recommend Mercurial because its pull/push commands do not attempt any merges, do not update the destination dir (unless asked to), and do nothing except copy the new changesets - so they are safe to use from cron.