Apart from using Mercurial for version control, I have found a few cases where it is fairly suitable as a ... backup tool.
I maintain a MoinMoin wiki. MoinMoin saves its data as plain disk files. I won't delve into all the details now, but my wiki instance is kept in a directory with subdirs for configuration data, runtime scripts, plugins and - finally - the wiki data: pages written by users, user preferences, and such.
My initial backup procedure was simple - just tar the directory, compress it, and copy it to the remote machine. Slightly troublesome, and ... it gave only the latest version for a possible restore. Plus a noticeable transfer every time a new base backup was made.
Mercurial provided a very nice alternative:
I just created a Mercurial repository inside my wiki directory, added the whole wiki data to it, committed, and configured a cron job to regularly addremove all files and commit. A cron job running on the backup machine pulls from this repository to grab the updates.
Not only am I keeping an up-to-date, trivially restorable remote backup without hurting my traffic quota (only new changes are copied, and they are tiny compared to the whole content), but also:
- I can revert to the state at any date (since I started to apply this procedure),
- I can easily see what has changed without using the wiki interface,
- I feel safe with respect to MoinMoin upgrades - I can branch to test the upgrade in a separate location, and switch users to the migrated version once I am happy with it (and I can always revert if I need to).
All that at minimal disk cost: the .hg subdirectory adds only about 80% to the previous directory size (and on the backup machine it takes just this 80%, as I do not need to check out the actual files).
Sweet, isn't it? And this technique works great in many situations (including versioning and backing up /etc). Below are some technical details for the MoinMoin case.
Initial import
Install Mercurial (from a package, or by easy_install Mercurial) - if there are symlinks in the directory, 0.9.4 or newer is recommended (on both the production and the backup machine).
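For example (assuming a Debian-like system - use whatever your distribution provides), installation could be as simple as:
master$ sudo apt-get install mercurial
or, using the Python installer mentioned above:
master$ sudo easy_install Mercurial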
Go to the wiki data directory and execute
master$ hg init
It will create the .hg subdir there. In the (unlikely) case that parts of the wiki data dir are accessible via the web (served by Apache, for example), make sure this dir is not.
Edit the .hgignore file, listing there the files which do not need to be backed up. Something like:
syntax:regexp
^(data|underlay)/pages/[^/]+/cache/
^data/(event-log|expaterror.log|error.log)$
\.py[co]$
Of course, tune it according to your directory structure. Above I omit cache directories, less useful logs, and compiled Python files (from plugin code). Version-controlling them does no harm, but they are not needed and sometimes huge (the logs). If you hate regular expressions, use syntax: glob and shell-like wildcards (data/pages/*/cache and so on; an equivalent glob version is sketched below). Test this file with hg status (files to be ignored should not be reported).
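For reference, a rough glob equivalent of the file above (assuming the same directory layout) would be:
syntax: glob
data/pages/*/cache
underlay/pages/*/cache
data/event-log
data/expaterror.log
data/error.log
*.pyc
*.pyo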
Now let's add files to the repository:
master$ hg add
master$ hg commit -m "Initial import"
Some status-checking commands for the curious:
master$ hg status
master$ hg log
master$ du -sm .hg
Make some wiki edit and try hg status
to see the modifications.
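hg status only lists which files changed; to see the actual content changes, one can also run:
master$ hg diff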
Initial copy (clone)
OK, let's start copying the data to the remote machine. We need some ssh access, either from the production machine to the backup, or from the backup to the production. Whichever direction you pick, configure authorized_keys so that something like ssh another.machine ls works without a password prompt.
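For example, assuming the backup machine will be the one connecting to the master (and that ssh-copy-id is available - otherwise append the public key to ~/.ssh/authorized_keys by hand), the setup could look like:
backup$ ssh-keygen -t rsa
backup$ ssh-copy-id [email protected]
backup$ ssh [email protected] ls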
Now let's make the initial clone. If we can ssh from the backup to the master, it will be:
backup$ hg clone --noupdate ssh://[email protected]//var/lib/wikidir
(note the two slashes after the machine name - this means an absolute path; a single slash would treat the rest as a path relative to the wiki user's home directory). It will create a directory named wikidir in the current directory (it can be renamed or moved if needed). Let's also immediately create the file .hg/hgrc containing something like:
[paths]
master = ssh://[email protected]//var/lib/wikidir
This is just an alias; thanks to it we will later be able to use hg pull master instead of hg pull ssh://[email protected]//var/lib/wikidir.
If we can ssh from the master to the backup, we proceed similarly:
master$ hg clone /var/lib/wikidir ssh://[email protected]//backup/wiki
Adapt it to your needs; this example creates a new directory named /backup/wiki. As above, note the two slashes - if you replace them with one, $HOME/backup/wiki will be created. And again, make an alias in .hg/hgrc:
[paths]
backup = ssh://[email protected]//backup/wiki
Whichever command you used, /backup/wiki should now contain only the .hg directory.
This is probably preferable for a backup, but should you want to unpack the wiki files, just run
backup$ cd /backup/wiki
backup$ hg update
(you can safely remove them later with a brutal rm -rf *, just leave .hg). Whether they are unpacked or not, you can try commands like
backup$ cd /backup/wiki
backup$ hg log
backup$ hg status
Committing updates
So far we have just imported and copied a single version. Let's now configure our system to commit the changes. This is just a matter of writing the following shell script:
#!/bin/bash
# Commit all wiki changes; run from cron on the master machine.
cd /var/lib/wikidir || exit 1
hg addremove
hg commit -m "Automatic backup"
and configuring it to be run from cron at regular intervals (I do it every early morning). Remember to configure it so that it runs under the account which owns the files (and the repository). Preferences vary here (anacron, cron.d, ...), but here is a simple and safe solution:
$ sudo -u wikiuser crontab -e
and write:
4 5 * * * /path/to/wikibackupscript
The Mercurial commands used above have the following meaning: addremove marks new files to be added by the next commit and marks files deleted from disk for removal; commit performs the actual commit to the repository.
Copying updates
The final step is to ensure updates are copied between machines.
If we are ssh
-ing from the backup to the master, we just need to run
hg --cwd /backup/wiki pull master
at regular intervals.
The obvious solution is to run it from cron (on the backup machine, using the account which owns /backup/wiki and is able to ssh to the master). One would usually configure it to run an hour or two after the committing script runs.
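For example (the hour is arbitrary, just a bit later than the commit job on the master):
backup$ crontab -e
and add a line like:
4 7 * * * hg --cwd /backup/wiki pull master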
The pull command grabs all the new changesets from the remote repository (master) and saves them to the local .hg. Use pull --update if you also want to update the checked-out copy to the newest version.
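In other words, with the paths used above, the two variants are:
backup$ hg --cwd /backup/wiki pull master
backup$ hg --cwd /backup/wiki pull --update master
where the first only stores the changesets, while the second also refreshes the files in the working directory.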
If we are ssh-ing from the master to the backup, the simplest method is to extend the already written wikibackupscript (the one which addremoves and commits) with
hg push --force backup
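So the extended script would look roughly like this:
#!/bin/bash
# Commit all wiki changes and push them to the backup machine.
cd /var/lib/wikidir || exit 1
hg addremove
hg commit -m "Automatic backup"
hg push --force backup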
Alternatively, spawn from cron
hg --cwd /var/lib/wikidir push --force backup
Final words
Nothing forces you to stick with just two repositories. Should you like to, you can create more clones (cloning either from the production or from the backup machine, as you like, pulling automatically or manually) - for example, to test the MoinMoin upgrade procedure, or to run a development copy for plugin testing.
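For instance, a throwaway copy for testing an upgrade could be created with something like (the target path is just an example):
master$ hg clone /var/lib/wikidir /var/lib/wikidir-test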
Having the wiki data version-controlled makes one feel far more comfortable while reconfiguring, rearranging, or upgrading - should anything go wrong, it is always possible to go back.
The same procedure is probably implementable with Git, Bazaar, or Darcs. I recommend Mercurial because its pull/push commands do not attempt any merges, do not update the destination dir (unless asked to), and do nothing except copy the new changesets - so they are safe to use from cron.