Is there a way to back up a Mercurial repository while preserving the files' timestamps?

Right now, I'm using hg clone to copy the repository to a staging directory, and the backup program picks up the files from there. I'm not pointing the backup program directly at the repository because I don't want it to be changing (from commits) while the backup is happening.

The problem is that hg clone changes all the files' timestamps to the current time, so the backup program (which I cannot change) thinks everything has been modified.

+4  A: 

Plan A: When the source and destination directories reside on the same file system, hg clone -U hardlinks the files in the repository store instead of copying them, so their timestamps do not change. This approach is fast and safe: Mercurial breaks a hardlink lazily, the first time it needs to write to that file.

If you need to, you can clone on the same file system first, and then rsync this new clone over to another file system.
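Plan A could be scripted roughly as follows. This is only a sketch: the function names and the example paths are hypothetical, and it assumes the source and staging directories are on the same file system so that the clone can use hardlinks.

```shell
#!/bin/sh
# Sketch only: function names and paths are hypothetical, not from the answer.

# Clone without a working copy (-U). On the same file system Mercurial
# hardlinks the store files, so their timestamps are preserved.
stage_repo() {
    src=$1
    stage=$2
    # Refuse to run with an empty staging path before removing anything.
    [ -n "$stage" ] || return 1
    rm -rf "$stage"
    hg clone -U "$src" "$stage"
}

# Optionally copy the staged clone to another file system.
# rsync -a (archive mode) preserves modification times.
copy_stage() {
    stage=$1
    dest=$2
    rsync -a --delete "$stage/" "$dest/"
}

# Example usage (commented out so sourcing this file has no side effects):
# stage_repo /srv/hg/project /srv/hg-staging/project
# copy_stage /srv/hg-staging/project /mnt/backup/project
```

The backup program would then be pointed at the staging directory (or at the rsync destination) rather than at the live repository.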

Plan B: It's usually safe to use rsync or some other file-based synchronization tool. Mercurial doesn't store anything magical on disk, just plain files.

There is a race condition if you commit to the repository while rsync is running, but I think the risk is negligible, because "hg rollback" should be able to clean up any such inconsistencies if you restore from a broken backup. Note that rollback cannot recover if multiple separate transactions (such as multiple "push" or "commit" commands) happened in the rsync window, or if you ran destructive operations that tamper with history (such as rebase, hg strip, and some MQ commands).
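One way to find out whether a restored copy was caught mid-transaction is to run hg verify on it, which checks the integrity of the repository. A minimal sketch, assuming a hypothetical helper name and a path supplied by the caller:

```shell
# Sketch only: check_backup is a hypothetical helper name.
# Returns 0 if the repository at $1 passes "hg verify", non-zero otherwise.
check_backup() {
    repo=$1
    if hg verify -R "$repo" >/dev/null 2>&1; then
        echo "ok: $repo"
    else
        echo "damaged: $repo" >&2
        return 1
    fi
}

# Example usage (hypothetical path):
# check_backup /mnt/restore/project
```

If the check fails, that is the point at which to consider the rollback described above, or to fall back to an older backup.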

intgr
Yeah, it's the race condition I'm worried about. Would you be able to tell when an `hg rollback` would be needed?
Jim Hunziker
`hg verify` will check all the checksums/revision hashes in your repository and nag you about errors.
intgr
One more comment on this (you might want to add): `hg clone -U` is the way to go, since the working copy that clone makes doesn't have hard links. Just the repository does.
Jim Hunziker
+3  A: 

I suggest using hg pull instead of hg clone. You keep a mirror of the repository on your server and update it periodically with hg pull, then let your backup program back up the mirror. hg pull transfers only the new history, so the only files under .hg/store/data that change are the ones actually affected by the pull.

Here I tested this by making a small repo with two files: a.txt and b.txt. I then cloned the repository "to the server" using hg clone --noupdate. That ensures that we have no working copy on the server -- it only needs the history found in .hg.

The timestamps looked like this after the clone:

% ll --time-style=full .hg/store/data
total 8.0K
-rw-r--r-- 1 mg mg 76 2009-11-25 20:07:52.000000000 +0100 a.txt.i
-rw-r--r-- 1 mg mg 69 2009-11-25 20:07:52.000000000 +0100 b.txt.i

As you noted, they are all identical since the files were all just created by the clone operation. I then changed the original repository (the one on the client) and made a commit. After pulling the changeset I got these timestamps:

% ll --time-style=full .hg/store/data
total 8.0K
-rw-r--r-- 1 mg mg 159 2009-11-25 20:08:47.000000000 +0100 a.txt.i
-rw-r--r-- 1 mg mg  69 2009-11-25 20:07:52.000000000 +0100 b.txt.i

Notice how the timestamp for a.txt.i has been updated (I only touched a.txt in my commit) while the timestamp for b.txt.i has been left alone.

If your backup software is smart, it will even notice that Mercurial has only appended data to a.txt.i. This means that the new a.txt.i file is identical to the old a.txt.i file up to a certain point, so the backup program only needs to copy the final part of the file. Rsync is an example of a backup program that will notice this.
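The mirror workflow above could be wrapped in a small script that is run periodically, for example from cron. This is a sketch under assumptions: the function name and the example paths are hypothetical, and the mirror is created without a working copy on the first run.

```shell
# Sketch only: update_mirror and the paths below are hypothetical.
# First run: create the mirror without a working copy.
# Later runs: pull new changesets, touching only the affected store files.
update_mirror() {
    src=$1
    mirror=$2
    if [ -d "$mirror/.hg" ]; then
        hg pull -R "$mirror" "$src"
    else
        hg clone --noupdate "$src" "$mirror"
    fi
}

# Example usage (hypothetical paths):
# update_mirror /srv/hg/project /srv/hg-mirror/project
```

The backup program is then pointed at the mirror directory, whose unchanged files keep their old timestamps between runs.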

Martin Geisler
+1  A: 

Here's an hg extension that might help: http://mercurial.selenic.com/wiki/TimestampExtension

djc