The Situation

I have a fairly large Subversion repository that I am trying to back up efficiently. The repository is about 6 GB and growing. Some large commits are around 500 MB to 1 GB in size.

I am trying to back up this repository to an off-site location over an Internet uplink.

Explaining the sheer size of it

For anyone wondering, we keep the whole production environment for various sites (config files, EXEs, data files) in this one repository so that we can roll back to a known working version and track changes to the production setup. Code is kept in a different repository.

The How

Here is what I am actually doing:

  1. Back up the repository to a working folder on the server using "svnadmin hotcopy SRCDIR TGTDIR"
  2. Encrypt and compress that copy using "rsyncrypto -r SRCPATH DSTPATH KEYSPATH CERTIFICATE"
  3. Back up the encrypted version to an off-site location using "rsync -Crtv" (actually cwRsync because I am running on Windows)
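
The three steps above can be chained in a small driver script. This is only a sketch: the function names, the example paths, and the remote destination are hypothetical placeholders, and the command-line arguments are just the ones quoted in the list. It builds the commands and only prints them unless `dry_run` is turned off.

```python
import subprocess
from typing import List, Sequence


def build_pipeline(repo: str, hotcopy_dir: str, encrypted_dir: str,
                   keys_dir: str, certificate: str, remote: str) -> List[List[str]]:
    """Return the three backup commands as argument vectors, in order."""
    return [
        # Step 1: copy the live repository to a working folder.
        ["svnadmin", "hotcopy", repo, hotcopy_dir],
        # Step 2: encrypt and compress the copy.
        ["rsyncrypto", "-r", hotcopy_dir, encrypted_dir, keys_dir, certificate],
        # Step 3: send the encrypted tree off-site (remote is a placeholder).
        ["rsync", "-Crtv", encrypted_dir, remote],
    ]


def run_pipeline(commands: Sequence[Sequence[str]], dry_run: bool = True) -> List[str]:
    """Print each command; execute it only when dry_run is False."""
    printed = []
    for cmd in commands:
        line = " ".join(cmd)
        printed.append(line)
        print(line)
        if not dry_run:
            # Stop at the first failing step instead of uploading a bad copy.
            subprocess.run(cmd, check=True)
    return printed
```

Running the steps one at a time with `dry_run=False` also makes it easier to see which step causes the files to change.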

The Problem

First I have to say that it works, though it still has an underlying issue.

I expected that each run would copy only the new revision files ([repos]/db/revs/0/...), so that bandwidth and time would be needed only when a large commit is made. Instead:

  • If I run only step #3 many times, rsync behaves as it should and nothing is copied because nothing has changed.
  • If I run only steps #2 & #3 many times, rsync also behaves well. The encrypted version is the same every time and rsync doesn't have to transmit anything.
  • But every time I run all three steps (with a new commit having been made to the repository), the whole repository seems to be re-uploaded in full, which defeats the purpose of using rsync in the first place.

It is as though the files in [repos]/db/revs/0/... are changing every time I make a hotcopy.
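
One way to check that suspicion is to keep two consecutive hotcopies (or two consecutive encrypted trees) and compare them file by file. The sketch below is a diagnostic aid only, not part of the backup itself; the function names are made up for illustration. It separates files whose content differs from files where only the modification time differs, since rsync's default quick check looks at size and modification time.

```python
import hashlib
import os
from typing import Dict, List, Tuple

FileInfo = Tuple[int, int, str]  # (size in bytes, mtime in seconds, sha256 hex)


def snapshot(root: str) -> Dict[str, FileInfo]:
    """Map each file under root (relative path, '/' separated) to its FileInfo."""
    result: Dict[str, FileInfo] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            st = os.stat(full)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large revision files fit in memory.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            result[rel] = (st.st_size, int(st.st_mtime), digest.hexdigest())
    return result


def compare(old: Dict[str, FileInfo], new: Dict[str, FileInfo]) -> Dict[str, List[str]]:
    """Classify every path as added, removed, content_changed, or mtime_only."""
    report: Dict[str, List[str]] = {
        "added": [], "removed": [], "content_changed": [], "mtime_only": [],
    }
    for rel in sorted(set(old) | set(new)):
        if rel not in old:
            report["added"].append(rel)
        elif rel not in new:
            report["removed"].append(rel)
        elif old[rel][0] != new[rel][0] or old[rel][2] != new[rel][2]:
            report["content_changed"].append(rel)
        elif old[rel][1] != new[rel][1]:
            report["mtime_only"].append(rel)
    return report
```

If old revision files show up under `content_changed` for the hotcopy itself, the change comes from step 1; if they only show up there for the encrypted tree, step 2 is the place to look.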

The Questions

Is it expected behavior for "svnadmin hotcopy" that the files in [repos]/db/revs/0/... change from one hotcopy to the next?

Are there any options or suggestions for making this hotcopy rsync-friendly, or "rsyncable"?

I am not quite sure that the use of 'svnadmin dump' on the whole repository would produce an "rsyncable" file.

A: 

I don't know the details of how Subversion stores its backup files, so I don't know whether or not a hotcopy from r5678 should be block-identical to a hotcopy from r6789 (which is what rsync would need to do an efficient copy). What we do when backing up our development repository is to do a full backup (hotcopy then back up the entire however many gigs) every week, and do an incremental backup every day using the following command:

svnadmin dump /path/to/repos -r latest-backed-up-rev:latest-repos-rev --incremental --deltas

The --incremental option means "This must be applied to a repository at revision latest-backed-up-rev", and the --deltas option uses a binary format that isn't much larger than the actual change in repository size itself. If you replace step 1 with such a dump, each run only adds one small file to the backup folder, so your rsync will behave just fine.
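
A rough sketch of how the daily incremental dump could be automated is shown here. Everything beyond the `svnadmin dump` flags above is an assumption: the state file that remembers the last backed-up revision, the output file naming, and the function names are all hypothetical, and `svnlook youngest` is used to find the newest revision. The range starts one revision after the last one already backed up, on the assumption that an incremental dump expresses its first revision as a change against the preceding one; adjust this if your bookkeeping records the next revision to dump instead.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def youngest_revision(repo: str) -> int:
    """Ask Subversion for the newest revision number in the repository."""
    out = subprocess.run(["svnlook", "youngest", repo],
                         check=True, capture_output=True, text=True).stdout
    return int(out.strip())


def read_last_backed_up(state_file: Path) -> int:
    """Return the revision stored in state_file, or 0 if the file is missing."""
    if not state_file.exists():
        return 0
    return int(state_file.read_text().strip())


def dump_command(repo: str, last_backed_up: int, youngest: int) -> Optional[List[str]]:
    """Build the incremental dump command, or None when nothing is new."""
    if youngest <= last_backed_up:
        return None
    first = last_backed_up + 1  # assumption: start after the last saved revision
    return ["svnadmin", "dump", repo, "-r", f"{first}:{youngest}",
            "--incremental", "--deltas"]


def backup_increment(repo: str, out_dir: Path, state_file: Path) -> Optional[Path]:
    """Write one new dump file for the revisions not yet backed up."""
    last = read_last_backed_up(state_file)
    youngest = youngest_revision(repo)
    cmd = dump_command(repo, last, youngest)
    if cmd is None:
        return None
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"r{last + 1}-r{youngest}.svndump"
    with open(target, "wb") as fh:
        # svnadmin dump writes the dump stream to standard output.
        subprocess.run(cmd, check=True, stdout=fh)
    # Record progress only after the dump finished without error.
    state_file.write_text(str(youngest))
    return target
```

Because each run creates a new file and never rewrites earlier ones, the encryption and rsync steps only have that one new file to process.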

I have a shell script that automates finding backed-up and latest rev and creates the proper file, if you want it.

Dani Church