We're looking at CouchDB for a CMS-ish application. I know the database is implemented as a set of files in the file system. What I'm looking for are common patterns, best practices and workflow advice for backing up our production database and, especially, for cloning the database for use in development and testing.

Is it sufficient to just copy the files on disk out from under a live running instance? Can you clone database data between two live running instances?

Advice and description of the techniques you use will be greatly appreciated.

+9  A: 

CouchDB supports replication, so just replicate to another instance of CouchDB and back up from there. That way you avoid disturbing the instance you write changes to.

http://wiki.apache.org/couchdb/FrequentlyAskedQuestions#how_replication

You literally send a POST request to your CouchDB instance telling it where to replicate to, and it Works(tm).
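As a rough sketch of that POST request, the snippet below builds a call to CouchDB's `/_replicate` endpoint using only the Python standard library. The host names and the `cms` database name are placeholders for illustration, not values from this thread, and authentication is left out.

```python
import json
import urllib.request


def build_replication_request(couch_url, source, target):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        couch_url.rstrip("/") + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def replicate(couch_url, source, target):
    """Send the replication request and return CouchDB's JSON reply."""
    req = build_replication_request(couch_url, source, target)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical hosts and database name; adjust to your own setup.
    print(replicate("http://localhost:5984", "cms", "http://backup-host:5984/cms"))
```

The target database generally has to exist already unless you ask CouchDB to create it, so check the replication documentation for your version before relying on this.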

EDIT: You can just cp out the files from under the running database as long as you can accept the I/O hit.

Marc Gear
+11  A: 

Another thing to be aware of is that you can copy files out from under a live database. Given that your database may be large, you could just copy it out of band (OOB) from your test/production machine to another machine.

Depending on the write load of the machines, it may be advisable to trigger a replication after the copy to pick up any writes that were in progress when the file was copied. Replicating a few records would still be quicker than replicating the entire database.
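A minimal sketch of the copy step, assuming the classic layout where each database lives in a single `<name>.couch` file inside CouchDB's data directory. The directory paths and the `cms` name in the usage lines are hypothetical, so check where your installation actually keeps its data.

```python
import shutil
from pathlib import Path


def copy_database_file(data_dir, db_name, dest_dir):
    """Copy <db_name>.couch from the data directory into dest_dir.

    Assumes one .couch file per database; returns the path of the copy.
    """
    src = Path(data_dir) / f"{db_name}.couch"
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves timestamps, which helps when comparing backups.
    return Path(shutil.copy2(src, dest / src.name))


if __name__ == "__main__":
    # Hypothetical paths; adjust to your installation.
    copied = copy_database_file("/var/lib/couchdb", "cms", "/backups/couchdb")
    print(f"copied to {copied}")
```

After placing the copy on the other machine, a follow-up replication from the live instance would pick up any writes that happened during the copy.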

For reference see: http://wiki.apache.org/couchdb/FilesystemBackups

Paul J. Davis
+2  A: 

I'd like to second Paul's suggestion: just cp your database files from under the live server if you can take the I/O load hit. If you run a replicated copy anyway, you can safely copy from that too, without impacting your master's performance.

Jan Lehnardt
+2  A: 

CouchDB also works very nicely with filesystem snapshots offered by modern filesystems like ZFS. Since the database file is always in a consistent state, you can take a snapshot of the file at any time without weakening the integrity guarantees provided by CouchDB.

This results in nearly no I/O overhead. If you have, for example, accidentally deleted a document from the database, you can move the snapshot to another machine and extract the missing data there. You might even be able to replicate back to the production database, but I have never tried that.

But always make sure you use exactly the same CouchDB revision when moving database files around. The on-disk format is still evolving in incompatible ways.
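For illustration, taking such a snapshot could look roughly like the sketch below, which shells out to the `zfs snapshot` command. The dataset name `tank/couchdb` and the snapshot naming scheme are assumptions made for this example, and running the command normally requires suitable privileges.

```python
import subprocess
from datetime import datetime, timezone


def snapshot_command(dataset, when=None):
    """Return the argv for a timestamped `zfs snapshot` of the dataset."""
    when = when or datetime.now(timezone.utc)
    name = when.strftime("couchdb-%Y%m%dT%H%M%SZ")
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def take_snapshot(dataset):
    """Run the snapshot command and return the full snapshot name."""
    cmd = snapshot_command(dataset)
    subprocess.run(cmd, check=True)
    return cmd[-1]


if __name__ == "__main__":
    # Hypothetical dataset holding the CouchDB data directory.
    print(take_snapshot("tank/couchdb"))
```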

mdorseif
A: 

I'm wondering which files to copy. There is a main file, but there are also some hidden files.

Bob Roberts