This is one of those "I know I shouldn't do this, but it's oh so convenient" questions. Sorry about that.

I plan to use CouchDB to store a bunch of documents and keep their entire revision history. CouchDB versions documents automatically, but relying on that from application code is strongly discouraged:

"You cannot rely on document revisions for any other purpose than concurrency control."

From what I've found on the CouchDB wiki, old revisions can get deleted either during compaction or during replication. As far as I can tell, compaction must always be triggered manually, and replication occurs only when there is more than one database server.

The question is: if I never run compaction and use only a single database instance for my documents, can I just use CouchDB's document versioning and expect it to work?

What other problems might I run into? E.g. does not running compaction hurt performance or consume significantly more disk space (than if I handled the versioning manually)?

+5  A: 

If you reformulate the sentence a bit, it says: "Any update, regardless of how minor, can completely change the behavior. We guarantee you can use it for concurrency, but nothing else."

Therefore I would not rely on it, because in our industry, stuff like that will haunt you in 6 months, unless you can absolutely guarantee that you will never, ever update CouchDB.
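
For comparison, here is roughly what "handling the versioning manually" could look like. This is only a minimal in-memory sketch: `VersionedStore` and its methods are hypothetical names made up for illustration, not part of CouchDB or any client library. The idea is simply that every save appends an explicit snapshot that the application owns, so nothing the database does internally can remove it.

```python
import copy


class VersionedStore:
    """In-memory stand-in for a database (hypothetical, NOT the CouchDB API)."""

    def __init__(self):
        # Maps document id -> list of snapshots, oldest first.
        self._history = {}

    def save(self, doc_id, body):
        """Append a new snapshot and return its 1-based version number."""
        versions = self._history.setdefault(doc_id, [])
        versions.append(copy.deepcopy(body))
        return len(versions)

    def get(self, doc_id, version=None):
        """Return the latest snapshot, or a specific 1-based version."""
        versions = self._history[doc_id]
        index = len(versions) - 1 if version is None else version - 1
        if not 0 <= index < len(versions):
            raise KeyError(f"{doc_id} has no version {version}")
        return copy.deepcopy(versions[index])

    def version_count(self, doc_id):
        """Number of snapshots stored for this document."""
        return len(self._history.get(doc_id, []))


store = VersionedStore()
store.save("report", {"title": "Draft"})
store.save("report", {"title": "Final"})
```

In a real database the same idea would mean storing each snapshot as its own record (or as an explicit history list) rather than leaning on the engine's internal revisions.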

Michael Stum

+4  A: 

If it's wrong, then I don't want to be right!

Actually, no. It's wrong. Michael explained it well: in the best case it makes your app far from future-proof, and in the worst case you will get nasty bugs that force you to re-architect at an inconvenient time.

Consider Google App Engine. What is their suggested transaction pattern?

  1. Begin transaction
  2. Fetch your entities by key
  3. Modify entities and save them
  4. End transaction

These steps form a function that is re-run if the transaction fails. Why? And why must the fetch be inside the transaction instead of hanging around in an outer scope?

Because App Engine uses MVCC internally. If you get a collision (your revision is stale because somebody else updated the entity), it simply re-runs your function. The next iteration fetches the newer revision, updates from the newer data, and re-puts with a correct revision. The point is, Google does not expose this to users because it is not a suitable foundation for building application-level versioning.
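
To make the retry idea concrete, here is a rough sketch of the pattern in plain Python. It is not App Engine's API: `TinyStore`, `ConflictError`, and `run_in_transaction` are hypothetical names, and the store is just a dict holding a revision number next to each value. What it illustrates is that the revision exists only to detect a stale write and trigger a re-run, which is why the fetch has to live inside the retried function.

```python
class ConflictError(Exception):
    """Raised when a write carries a stale revision."""


class TinyStore:
    """In-memory stand-in for a revision-checked datastore (hypothetical)."""

    def __init__(self):
        self._docs = {}  # key -> (revision, value)

    def fetch(self, key):
        """Return (revision, value); a missing key has revision 0."""
        return self._docs.get(key, (0, None))

    def put(self, key, expected_rev, value):
        """Write only if the caller saw the current revision."""
        current_rev, _ = self._docs.get(key, (0, None))
        if current_rev != expected_rev:
            raise ConflictError(key)
        self._docs[key] = (current_rev + 1, value)
        return current_rev + 1


def run_in_transaction(store, key, modify, max_retries=5):
    """Fetch, modify, and save; re-run the whole thing on a conflict."""
    for _ in range(max_retries):
        rev, value = store.fetch(key)  # fetch inside the retried body
        new_value = modify(value)
        try:
            return store.put(key, rev, new_value)
        except ConflictError:
            continue  # somebody else wrote first; start over with fresh data
    raise ConflictError(f"gave up on {key!r} after {max_retries} attempts")


store = TinyStore()
store.put("counter", 0, 10)

calls = []


def increment(value):
    # On the first attempt only, simulate another client writing in between.
    if not calls:
        store.put("counter", 1, 100)
    calls.append(value)
    return value + 1


final_rev = run_in_transaction(store, "counter", increment)
```

Here the first attempt works from the stale value 10 and is rejected; the second attempt re-fetches 100 and succeeds. Nothing in the loop ever reads an old revision back, which is the sense in which revisions serve concurrency control and nothing else.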

jhs