I am planning to use CouchDB on a project. Since its querying mechanism involves writing views (which are a lot like indexes in a regular RDBMS), I was wondering: if the database is updated frequently (a write-heavy workload), would CouchDB perform well compared to a regular RDBMS? Or do we have to compact or re-index the system occasionally to keep it fast?

+2  A: 

You might think of the pros/cons of the CouchDB view model this way. (CouchDB hackers may disagree but IMO it's accurate enough for users.)

  1. A view function always performs a full "table scan" when it is first created (just like an RDBMS BTW)
  2. As long as they have no side effects, map and reduce functions can be arbitrarily complex
  3. Every document and map/reduce result is cached and is not calculated again unless that document changes
  4. If you add or change a document, it (and only it) will be re-computed (and cached) for that view

Given these, you can draw some conclusions about CouchDB performance:

  • As long as the view function is unchanged, there is never a re-index phase for the entire data set, only incremental per-document updates
  • Changing a view function forces re-building the entire index
  • Since both CouchDB and RDBMS must update the index for new data, it's reasonable to think performance will be similar for heavy update/insert usage.
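
To make the model above concrete, here is a minimal Python sketch of an incrementally maintained view. It is a toy under stated assumptions, not CouchDB's implementation: the `ToyView` class, the `(revision, document)` tuples, and the `by_type` map function are all hypothetical names, and the toy compares revisions for every document on each query, which is only a stand-in for however the real server tracks changes. The point it illustrates is only that the map function runs once per new or changed document.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
MapFn = Callable[[Doc], Iterable[Tuple[Any, Any]]]


class ToyView:
    """Toy model of an incrementally maintained view (hypothetical, not CouchDB code)."""

    def __init__(self, map_fn: MapFn) -> None:
        self.map_fn = map_fn
        self.rows: Dict[str, List[Tuple[Any, Any]]] = {}  # doc id -> cached map output
        self.indexed_rev: Dict[str, int] = {}             # doc id -> revision last mapped
        self.map_calls = 0                                # how often map_fn has run

    def refresh(self, db: Dict[str, Tuple[int, Doc]]) -> None:
        """Run map_fn only for documents that are new or changed since the last refresh."""
        for doc_id, (rev, doc) in db.items():
            if self.indexed_rev.get(doc_id) != rev:
                self.rows[doc_id] = list(self.map_fn(doc))
                self.indexed_rev[doc_id] = rev
                self.map_calls += 1
        # Forget cached rows for documents that no longer exist.
        for doc_id in list(self.rows):
            if doc_id not in db:
                del self.rows[doc_id]
                del self.indexed_rev[doc_id]

    def query(self, db: Dict[str, Tuple[int, Doc]]) -> List[Tuple[Any, Any]]:
        """Bring the index up to date, then return all rows sorted by key."""
        self.refresh(db)
        return sorted(row for rows in self.rows.values() for row in rows)


def by_type(doc: Doc) -> Iterable[Tuple[Any, Any]]:
    """Hypothetical map function: emit (type, 1) for each document."""
    yield doc["type"], 1


def demo() -> Tuple[int, int, List[Tuple[Any, Any]]]:
    db = {"a": (1, {"type": "post"}), "b": (1, {"type": "comment"})}
    view = ToyView(by_type)
    view.query(db)                    # first query: the full "table scan"
    calls_after_first = view.map_calls
    db["a"] = (2, {"type": "page"})   # change one document
    rows = view.query(db)             # only "a" is mapped again
    return calls_after_first, view.map_calls, rows
```

In this sketch, changing the view function corresponds to creating a new `ToyView`, which starts with an empty cache and therefore maps every document again on its first query.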

Obviously YMMV, and the standard cop-out "you must test your own load" applies. However, I will add a few more considerations.

  • I say an RDBMS is flat-out superior for exploratory querying of your data. When you don't even know what questions to ask of your data, you really can't beat a structured query language.
  • However, once you define what you want to know, CouchDB (and perhaps Hadoop) provides the richest querying system because you are just writing code.
  • If your data set is large, NoSQL databases will scale more easily. For example, CouchDB-Lounge allows a cluster of couches for parallel processing. Hadoop does the same, so the choice would come down to secondary considerations such as familiarity and maintainability: CouchDB is a web server but requires a bit more DIY, while Hadoop internalizes more cluster management at the cost of complexity, foreignness, etc.

I hope that helps shed some light on your decision!

jhs
I have been pressured into mentioning that you can also query a view with `stale=ok`, which avoids updating the index with new data. That is correct. However, in my opinion `stale=ok` is the "global variable" of CouchDB: usually not a good idea, but sometimes useful if you are an advanced user. My feeling is to avoid it until it is obvious that you cannot. I prefer the technique of ensuring a view is always updated: http://wiki.apache.org/couchdb/Regenerating_views_on_update
jhs
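
For reference, the `stale=ok` option mentioned in the comment above is passed as a query parameter on the view URL. The following is a small sketch that only builds such a URL; the host, database name, design document, and view name are made-up placeholders, and `view_url` is a hypothetical helper, not part of any CouchDB client library. Check the documentation for your CouchDB version before relying on this parameter.

```python
from urllib.parse import quote, urlencode


def view_url(base: str, db: str, ddoc: str, view: str, stale_ok: bool = False) -> str:
    """Build a view query URL, optionally asking for possibly stale results."""
    path = "/".join([
        base.rstrip("/"),
        quote(db, safe=""),
        "_design",
        quote(ddoc, safe=""),
        "_view",
        quote(view, safe=""),
    ])
    # stale=ok asks the server to answer from the existing index without
    # first updating it with recent changes.
    params = {"stale": "ok"} if stale_ok else {}
    return path + ("?" + urlencode(params) if params else "")


# Placeholder names: a local server, a "blog" database, a "posts" design doc.
fresh = view_url("http://localhost:5984", "blog", "posts", "by_type")
stale = view_url("http://localhost:5984", "blog", "posts", "by_type", stale_ok=True)
```

The first URL triggers an index update before the query is answered; the second trades freshness for a faster response.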