How does CouchDB perform for a regularly updated dataset?

You might think of the pros/cons of the CouchDB view model this way. (CouchDB hackers may disagree but IMO it's accurate enough for users.)

A view function always performs a full "table scan" the first time its index is built (just like creating an index in an RDBMS, BTW)
As long as they have no side effects, map and reduce functions can be arbitrarily complex
The map/reduce result for every document is cached and is not calculated again unless that document changes
If you add or change a document, it (and only it) will be re-computed (and cached) for that view
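
Here is a toy model of the points above, in case code makes it clearer. It is only a sketch and not how CouchDB is actually implemented (the real thing keeps view results in an on-disk B-tree and finds changed documents by sequence number). The names `ToyView`, `refresh`, `query` and `by_author`, and the integer `rev` field, are all made up for illustration.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, object]
Row = Tuple[object, object]  # one emitted (key, value) pair


class ToyView:
    """Toy model of an incrementally updated view (not CouchDB's real code)."""

    def __init__(self, map_fn: Callable[[Doc], Iterable[Row]]) -> None:
        self.map_fn = map_fn
        self.rows_by_id: Dict[str, List[Row]] = {}  # cached map output per document
        self.seen_rev: Dict[str, object] = {}       # revision each cache entry came from
        self.map_calls = 0                          # how many times map_fn has run

    def refresh(self, db: Dict[str, Doc]) -> None:
        """Run map_fn only for documents that are new or have a new revision."""
        # Assumption: we compare revisions doc by doc; deletions are ignored here.
        for doc_id, doc in db.items():
            if self.seen_rev.get(doc_id) == doc["rev"]:
                continue  # cached rows are still valid, skip the map call
            self.rows_by_id[doc_id] = list(self.map_fn(doc))
            self.seen_rev[doc_id] = doc["rev"]
            self.map_calls += 1

    def query(self, db: Dict[str, Doc]) -> List[Row]:
        """Bring the cache up to date, then return all rows sorted by key."""
        self.refresh(db)
        return sorted(row for rows in self.rows_by_id.values() for row in rows)


def by_author(doc: Doc) -> Iterable[Row]:
    # Hypothetical map function: emit one row per document, keyed by author.
    yield (doc["author"], 1)


db: Dict[str, Doc] = {
    "a": {"rev": 1, "author": "jhs"},
    "b": {"rev": 1, "author": "ann"},
}
view = ToyView(by_author)

first_rows = view.query(db)          # first query: every document is mapped
calls_after_first = view.map_calls

db["a"] = {"rev": 2, "author": "bob"}  # change one document
second_rows = view.query(db)         # only document "a" is mapped again
calls_after_second = view.map_calls
```

The first query pays for the full scan; the second one runs the map function exactly once more, for the document that changed.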

Given these, you can draw some conclusions about CouchDB performance:

There is never a re-index phase for the entire data set, just incremental per-document updates
Changing a view function forces re-building the entire index
Since both CouchDB and RDBMS must update the index for new data, it's reasonable to think performance will be similar for heavy update/insert usage.
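
One way to picture why changing a view function throws away the old index: think of the cached index as being looked up by a fingerprint of the function's source. The sketch below assumes a fingerprint of just the map source; `IndexStore`, `view_signature` and `get_index` are hypothetical names, and the details in real CouchDB differ.

```python
import hashlib
from typing import Callable, Dict, List


def view_signature(map_source: str) -> str:
    """Fingerprint of a view definition (assumption: only the map source counts)."""
    return hashlib.md5(map_source.encode("utf-8")).hexdigest()


class IndexStore:
    """Toy cache of built indexes, keyed by view signature."""

    def __init__(self) -> None:
        self.indexes: Dict[str, List[str]] = {}
        self.full_builds = 0  # number of times a full build was needed

    def get_index(self, map_source: str, build: Callable[[], List[str]]) -> List[str]:
        sig = view_signature(map_source)
        if sig not in self.indexes:
            # Unknown signature: nothing cached applies, so build from scratch.
            self.indexes[sig] = build()
            self.full_builds += 1
        return self.indexes[sig]


store = IndexStore()
v1 = "function(doc) { emit(doc.author, 1); }"
v2 = "function(doc) { emit(doc.author, doc.size); }"

store.get_index(v1, lambda: ["rows for v1"])  # first use: full build
store.get_index(v1, lambda: ["rows for v1"])  # same source: cache is reused
store.get_index(v2, lambda: ["rows for v2"])  # edited source: full build again
```

Even a small edit to the function gives a different fingerprint, so nothing from the old index can be reused.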

Obviously YMMV, and the standard cop-out, "you must test your own load," applies. However, I will add a few more considerations.

I say an RDBMS is flat-out superior for exploratory-style querying of your data. When you don't even know what questions to ask of your data, you really can't beat a structured query language.
However, once you define what you want to know, CouchDB (and perhaps Hadoop) provides the richest querying system because you are just writing code.
If your data set is large, NoSQL databases will scale more easily. For example, CouchDB-Lounge allows a cluster of couches for parallel processing. Hadoop does the same, so the choice would come down to secondary considerations such as familiarity and maintainability. CouchDB is a web server but requires a bit more DIY; Hadoop internalizes more cluster management at the cost of complexity, foreignness, etc.

I hope that helps shed some light on your decision!

I have been pressured into mentioning that you can also query a view with `stale=ok`, which avoids updating the index with new data. That is correct; however, in my opinion `stale=ok` is the "global variable" of CouchDB: usually not a good idea, but it can be useful sometimes if you are an advanced user. My feeling is to avoid it until it is obvious that you cannot. I prefer the technique of ensuring a view is always updated: http://wiki.apache.org/couchdb/Regenerating_views_on_update
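
For reference, `stale=ok` is just a query-string parameter on the view URL. Here is a small sketch that builds such a URL without sending any request. The helper `view_url`, the host, the database `mydb`, the design document `stats` and the view `by_author` are all placeholders I made up.

```python
from urllib.parse import quote, urlencode


def view_url(base: str, db: str, ddoc: str, view: str, allow_stale: bool = False) -> str:
    """Build the URL for querying a view; add stale=ok only when asked to."""
    parts = (db, "_design", ddoc, "_view", view)
    path = "/".join(quote(part, safe="") for part in parts)
    url = f"{base.rstrip('/')}/{path}"
    if allow_stale:
        # Ask CouchDB to answer from the existing index without updating it first.
        url += "?" + urlencode({"stale": "ok"})
    return url


fresh = view_url("http://localhost:5984", "mydb", "stats", "by_author")
stale = view_url("http://localhost:5984", "mydb", "stats", "by_author", allow_stale=True)
```

Keeping the flag off by default matches my advice: you have to opt in to stale results on purpose.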

jhs 2010-05-19 06:24:17
