I am planning on using CouchDB on a project. But as the querying mechanism involves writing views (which are a lot like indexes on regular RDBMSs), I was wondering: if the document database keeps getting updated a lot (a write-heavy database), would CouchDB perform well compared to a regular RDBMS? Or do we have to compact/re-index the system occasionally to make it perform faster?
+2
A:
You might think of the pros/cons of the CouchDB view model this way. (CouchDB hackers may disagree but IMO it's accurate enough for users.)
- A view function always performs a full "table scan" when it is first created (just like building a new index in an RDBMS, BTW)
- As long as they have no side effects, map and reduce functions can be arbitrarily complex
- The map/reduce result for every document is cached and never calculated again unless that document changes
- If you add or change a document, it (and only it) will be re-computed (and cached) for that view
Given these, you can draw some conclusions about CouchDB performance:
- There is never a re-index phase for the entire data set, just an incremental update per document
- Changing a view function forces re-building the entire index
- Since both CouchDB and RDBMS must update the index for new data, it's reasonable to think performance will be similar for heavy update/insert usage.
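To make the points above concrete, here is a toy model in Python of an incrementally maintained view. It is only a sketch of the idea, not how CouchDB is actually implemented: the `ToyViewIndex` class, the integer `rev` field, and the sample documents are all made up for illustration, and document deletion is not handled.

```python
from typing import Any, Callable, Dict, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[Any, Any]
MapFn = Callable[[Doc], List[Row]]


class ToyViewIndex:
    """Toy model of an incrementally maintained view (hypothetical, not CouchDB's code)."""

    def __init__(self, map_fn: MapFn) -> None:
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Row]] = {}  # cached map output per document id
        self.seen_rev: Dict[str, int] = {}           # revision each cache entry was built from
        self.map_calls = 0                           # how many times map_fn has run

    def refresh(self, docs: Dict[str, Doc]) -> None:
        """Run map_fn only for documents that are new or whose revision changed."""
        for doc_id, doc in docs.items():
            if self.seen_rev.get(doc_id) != doc["rev"]:
                self.rows_by_doc[doc_id] = self.map_fn(doc)
                self.seen_rev[doc_id] = doc["rev"]
                self.map_calls += 1

    def set_map_fn(self, map_fn: MapFn) -> None:
        """Changing the view function throws away every cached row."""
        self.map_fn = map_fn
        self.rows_by_doc.clear()
        self.seen_rev.clear()


def by_type(doc: Doc) -> List[Row]:
    return [(doc["type"], 1)]


def by_author(doc: Doc) -> List[Row]:
    return [(doc["author"], 1)]


docs: Dict[str, Doc] = {
    "a": {"rev": 1, "type": "post", "author": "x"},
    "b": {"rev": 1, "type": "comment", "author": "y"},
    "c": {"rev": 1, "type": "post", "author": "x"},
}

index = ToyViewIndex(by_type)
index.refresh(docs)                      # first build: full scan of all documents
first_build = index.map_calls

docs["b"] = {"rev": 2, "type": "post", "author": "y"}
index.refresh(docs)                      # only the changed document is re-mapped
after_update = index.map_calls - first_build

index.set_map_fn(by_author)              # new view function: cache is invalid
calls_before_rebuild = index.map_calls
index.refresh(docs)                      # full scan again
after_rebuild = index.map_calls - calls_before_rebuild
```

In this model the first build and the rebuild after changing the function each touch all three documents, while the single-document update touches one.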
Obviously YMMV, and the standard cop-out, "you must test your own load," applies. However, I will add a few more considerations.
- I say an RDBMS is flat-out superior for exploratory querying of your data. When you don't even know what questions to ask of your data, you really can't beat a structured query language.
- However, once you define what you want to know, CouchDB (and perhaps Hadoop) provides the richest querying system, because you are just writing code.
- If your data set is large, NoSQL databases will scale more easily. For example, CouchDB-Lounge allows a cluster of couches for parallel processing. Hadoop does the same, so the choice would come down to secondary considerations such as familiarity and maintainability. CouchDB is a web server but requires a bit more DIY; Hadoop internalizes more cluster management at the cost of complexity, foreignness, etc.
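As a rough illustration of "just writing code", here is a Python analog of a map/reduce view. Real CouchDB views are normally written in JavaScript and stored in a design document; the function names, the `query` helper, and the sample documents below are hypothetical and only show the shape of the idea.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, Any]
Row = Tuple[str, int]


def map_tags(doc: Doc) -> List[Row]:
    """Emit one (tag, word_count) row per tag, skipping drafts."""
    if doc.get("draft"):
        return []
    words = len(doc.get("body", "").split())
    return [(tag.lower(), words) for tag in doc.get("tags", [])]


def reduce_sum(values: List[int]) -> int:
    """Combine all values emitted for one key."""
    return sum(values)


def query(docs: Iterable[Doc],
          map_fn: Callable[[Doc], List[Row]],
          reduce_fn: Callable[[List[int]], int]) -> Dict[str, int]:
    """Group map output by key, then reduce each group (side-effect free)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


posts = [
    {"body": "views are indexes", "tags": ["CouchDB", "views"]},
    {"body": "updates are applied incrementally", "tags": ["couchdb"]},
    {"body": "not ready", "tags": ["views"], "draft": True},
]

totals = query(posts, map_tags, reduce_sum)
```

Any logic you can express in the map function (filtering, normalizing keys, computing derived values) becomes part of the index, which is awkward to do in plain SQL but is ordinary code here.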
I hope that helps shed some light on your decision!
jhs
2010-05-18 08:35:53
I have been pressured into mentioning that you can also query a view with `stale=ok`, which avoids updating the index with new data. That is correct; however, in my opinion `stale=ok` is the "global variable" of CouchDB -- usually not a good idea, but it can be useful sometimes if you are an advanced user. My feeling is to avoid it until it is obvious that you cannot. I prefer the technique of ensuring a view is always updated: http://wiki.apache.org/couchdb/Regenerating_views_on_update
jhs
2010-05-19 06:24:17
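For reference, `stale=ok` from the comment above is just a query-string parameter on the view URL. A minimal sketch in Python of building both variants follows; the host, database name (`blog`), design document (`posts`), and view name (`by_type`) are placeholders, and `view_url` is a hypothetical helper, not part of any CouchDB client library.

```python
from urllib.parse import urlencode


def view_url(base: str, db: str, design_doc: str, view: str,
             stale_ok: bool = False) -> str:
    """Build a CouchDB view URL, optionally asking for possibly stale results."""
    path = f"{base}/{db}/_design/{design_doc}/_view/{view}"
    params = {"stale": "ok"} if stale_ok else {}
    return f"{path}?{urlencode(params)}" if params else path


# Default query: the server brings the index up to date before answering.
fresh = view_url("http://localhost:5984", "blog", "posts", "by_type")

# stale=ok: answer from the index as it is, without indexing new changes first.
stale = view_url("http://localhost:5984", "blog", "posts", "by_type", stale_ok=True)
```

Issuing a GET against `fresh` pays the incremental indexing cost at read time, while `stale` trades freshness for a faster response.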