I am looking at CouchDB, which has a number of appealing features over relational databases including:

  • intuitive REST/HTTP interface
  • easy replication
  • data stored as documents, rather than normalised tables

I appreciate that this is not a mature product so should be adopted with caution, but am wondering whether it is actually a viable replacement for an RDBMS (in spite of the intro page saying otherwise - http://couchdb.apache.org/docs/intro.html).

  1. Under what circumstances would CouchDB be a better choice of database than an RDBMS (e.g. MySQL), for example in terms of scalability, design and development time, reliability and maintenance?
  2. Are there cases where an RDBMS is still clearly the right choice?
  3. Is this an either-or choice, or is a hybrid solution more likely to emerge as best practice?
+4  A: 

Until someone gives a more in-depth answer, here are some pros and cons for CouchDB

Pros:

  • you don't need to fit your data into one of those pesky higher-order normal forms
  • you can change the "schema" of your data at any time
  • your data is indexed exactly for your queries, so reads are index lookups rather than scans (B-tree lookups, so logarithmic rather than strictly constant time).

Cons:

  • you need to create a view for each and every query, i.e. ad-hoc queries (such as concatenating dynamic WHERE and ORDER BY clauses in SQL) are not available.
  • you will either have redundant data, or you will end up implementing join and sort logic yourself on "client-side" (e.g. sorting a many-to-many relationship on multiple fields)
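
To make the "one view per query" point concrete, here is a minimal in-memory sketch in Python. It is not CouchDB's API: `build_view`, `query`, `by_author` and the sample documents are all hypothetical names invented for illustration. It only shows the idea that a view is a precomputed, key-sorted index produced by a map function, and that a query is a lookup in that index.

```python
from bisect import bisect_left, bisect_right

# Hypothetical sample documents (not from the thread).
docs = [
    {"_id": "a", "type": "post", "author": "zed", "year": 2009},
    {"_id": "b", "type": "post", "author": "javier", "year": 2009},
    {"_id": "c", "type": "post", "author": "zed", "year": 2008},
]

def by_author(doc):
    """Map function: emit one (key, value) row per post, keyed by author."""
    if doc.get("type") == "post":
        yield doc["author"], doc["_id"]

def build_view(map_fn, documents):
    """Run the map function over every document and keep the rows sorted by key."""
    rows = sorted(row for doc in documents for row in map_fn(doc))
    keys = [key for key, _ in rows]
    values = [value for _, value in rows]
    return keys, values

def query(view, key):
    """Find all rows with exactly this key by binary search on the sorted keys."""
    keys, values = view
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return values[lo:hi]

author_view = build_view(by_author, docs)
print(query(author_view, "zed"))
```

Looking posts up by `year` instead would need a second map function and a second index; nothing in `author_view` can answer that question, which is the trade-off the list above describes.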

Pros or Cons:

  • creating your views is not as straightforward as writing SQL; it's more like solving a puzzle. Whether that is a pro or a con depends on your type :)
Zed
Since asking the question, I've been checking out other sources and it seems to me that the main benefit of using CouchDB is its "real-world" representation of data versus the normalised data structure required by more traditional RDBMSs. See http://books.couchdb.org/relax/intro/why-couchdb for further explanation. I think answers to the other questions I asked aren't available yet.
Andrew Whitehouse
A: 

If you are working with tabular data where there is only a shallow data hierarchy, then an RDBMS is probably your best choice. This is the main use for an RDBMS, and the documentation and tool support are very good.

For more nested data like XML, a document database should provide faster access to your data. Also, the storage model more closely resembles that of the data, so retrieval should be more straightforward.

Dana the Sane
+9  A: 

CouchDB is one of several available 'key/value stores'. Others include oldies like BDB, web-oriented ones like Persevere and MongoDB (the group CouchDB itself belongs to), newer super-fast ones like memcached (RAM-only) and Tokyo Cabinet, and huge stores like Hadoop and Google's BigTable (MongoDB also claims to be in this space).

There's certainly space for both key/value stores and relational DBs. Traditionally, most RDBs are considered a layer above key/value. For example, MySQL used to offer BDB as an optional backend for tables. In short, key/value stores know nothing about fields and relationships, which are the foundations of SQL.

Key/value stores are typically easier to scale, which makes them an attractive choice when growing explosively, like Twitter did. Of course, that means that any relationships between the stored values have to be managed in your code, instead of just declared in SQL. CouchDB's approach is to store big 'documents' in the value part, making them (mostly) self-contained, so you can get most of the needed data in a single query. Many use cases fit this idea, others don't.
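
As a rough illustration of that difference, the sketch below holds the same data twice in plain Python structures: once split across two "tables" that the code has to join by hand, and once as a single self-contained document fetched by key. The data, the key `user:alice` and both function names are made up for the example; no real database is involved.

```python
# Relational-style layout: rows in separate "tables", linked by user_id.
users = {1: {"name": "alice"}}
tweets = [
    {"id": 10, "user_id": 1, "text": "hello"},
    {"id": 11, "user_id": 1, "text": "world"},
]

def timeline_relational(user_id):
    """Join users and tweets by hand, as application code must when the store has no joins."""
    user = users[user_id]
    texts = [t["text"] for t in tweets if t["user_id"] == user_id]
    return {"name": user["name"], "tweets": texts}

# Document-style layout: one self-contained value per key.
documents = {
    "user:alice": {"name": "alice", "tweets": ["hello", "world"]},
}

def timeline_document(doc_id):
    """A single key lookup returns everything the page needs."""
    return documents[doc_id]

print(timeline_relational(1))
print(timeline_document("user:alice"))
```

The document form is cheaper to read, but any data shared between documents (for example a display name copied into several places) has to be kept in sync by the application.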

The current theme I see is that after the "Rails doesn't scale!!" scare, many people are now realizing that it's not about your web framework, but about intelligent caching, to avoid hitting the database, and even the webapp when possible. The rising star there is memcached.

As always, it all depends on your needs.

Javier
+4  A: 

This one is a hard question to answer. So I'll try to highlight the areas where CouchDB might work against you.

The two greatest sources of difficulty that people report on the Couch Users and Dev mailing lists are:

  • Complex Joins of Data.
  • Multi-Step Map/Reduce.

Couch Views are pretty much islands unto themselves. If you need to aggregate/merge/intersect a set of views, you pretty much have to do so in the application layer for now. There are some tricks you can do with view collation and complex keys to help with joins, but these only go so far for some types of data. This may or may not be livable for different applications. That being said, many times this problem can be reduced or eliminated by structuring your data differently.
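
The collation trick mentioned above can be sketched as follows. This is a plain-Python imitation, not CouchDB code: the document shapes, `map_post_with_comments`, `build_view` and `post_with_comments` are hypothetical, and Python's list ordering only approximates CouchDB's collation rules. The idea is that a post is emitted under the key `[post_id, 0]` and each of its comments under `[post_id, 1]`, so that sorting by key places a post directly before its comments.

```python
# Hypothetical documents: posts and comments stored side by side.
docs = [
    {"_id": "c2", "type": "comment", "post": "p1", "text": "second"},
    {"_id": "p2", "type": "post", "title": "Other"},
    {"_id": "p1", "type": "post", "title": "Hello"},
    {"_id": "c1", "type": "comment", "post": "p1", "text": "first"},
]

def map_post_with_comments(doc):
    """Emit complex keys so that a post sorts just before its own comments."""
    if doc["type"] == "post":
        yield [doc["_id"], 0], doc
    elif doc["type"] == "comment":
        yield [doc["post"], 1], doc

def build_view(map_fn, documents):
    """Collect all emitted rows and sort them by key (lists compare element by element)."""
    rows = [(key, value) for d in documents for key, value in map_fn(d)]
    rows.sort(key=lambda row: row[0])
    return rows

def post_with_comments(view, post_id):
    """Range query from [post_id, 0] to [post_id, 1]; a real index would seek, not scan."""
    return [doc for key, doc in view if [post_id, 0] <= key <= [post_id, 1]]

view = build_view(map_post_with_comments, docs)
for doc in post_with_comments(view, "p1"):
    print(doc["type"], doc["_id"])
```

This covers a one-to-many "join" in a single range read. Intersecting two independent views, which is the case described above, still has to happen in application code.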

The comments of the other folks on this question demonstrate some of the different types of data that are well suited to CouchDB.

One other thing to keep in mind is that a lot of the time, the data you need to combine/merge/intersect is data you would process offline in an RDBMS anyway, so you might not lose anything by doing the same in CouchDB.

Short Answer: I think eventually CouchDB will be able to handle any kind of problem you want to throw at it. But the comfort level you have using it may differ from developer to developer. It's somewhat subjective, I think. I happen to like using a Turing-complete language to query my data and keeping more logic in the application layer. Your mileage may vary.

Jeremy Wall
A: 

Correct me if I am wrong. CouchDB is useless for cases where you need to validate the uniqueness of docs over multiple fields. For example, it's impossible to enforce a validation rule like "both login and email are required to be unique" and keep the data in a consistent state. You can check that before saving the doc, but someone can push before you and the data becomes inconsistent.

Sam
CouchDB does have ways of enforcing uniqueness. It's all at the key level though. If you need both login and email to be unique, then simply derive the document's id from them and you will never be able to insert a duplicate login and email in the db. It's different but just as effective.
Jeremy Wall
Consider 2 keys: "[email protected]" and "[email protected]". Both users have the same email address [email protected].
Sam
A: 

Sam, you have to take another approach with CouchDB, and in general with map-based or document-based databases. You can't define a constraint such as UNIQUE, but you can query the data to check whether that email is already used, and whether that login is used too. That's the right approach; you have to change your way of thinking.
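
The disagreement in the comments above can be made concrete with a small sketch. It is a toy in-memory model, not CouchDB client code: `Store`, `Conflict`, `register_composite` and `register_per_field` are hypothetical names, and the only CouchDB behaviour assumed is that creating a document whose id already exists is rejected (CouchDB answers such a write with 409 Conflict). The per-field variant is one possible workaround, offered as an assumption rather than something proposed in this thread.

```python
class Conflict(Exception):
    """Raised when a document id already exists (stands in for a 409 Conflict)."""

class Store:
    """Toy document store whose only guarantee is unique document ids."""
    def __init__(self):
        self.docs = {}

    def create(self, doc_id, doc):
        if doc_id in self.docs:
            raise Conflict(doc_id)
        self.docs[doc_id] = doc

def register_composite(store, login, email):
    """Id derived from login AND email: only the pair is unique."""
    store.create(f"user:{login}:{email}", {"login": login, "email": email})

def register_per_field(store, login, email):
    """One reservation document per field (assumed workaround; the two writes are not atomic)."""
    store.create(f"login:{login}", {"email": email})
    try:
        store.create(f"email:{email}", {"login": login})
    except Conflict:
        del store.docs[f"login:{login}"]  # undo the first write by hand
        raise

composite = Store()
register_composite(composite, "sam", "shared@example.com")
register_composite(composite, "jeremy", "shared@example.com")  # accepted: the ids differ

per_field = Store()
register_per_field(per_field, "sam", "shared@example.com")
try:
    register_per_field(per_field, "jeremy", "shared@example.com")
except Conflict as exc:
    print("rejected duplicate:", exc)
```

The composite id lets two logins share one email address, which is the objection raised in the comments. The per-field version catches it, at the cost of two writes and manual cleanup. A query-then-insert check, as suggested in this answer, remains open to the race described in the earlier answer because nothing ties the check to the write.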

Filippo
+2  A: 

I recently attended the NoSQL conference in London and think I have a better idea now how to answer the original question. I also wrote a blog post, and there are a couple of other good ones.

Key points:

  • We have accumulated probably 30 years' knowledge of administering relational databases, so we shouldn't replace them without careful consideration; non-relational data stores are less mature than relational ones, and so are inherently more risky to adopt
  • There are different types of non-relational data store; some are key-value stores, some are document stores, some are graph databases
  • You could use a hybrid approach, e.g. a combination of RDBMS and graph data store for a social software site
  • Document data stores (e.g. CouchDB and MongoDB) are probably the closest to relational databases and provide a JSON data structure with all the fields presented hierarchically which avoids having to do table joins, and (some might argue) is an improvement on the traditional object-relational mapping that most applications currently use
  • Non-relational databases support replication (including master-master); relational databases support replication too but it may not be as comprehensive as the non-relational option
  • Very large sites such as Twitter, Digg and Facebook use Cassandra, which is built from the ground up to support clustering
  • Relational databases are probably suitable for 90% of cases

In summary, consensus seems to be "proceed with caution".

Andrew Whitehouse