
I am trying to decide whether to use Voldemort or CouchDB for an upcoming healthcare project. I want a storage system that offers high availability and fault tolerance, and that can scale to the massive amounts of data being thrown at it.

What are the pros and cons of each?

Thanks

+1  A: 

Is memcacheDB an option? I've heard that's how Digg handled HA issues.

scunliffe
Sure, what would be the advantage of MemcacheDB over the other two?
py213py
What are HA issues?
Sam152
Lol. How is memcached fault tolerant?
Cory R. King
@Sam152, HA = High Availability
tuinstoel
Memcache can be made fault tolerant with some configuration. It's already "distributed". The idea is that your application checks Memcache before looking in the database. If a Memcache server goes down, the remaining Memcache servers are repopulated with the lost entries on an as-needed basis.
Nolte Burke
(cont'd) You still need an underlying database engine to store the "permanent" copies and to run queries against.
Nolte Burke
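The check-the-cache-first flow described in these comments (usually called cache-aside) can be sketched in Python as below. This is a minimal, hypothetical model: `CacheNode` and `CacheAside` are illustrative names, in-memory dicts stand in for memcached servers and for the permanent database, and no real memcached client API is used.

```python
import hashlib
from typing import Callable, Dict, List, Optional


class CacheNode:
    """Hypothetical in-memory stand-in for one memcached server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.data: Dict[str, str] = {}


class CacheAside:
    """Check the cache first; on a miss, read the permanent store and refill."""

    def __init__(self, nodes: List[CacheNode], load_from_db: Callable[[str], str]) -> None:
        self.nodes = nodes
        self.load_from_db = load_from_db  # hypothetical loader for the "permanent" copy
        self.db_reads = 0

    def node_for(self, key: str) -> Optional[CacheNode]:
        # Plain modulo hashing over the live nodes: when a node dies, keys are
        # re-spread over the survivors (real clients often use consistent
        # hashing so that fewer keys move).
        live = [n for n in self.nodes if n.alive]
        if not live:
            return None
        digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return live[digest % len(live)]

    def get(self, key: str) -> str:
        node = self.node_for(key)
        if node is not None and key in node.data:
            return node.data[key]  # cache hit
        # Cache miss (or no cache node left): fall back to the database...
        self.db_reads += 1
        value = self.load_from_db(key)
        # ...and refill the cache so the next read for this key is a hit.
        if node is not None:
            node.data[key] = value
        return value
```

If the node holding a key is marked dead, the next `get` for that key misses on a surviving node, reads the database once and repopulates the survivor, which is the "as-needed" refill mentioned above. The database remains the source of truth, as the follow-up comment points out.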
Actually, the parent post refers to memcache**DB**, which is the memcached code with a Berkeley DB backend: http://memcachedb.org/
Neel
+3  A: 

Project Voldemort looks nice, but I haven't looked deeply into it so far.

In its current state, CouchDB might not be the right thing for "massive amounts of data". Distributing data between nodes and routing queries accordingly is on the roadmap but not yet implemented. The biggest known production setups of CouchDB use "tables" ("databases" in CouchDB terms) of about 200 GB.

HA is not natively supported by CouchDB, but it can be built easily: all CouchDB nodes replicate the databases between each other in a multi-master setup. We put two Varnish proxies in front of the CouchDB machines, and the Varnish boxes are made redundant with CARP. CouchDB's "built for the Web" design makes such things very easy.
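One way to wire up a multi-master setup like this is to have every node pull from every other node through CouchDB's HTTP `_replicate` endpoint. The sketch below is an assumption about how that could look, not the setup described above: it assumes a CouchDB version that supports `"continuous": true` replication, and the host names (`couch-a`, `couch-b`), the database name `records` and the helper names are hypothetical.

```python
import json
import urllib.request
from itertools import permutations
from typing import List, Tuple


def mesh_replication_requests(nodes: List[str], db: str) -> List[Tuple[str, dict]]:
    """Build one pull-replication request per ordered pair of nodes (hypothetical helper).

    For every (src, dst) pair, dst pulls src's copy of `db` into its local `db`,
    so with N nodes there are N * (N - 1) replication jobs (a full mesh).
    """
    requests = []
    for src, dst in permutations(nodes, 2):
        body = {"source": f"{src}/{db}", "target": db, "continuous": True}
        requests.append((f"{dst}/_replicate", body))
    return requests


def send(url: str, body: dict) -> bytes:
    """POST one replication request as JSON (needs reachable CouchDB nodes)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Hypothetical node URLs; replace with your own CouchDB servers.
    for url, body in mesh_replication_requests(
        ["http://couch-a:5984", "http://couch-b:5984"], "records"
    ):
        print(url, json.dumps(body))
```

The example only prints the requests; calling `send` for each one would start the replication jobs. Because every node accepts writes and pulls from every other node, the proxies in front can route a request to any healthy node.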

The most pressing issue in our setup is that there are still problems replicating large (multi-megabyte) attachments to CouchDB documents.

I suggest you also check the traditional RDBMS route. Finding experienced people is much harder outside the RDBMS approach, and there are very capable offerings available from Oracle & Co.

mdorseif
+2  A: 

Without knowing more about your requirements, I would nevertheless say that Project Voldemort, or distributed hash tables (DHTs) like CouchDB in general, are a solution to your HA problem.
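DHT-style stores such as Project Voldemort typically get their availability by hashing each key onto a ring of nodes and storing it on several consecutive nodes, so that losing one node does not lose the data. The sketch below illustrates that idea with consistent hashing. It is a simplified illustration, not Voldemort's actual routing code: `HashRing`, the `replicas` and `vnodes` parameters and the node names are hypothetical.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical, simplified)."""

    def __init__(self, nodes: List[str], replicas: int = 2, vnodes: int = 50) -> None:
        self.replicas = replicas
        # Each physical node appears `vnodes` times on the ring to spread load.
        self._ring: List[Tuple[int, str]] = sorted(
            (self._hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def preference_list(self, key: str) -> List[str]:
        """Walk clockwise from the key's hash, collecting `replicas` distinct nodes."""
        start = bisect.bisect(self._points, self._hash(key))
        result: List[str] = []
        for i in range(len(self._ring)):
            _, node = self._ring[(start + i) % len(self._ring)]
            if node not in result:
                result.append(node)
                if len(result) == self.replicas:
                    break
        return result


if __name__ == "__main__":
    ring = HashRing(["node1", "node2", "node3"], replicas=2)
    print(ring.preference_list("patient:42"))
```

A write goes to every node in the key's preference list; if one of them is down, the other copy can still serve reads. Adding or removing a node only moves the keys adjacent to its ring positions, which is why this scheme scales out more gracefully than plain modulo hashing.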

These DHTs are very good for high availability, but harder to write code for than traditional relational databases (RDBMSs) when it comes to consistency.

They are quite good at storing document-type information, which may fit nicely with your healthcare project, but they make development harder for relational data.

  • The biggest limitation of most stores is that they are not transactionally safe (see Scalaris for a transactionally safe store), so you need to ensure data consistency yourself; most resolve inconsistencies at read time by merging conflicting data. RDBMSs make data consistency (ACID) much easier.
  • Joining data is much harder too. In an RDBMS you can easily query data across several tables; in CouchDB you need to write code to aggregate data. For other stores, Hadoop may be a good choice for aggregating information.
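To make "read-time consistency by merging conflicting data" concrete: Dynamo-style stores such as Project Voldemort version each value with a vector clock, and a read may return several concurrent versions that the application has to reconcile. The following is a minimal sketch of that idea, not Voldemort's API. The helper names are hypothetical, and resolving conflicts by taking the union of a set (here, a patient's recorded allergies) is an assumed, application-specific merge policy.

```python
from typing import Dict, List, Set, Tuple

# A vector clock maps node name -> number of writes coordinated by that node.
VectorClock = Dict[str, int]
Version = Tuple[VectorClock, Set[str]]


def increment(clock: VectorClock, node: str) -> VectorClock:
    """Return a copy of `clock` with `node`'s counter bumped (a new write)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def descends(a: VectorClock, b: VectorClock) -> bool:
    """True if `a` has seen every write that `b` has (a >= b component-wise)."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a: VectorClock, b: VectorClock) -> str:
    ab, ba = descends(a, b), descends(b, a)
    if ab and ba:
        return "equal"
    if ab:
        return "after"
    if ba:
        return "before"
    return "concurrent"


def siblings(versions: List[Version]) -> List[Version]:
    """Drop versions superseded by (or duplicating) another; the rest conflict."""
    keep = []
    for i, (clock, value) in enumerate(versions):
        superseded = any(
            compare(other, clock) == "after"
            or (compare(other, clock) == "equal" and j < i)
            for j, (other, _) in enumerate(versions)
            if j != i
        )
        if not superseded:
            keep.append((clock, value))
    return keep


def merge(versions: List[Version], writer: str) -> Version:
    """Assumed application policy: union the values, then record a new write."""
    clock: VectorClock = {}
    merged: Set[str] = set()
    for other, value in versions:
        for node, count in other.items():
            clock[node] = max(clock.get(node, 0), count)
        merged |= value
    return increment(clock, writer), merged
```

For example, if two nodes update the same record concurrently, `siblings` keeps both versions and `merge` produces one whose clock descends from both, so the conflict disappears on the next write. A union cannot express deletions, which is exactly the kind of consistency work an RDBMS transaction would have done for you.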

Read about BASE and the CAP theorem on consistency vs. availability.

See

Stephan Schmidt