
We're setting out to build an online platform (API, Servers, Data, Wahoo!). For context, imagine that we need to build something like Twitter, but with the comments (tweets) organized around a live event. Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.
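One way to picture the "event updates fast, comments can wait" requirement inside a single delivery path is a priority queue that always drains event updates before comments. This is a minimal Python sketch under that assumption; `DeliveryQueue`, `EVENT_UPDATE` and `COMMENT` are hypothetical names, and it models delivery order only, not consistency or persistence.

```python
import heapq
import itertools

# Hypothetical message classes: a lower number is delivered first.
EVENT_UPDATE = 0
COMMENT = 1


class DeliveryQueue:
    """Delivers live-event updates ahead of comments, FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserves arrival order

    def push(self, kind, payload):
        heapq.heappush(self._heap, (kind, next(self._counter), payload))

    def pop(self):
        kind, _, payload = heapq.heappop(self._heap)
        return kind, payload

    def __len__(self):
        return len(self._heap)


q = DeliveryQueue()
q.push(COMMENT, "great goal!")
q.push(EVENT_UPDATE, "score 1-0")
q.push(COMMENT, "what a match")
q.push(EVENT_UPDATE, "half time")

order = [q.pop()[1] for _ in range(len(q))]
print(order)  # ['score 1-0', 'half time', 'great goal!', 'what a match']
```

In a real deployment the two classes would more likely travel over separate channels or stores with different guarantees; the queue only shows the ordering policy.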

Scalability is very important. We want to start out renting VPS slices, and scale from there. I'm a big fan of the cloud, and would like to remain there as long as possible. We'll probably be using Ruby.

I'm convinced that I want to try a document store instead of an RDBMS. I like the idea of schema-less storage and the promises of easier scalability by focusing on key-value.

The problem is I don't know which technology is the most appropriate for our platform. I've looked at Couch, Mongo, Tokyo Cabinet, Cassandra, and an RDBMS with blobbed documents. Any help picking the right tool for this particular job?

+5  A: 

Check out the NoSQL alternatives comparison by BJ Clark.

Scalability is very important.

Then you need to consider these excerpts from his blog:

  1. Tokyo Cabinet - Doesn't scale
  2. Redis - Doesn't scale
  3. Project Voldemort - scales
  4. MongoDB - limited (sharding is being implemented)
  5. Cassandra - scales
  6. Amazon S3 - scales
  7. Couch - Doesn't scale
  8. MySQL - Doesn't scale
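Voldemort and Cassandra, both marked "scales" above, follow the Dynamo design of spreading keys over nodes with consistent hashing, so adding a node moves only a fraction of the data instead of reshuffling everything. A minimal Python sketch of a hash ring with virtual nodes; `HashRing` and the node names are hypothetical, and real systems differ in details such as token assignment and replication.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Place a string on the ring via MD5 (stable across processes, unlike hash())."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self._points = []     # sorted (position, node) pairs
        self._positions = []  # positions only, for bisect
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            self._points.append((ring_position(f"{node}#{i}"), node))
        self._points.sort()
        self._positions = [pos for pos, _ in self._points]

    def node_for(self, key):
        # First virtual node after the key's position, wrapping to the start.
        idx = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[idx % len(self._points)][1]


keys = [f"comment:{i}" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved")
# Every moved key now lives on the new node; nothing shuffles between old nodes.
print(all(after[k] == "node-d" for k in moved))
```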

And consider Hypertable; it is also a serious contender among the NoSQL alternatives. It's an open-source implementation of the Google BigTable concept. I believe it scales well, because it's used extensively by the Chinese search engine Baidu and the entertainment portal rediff.
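The BigTable data model, which Hypertable follows, is a sparse, sorted map from (row key, "family:qualifier" column, timestamp) to a value. A toy in-memory Python sketch of that model; the class and method names are hypothetical and nothing here reflects Hypertable's actual API.

```python
from collections import defaultdict


class BigTableLikeTable:
    """Toy model: (row key, "family:qualifier", timestamp) -> value."""

    def __init__(self):
        # row -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(reverse=True)  # keep the newest version at index 0

    def get(self, row, column):
        # Return the newest version of a cell, or None if it does not exist.
        versions = self._rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start_row, end_row):
        # Real BigTable/Hypertable keep rows sorted by key, which makes
        # row-range scans cheap; this toy just sorts on every scan.
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row, {col: vs[0][1] for col, vs in self._rows[row].items()}


t = BigTableLikeTable()
t.put("event:42", "info:score", 1, "0-0")
t.put("event:42", "info:score", 2, "1-0")
t.put("event:42", "comment:alice", 2, "great goal!")
t.put("event:43", "info:score", 1, "0-0")
print(t.get("event:42", "info:score"))  # 1-0
print(list(t.scan("event:42", "event:43")))
```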

You were saying:

Information about the live event itself must be delivered to clients as fast and consistently as possible, while comments about the event can probably wait a bit longer to be delivered. We'll be read-heavy after the live event finishes.

This is similar to Twitter's approach. Your choice of language also matters: Twitter initially used Ruby for back-end message delivery, but they have said it was not the right choice, and they moved the entire message delivery system to Scala.

They still use Ruby for the entire front end. If you want a highly reliable, fault-tolerant system that is well suited to scalable environments, consider your choice of language carefully. Have a look at Scala or Erlang.

Cheers

RameshVel

Ramesh Vel
+1 for the excellent interview
Wayne Conrad
Why is point 7, Couch, listed as "doesn't scale"? Take a look at http://cloudant.com/ and http://couchio.com/
filippo
Yeah, I'm also confused about Couch. There seems to be some serious disagreement about the replication approach to scaling as a whole. The Couch guys list scalability as one of their main features, while the rest of the world seems to blow them off.
Sean Clark Hess
CouchDB performance has increased by an order of magnitude in each release. Current trunk performance is nothing like it was in August, when that article was written. Your preferred scaling strategy will depend on your situation. You may need replication or sharding: CouchDB has built-in peer-to-peer replication that works great, and with couchdb-lounge you can do sharding.
mikeal
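The simplest form of sharding routes each document to a shard by hashing its ID, which is the general idea behind proxy-based sharding such as couchdb-lounge. A minimal Python sketch; `shard_for` is a hypothetical helper, not Lounge's code or its exact hashing scheme.

```python
import zlib
from collections import Counter


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to a shard by hashing its ID.

    CRC32 is used because it is stable across processes (Python's built-in
    hash() of a str is randomized per process)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


doc_ids = [f"comment-{i}" for i in range(10_000)]

# Spread of documents over 4 shards; with a decent hash this is roughly even.
load = Counter(shard_for(d, 4) for d in doc_ids)
print(sorted(load.items()))

# Documents whose shard changes if the cluster grows from 4 to 5 shards.
moved = sum(shard_for(d, 4) != shard_for(d, 5) for d in doc_ids)
print(f"{moved} of {len(doc_ids)} documents would move")
```

The last line shows the cost of plain modulo hashing: growing from 4 to 5 shards relocates about 80% of documents in expectation, so the shard count is worth planning up front.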
+1  A: 

Ramesh has a good summary. I would add that Cassandra has a richer data model than vanilla Dynamo clones (like Voldemort or Dynomite): rows with named, sorted columns rather than just key/value. Cassandra is being used by Twitter, Mahalo, Ooyala, SimpleGeo, WebEx, and others (http://n2.nabble.com/Cassandra-users-survey-td4040068.html), at least some of which are running Cassandra clusters on EC2 or Rackspace Cloud Servers.
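To make the data-model difference concrete: a plain key/value store hands back one opaque value, while a Cassandra-style row keeps named columns sorted, so you can read just a range of them (roughly what Cassandra's slice reads do). A toy Python sketch, assuming one row per live event with comments stored under timestamp-like column names; `SortedColumnRow` and the column naming are hypothetical.

```python
import bisect


class SortedColumnRow:
    """Toy model of one Cassandra-style row: columns kept sorted by name, so a
    contiguous range ("slice") can be read without touching the others."""

    def __init__(self):
        self._names = []   # column names, kept sorted
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish, count=100):
        # Up to `count` (name, value) pairs with start <= name <= finish.
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi][:count]]


row = SortedColumnRow()  # hypothetical row key: "event:42"
row.insert("20091201-2015", "kick-off!")
row.insert("20091201-2003", "warming up")
row.insert("20091201-2047", "half time")

recent = row.slice("20091201-2000", "20091201-2030")
print(recent)  # [('20091201-2003', 'warming up'), ('20091201-2015', 'kick-off!')]
```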

jbellis
+1  A: 

If you want to scale horizontally (distribute your data over more than one node) you have to take the CAP theorem into account.

http://www.julianbrowne.com/article/viewer/brewers-cap-theorem

It is not easy stuff, but you have to choose; there is always some kind of trade-off.
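One place the trade-off becomes concrete is the tunable quorums of Dynamo-style stores such as Voldemort and Cassandra: with N replicas per key, a read contacts R of them and a write waits for W. If R + W > N, every read set overlaps the latest write set, so a read can see the newest value; a smaller R or W lowers latency and tolerates more failed nodes, at the risk of stale reads. The brute-force Python check below only illustrates that overlap condition; it is not how any of these systems implement quorums.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every set of r replicas (a read) share at least one
    replica with every set of w replicas (a write), out of n replicas?"""
    replicas = range(n)
    return all(set(read) & set(write)
               for read in combinations(replicas, r)
               for write in combinations(replicas, w))


# Compare the brute-force answer with the usual rule of thumb R + W > N.
for n, r, w in [(3, 1, 1), (3, 2, 2), (3, 1, 3), (5, 2, 3), (5, 3, 3)]:
    print(f"N={n} R={r} W={w}: overlap={quorums_overlap(n, r, w)}, R+W>N={r + w > n}")
```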

AABBCCDD
Thanks... That was the best article on the CAP theorem I'd read.
Sean Clark Hess