I'm looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I've got no idea where to begin. Just listing from memory, I can think of:

  1. CouchDB
  2. MongoDB
  3. Riak
  4. Redis
  5. Tokyo Cabinet
  6. Berkeley DB
  7. Cassandra
  8. MemcacheDB

And I'm sure that there are more out there that have slipped through my search efforts. With all the information out there, it's hard to find solid comparisons between all of the competitors. My criteria and questions are:

  1. (Most Important) Which do you recommend, and why?
  2. Which one is the fastest?
  3. Which one is the most stable?
  4. Which one is the easiest to set up and install?
  5. Which ones have bindings for Python and/or Ruby?

Edit:
So far it looks like Redis is the best solution, but that's only because I've gotten one solid response (from ardsrk). I'm looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why?

Edit 2:
If anyone has experience with CouchDB, Riak, or MongoDB, I'd love to hear your experiences with them (and even more so if you can offer a comparative analysis of several of them).

+6  A: 

They all have different features. And don't forget Voldemort (http://project-voldemort.com/), which is actually used and tested by LinkedIn in production before each release.

It's hard to compare. You have to ask yourself what you need. For example, do you want partitioning? If so, some of them, like CouchDB, won't support it. Do you want erasure coding? Then most of them don't have that. Etc.

Berkeley DB is a very basic, low-level storage engine that can perhaps be excused from this discussion. Several key-value systems are built on top of it to provide additional features like replication, versioning, coding, etc.

Also, what does your application need? Several of the solutions contain complexity that may not be necessary. For example, if you just store static data that won't change, you can store it under its SHA-1 content hash (i.e. use the content hash as the key). In that case you don't have to worry about freshness, synchronization, or versioning, and a lot of complexity goes away.
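
For instance, here's a minimal Python sketch of the content-hash-as-key idea (the plain dict is just a stand-in for whatever key-value client you end up using):

```python
import hashlib

def put_immutable(store, data: bytes) -> str:
    """Store static data under its SHA-1 content hash and return the key."""
    key = hashlib.sha1(data).hexdigest()
    store[key] = data   # identical content always maps to the same key
    return key

# A plain dict standing in for any key-value store:
store = {}
key = put_immutable(store, b"some static blob that never changes")
assert store[key] == b"some static blob that never changes"
```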

OverClocked
Note that CouchDB now has Lounge and BigCouch for partitioning. The latter is based on Amazon's Dynamo clustering scheme, so you get all that fun variable durability, replication, and quorum nonsense as well.
+11  A: 
Which do you recommend, and why?

I recommend Redis. Why? Continue reading!!

Which one is the fastest?

I can't say whether it's the fastest, but Redis is fast. It's fast because it holds all the data in RAM. A virtual memory feature was added recently, but all the keys still stay in main memory, with only rarely used values being swapped to disk.

Which one is the most stable?

Again, since I have no direct experience with the other key-value stores, I can't compare. However, Redis is being used in production by many web applications, like GitHub and Superfeedr, among many others.

Which one is the easiest to set up and install?

Redis is fairly easy to set up. Grab the source and, on a Linux box, run `make install`. This yields a redis-server binary that you can put on your path and start.

redis-server binds to localhost:6379 by default. Have a look at the redis.conf file that comes with the source for more configuration and setup options.

Which ones have bindings for Python and/or Ruby?

Redis has excellent Ruby and Python support. The Ruby client is more feature-complete (for instance, it supports consistent hashing, which the Python client doesn't) and more popular (redis-rb has 300+ watchers while redis-py has only 100+).
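
For example, a minimal redis-py session might look like this (assuming a redis-server is already running on the default localhost:6379):

```python
import redis  # the redis-py client: pip install redis

r = redis.Redis(host='localhost', port=6379, db=0)

r.set('greeting', 'hello')    # plain string key/value
print(r.get('greeting'))      # b'hello' -- values come back as byte strings
```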

[EDIT: response to xorlev's comment]

@xorlev: Thanks for the comment as I had missed something important to mention and you reminded me of it.

Memcached is just a simple key-value store. Redis holds everything in memory, but it's also persistent: it writes back to disk in a non-blocking way. Redis also supports complex value types (just look at slide number 2) like lists, sets, and sorted sets, and at the same time provides a simple interface to those value types.
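
A quick sketch of those value types with a recent redis-py (the key names here are made up):

```python
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# Lists: push onto the end, read a range back
r.rpush('recent:posts', 'post:1', 'post:2', 'post:3')
print(r.lrange('recent:posts', 0, -1))

# Sets: unordered membership, no duplicates
r.sadd('tags:redis', 'fast', 'persistent', 'in-memory')
print(r.sismember('tags:redis', 'fast'))

# Sorted sets: members ordered by a numeric score
r.zadd('leaderboard', {'alice': 150, 'bob': 120})
print(r.zrevrange('leaderboard', 0, 1, withscores=True))
```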

Redis is also very well documented and you can give it a try online.

Hope this helps.

ardsrk
If you just want to hold everything in memory, I'd go with memcached.
Xorlev
This is the best answer so far. Redis looks awesome.
Mike Trpcic
@Mike: Salvatore (http://twitter.com/antirez), the creator of Redis, is very thoughtful and open to new ideas. Look at his tweet stream and you'll see why Redis is attracting more users and developers.
ardsrk
You can actually just run "make", no need to "make install".
Don Spaulding
@don: you are right. The other typically used option is `make noopt`, which generates a binary without optimizations. This helps when running Redis under gdb.
ardsrk
There is also `make 32bit` that makes all pointers only 32-bits in size even on 64 bit machines. This saves considerable memory on machines with less than 4GB of RAM.
ardsrk
+4  A: 

I really like memcached personally.

I use it on a couple of my sites and it's simple, fast, and easy: the API is incredibly simple to use. It doesn't store anything on disk (thus the name memcached), so it's out if you're looking for a persistent storage engine.

Python has python-memcached.

I haven't used the Ruby client, but a quick Google search reveals RMemCache.

If you just need a caching engine, memcached is the way to go. It's actively developed, it's stable, and it's bleedin' fast. There's a reason LiveJournal made it and Facebook develops it. It's in use at some of the largest sites out there, to great effect, and it scales extremely well.
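
For the Python side, a minimal python-memcached session looks something like this (assuming a memcached daemon on the default 127.0.0.1:11211):

```python
import memcache  # the python-memcached client: pip install python-memcached

mc = memcache.Client(['127.0.0.1:11211'])

mc.set('session:42', {'user': 'mike', 'logged_in': True}, time=300)  # expires in 300s
print(mc.get('session:42'))   # returns None once it expires or memcached restarts
```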

Xorlev
In Ruby, the easier option is memcache-client. It integrates with Rails.
shingara
+4  A: 

At this year's PyCon, Jeremy Edberg of Reddit gave a talk:

http://pycon.blip.tv/file/3257303/

He said that Reddit uses Postgres as a key-value store, presumably with a simple two-column table; according to his talk, it benchmarked faster than any other key-value store they had tried. And, of course, it's very mature.

Ultimately, OverClocked is right; your use case determines the best store. But RDBMSs have long been (ab)used as key-value stores, and they can be very fast, too.
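
A rough sketch of that two-column approach in Python (psycopg2 and the table/connection details here are placeholders, not anything from the talk):

```python
import psycopg2  # any Postgres driver works the same way

conn = psycopg2.connect("dbname=kvstore")  # placeholder connection string
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS data (name varchar(40) PRIMARY KEY, value text)")
conn.commit()

def kv_set(name, value):
    # Naive upsert: try an update first, insert if the key wasn't there.
    cur.execute("UPDATE data SET value = %s WHERE name = %s", (value, name))
    if cur.rowcount == 0:
        cur.execute("INSERT INTO data (name, value) VALUES (%s, %s)", (name, value))
    conn.commit()

def kv_get(name):
    cur.execute("SELECT value FROM data WHERE name = %s", (name,))
    row = cur.fetchone()
    return row[0] if row else None

kv_set("greeting", "hello")
print(kv_get("greeting"))  # hello
```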

AdamKG
I saw that talk when it was posted on Reddit, but couldn't find any solid examples of using Postgres as a KVS in the way Reddit does.
Mike Trpcic
Start with `CREATE TABLE data (name varchar(40), value text);` and see how far you can go...
Kylotan
+1  A: 

Just to make the list complete: there's Dreamcache, too. It's compatible with Memcached (in terms of protocol, so you can use any client library written for Memcached); it's just faster.

grokk
+5  A: 

I've been playing with MongoDB, and it has one thing that makes it perfect for my application: the ability to store complex maps/lists in the database directly. I have a large map where each value is a list, and I don't have to do anything special to write and retrieve it without knowing all the different keys and list values. I don't know much about the other options, but the speed and that ability make Mongo perfect for my application. Plus, the Java driver is very simple to use.
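
The question asks about Python, so here is the same idea sketched with PyMongo rather than the Java driver (database, collection, and field names are made up):

```python
from pymongo import MongoClient  # pip install pymongo

client = MongoClient()             # assumes mongod on the default localhost:27017
coll = client.testdb.profiles      # hypothetical database/collection

# A nested map-of-lists goes in directly, no schema or special handling needed
coll.insert_one({
    "_id": "user:42",
    "favorites": {
        "books": ["Dune", "Accelerando"],
        "tags": ["nosql", "python"],
    },
})

doc = coll.find_one({"_id": "user:42"})
print(doc["favorites"]["books"])   # ['Dune', 'Accelerando']
```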

MattGrommes
Are there any posts comparing MongoDB to CouchDB? Couch also allows you to have complex maps/lists in the database (as JavaScript functions), and I'm wondering which of the two is faster/more stable.
Mike Trpcic
From a couple of benchmarks I've seen recently, Mongo is much, much faster than Couch; Mongo even beat out MySQL under certain conditions.
Redbeard 0x0A
Reddit has had a couple of links about this: http://www.reddit.com/r/programming/comments/atnpb/what_are_the_merits_of_couchdb_over_mongodb_and/ and http://jayant7k.blogspot.com/2009/08/document-oriented-data-stores.html
MattGrommes
@MattGrommes: Maybe you can clarify what you mean? Storing lists is trivial in any document database, not just MongoDB, no?
+2  A: 

There is also ZODB.

mikerobi
+4  A: 

One distinction you have to make is what you will use the DB for. Don't jump on board just because it's trendy. Do you need a key-value store, or do you need a document-based store? What is your memory footprint requirement? Will you run it on a small VM or a separate one?

I recommend listing your requirements first and then seeing which ones overlap with your requirements.

With that said, I have used CouchDB and MongoDB, and I prefer MongoDB for its ease of setup and for offering the easiest transition from MySQL-style queries. I chose MongoDB over SQL because of dynamic schemas (no migration files!) and better data modeling (arrays, hashes). I did not evaluate based on scalability.

MongoMapper is a great MongoDB ORM for Ruby, and there's already a working Rails 3 fork.

I listed some more details about why I preferred MongoDB in my Scribd slides: http://tommy.chheng.com/index.php/2010/02/mongodb-for-natural-development/

tommy chheng
There are no requirements; I just want to LEARN, and getting opinions from people who have already learned is the best way to find a solid footing as a jumping-off point.
Mike Trpcic
I think learning by trying is a good start. I would suggest thinking of a sample app, say a Twitter app, and trying to model the data architecture and queries in each of the respective languages. You don't even need to code; just see what the queries are like for "followers of followers", etc. This will give you insight into how easy each one will be to use.
tommy chheng
+4  A: 

I notice that everyone is confusing memcached with MemcacheDB. They are two different systems. The OP asked about MemcacheDB.

memcached is in-memory storage. MemcacheDB uses Berkeley DB as its datastore.

drr
This is true. I've had minor experience with memcached, but I'm looking to familiarize myself with MemcacheDB or another KVS.
Mike Trpcic
Your observation would be better as a comment.
+2  A: 

I only have experience with Berkeley DB, so I'll mention what I like about it.

  • It is fast
  • It is very mature and stable
  • It has outstanding documentation
  • It has C, C++, Java, and C# bindings out of the box. Other language bindings are available. I believe Python comes with bindings as part of its "batteries" (see the sketch below).

The only downside I've run into is that the C# bindings are new and don't seem to support every feature.
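
The Python interface is dict-like; here's a tiny sketch using the bsddb3 package (older Python 2 installs bundle an equivalent bsddb module in the standard library):

```python
from bsddb3 import hashopen  # pip install bsddb3; mirrors the old stdlib bsddb API

db = hashopen("example.db", "c")   # "c" creates the file if it doesn't already exist
db[b"greeting"] = b"hello"         # keys and values are byte strings
print(db[b"greeting"])             # b'hello'
db.close()
```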

Ferruccio
+1 for BDB. It scales well, is fast enough for 1 MB/sec kinds of transactions, and is very robust.
Jack
And it runs in memory...
Joel
+5  A: 

You need to understand what the modern NoSQL phenomenon is about.
It is not about key-value storage. That has been available for decades (Berkeley DB, for example). So why all the fuss now?
It is not about fancy document- or object-oriented schemas and overcoming the "impedance mismatch". Proponents of those features have been touting them for years, and they got nowhere.
It is simply about addressing three technical problems: automatic (for maintainers) and transparent (for application developers) failover, sharding, and replication. So you should ignore any trendy products that do not deliver on this front (these include Redis, MongoDB, CouchDB, etc.) and concentrate on truly distributed solutions like Cassandra, Riak, etc.

Otherwise you'll lose all the good stuff SQL gives you (ad-hoc queries, Crystal Reports for your boss, third-party tools and libraries) and get nothing in return.

Vagif Verdi
As mentioned in my comment on OverClocked's post, BigCouch brings automatic sharding, failover and replication to the CouchDB world. I believe MongoDB has sharding and replication, as well.
+1  A: 

Cassandra seems to be popular.

Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data across more than 150 machines.

Justice
Definitely a *lot* of momentum behind this project, but some fairly severe design decisions may make it difficult to use for certain tasks. It's unclear how this will play out in the long run, particularly with regard to its relevance and usability for smaller (i.e. non-worldwide-scale) users.