I'm looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I've got no idea where to begin. Just listing from memory, I can think of:

  1. CouchDB
  2. MongoDB
  3. Riak
  4. Redis
  5. Tokyo Cabinet
  6. Berkeley DB
  7. Cassandra
  8. MemcacheDB

And I'm sure that there are more out there that have slipped through my search efforts. With all the information out there, it's hard to find solid comparisons between all of the competitors. My criteria and questions are:

  1. (Most Important) Which do you recommend, and why?
  2. Which one is the fastest?
  3. Which one is the most stable?
  4. Which one is the easiest to set up and install?
  5. Which ones have bindings for Python and/or Ruby?

Edit:
So far it looks like Redis is the best solution, but that's only because I've gotten one solid response (from ardsrk). I'm looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why?

Edit 2:
If anyone has experience with CouchDB, Riak, or MongoDB, I'd love to hear your experiences with them (and even more so if you can offer a comparative analysis of several of them).

+6  A: 

They all have different features. And don't forget Voldemort (http://project-voldemort.com/), which is actually used and tested by LinkedIn in production before each release.

It's hard to compare. You have to ask yourself what you need. For example, do you want partitioning? If so, some of them, like CouchDB, won't support it. Do you want erasure coding? Then most of them don't have that. Etc.

Berkeley DB is a very basic, low-level storage engine that can perhaps be excused from this discussion. Several key-value systems are built on top of it to provide additional features like replication, versioning, coding, etc.

Also, what does your application need? Several of the solutions contain complexity that may not be necessary. For example, if you just store static data that won't change, you can store it under its SHA-1 content hash (i.e. use the content hash as the key). In that case you don't have to worry about freshness, synchronization, or versioning, and a lot of complexity goes away.
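
For instance, here's a minimal Python sketch of the content-hash-as-key idea (the plain dict is just a stand-in for whatever key-value client you end up using):

```python
import hashlib

def put_immutable(store, data: bytes) -> str:
    """Store static data under its SHA-1 content hash and return the key."""
    key = hashlib.sha1(data).hexdigest()
    store[key] = data   # identical content always maps to the same key
    return key

# A plain dict standing in for any key-value store:
store = {}
key = put_immutable(store, b"some static blob that never changes")
assert store[key] == b"some static blob that never changes"
```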

OverClocked
Note that CouchDB now has Lounge and BigCouch for partitioning. The latter is based on Amazon's Dynamo clustering scheme, so you get all that fun variable durability, replication, and quorum nonsense as well.
+11  A: 
Which do you recommend, and why?

I recommend Redis. Why? Continue reading!!

Which one is the fastest?

I can't say whether it's the fastest, but Redis is fast. It's fast because it holds all the data in RAM. A virtual memory feature was added recently, but all the keys still stay in main memory, with only rarely used values being swapped to disk.

Which one is the most stable?

Again, since I have no direct experience with the other key-value stores, I can't compare. However, Redis is being used in production by many web applications, like GitHub and Superfeedr, among many others.

Which one is the easiest to set up and install?

Redis is fairly easy to set up. Grab the source and, on a Linux box, run `make install`. This yields a redis-server binary that you can put on your path and start.

redis-server binds to localhost:6379 by default. Have a look at the redis.conf file that comes with the source for more configuration and setup options.

Which ones have bindings for Python and/or Ruby?

Redis has excellent Ruby and Python support. The Ruby client is more feature-complete (for instance, it supports consistent hashing, which the Python client doesn't) and more popular (redis-rb has 300+ watchers while redis-py has only 100+).
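
For example, a minimal redis-py session might look like this (assuming a redis-server is already running on the default localhost:6379):

```python
import redis  # the redis-py client: pip install redis

r = redis.Redis(host='localhost', port=6379, db=0)

r.set('greeting', 'hello')    # plain string key/value
print(r.get('greeting'))      # b'hello' -- values come back as byte strings
```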

[EDIT: response to xorlev's comment]

@xorlev: Thanks for the comment as I had missed something important to mention and you reminded me of it.

Memcached is just a simple key-value store. Redis holds everything in memory, but it's also persistent: it writes back to disk in a non-blocking way. Redis also supports complex value types (just look at slide number 2) like lists, sets, and sorted sets, and at the same time provides a simple interface to those value types.
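
A quick sketch of those value types with a recent redis-py (the key names here are made up):

```python
import redis

r = redis.Redis(host='localhost', port=6379, db=0)

# Lists: push onto the end, read a range back
r.rpush('recent:posts', 'post:1', 'post:2', 'post:3')
print(r.lrange('recent:posts', 0, -1))

# Sets: unordered membership, no duplicates
r.sadd('tags:redis', 'fast', 'persistent', 'in-memory')
print(r.sismember('tags:redis', 'fast'))

# Sorted sets: members ordered by a numeric score
r.zadd('leaderboard', {'alice': 150, 'bob': 120})
print(r.zrevrange('leaderboard', 0, 1, withscores=True))
```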

Redis is also very well documented and you can give it a try online.

Hope this helps.

ardsrk
If you just want to hold everything in memory, I'd go with memcached.
Xorlev
This is the best answer so far. Redis looks awesome.
Mike Trpcic
@Mike: Salvatore (http://twitter.com/antirez), the creator of Redis, is very thoughtful and open to new ideas. Look at his tweet stream and you'll see why Redis is attracting more users and developers.
ardsrk
You can actually just run "make", no need to "make install".
Don Spaulding
@don: you are right. The other typically used option is `make noopt`, which generates a binary without optimizations. This helps when running Redis under gdb.
ardsrk
There is also `make 32bit` that makes all pointers only 32-bits in size even on 64 bit machines. This saves considerable memory on machines with less than 4GB of RAM.
ardsrk
+4  A: 

I really like memcached personally.

I use it on a couple of my sites and it's simple, fast, and easy: the API is incredibly simple to use. It doesn't store anything on disk (thus the name memcached), so it's out if you're looking for a persistent storage engine.

Python has python-memcached.

I haven't used the Ruby client, but a quick Google search reveals RMemCache.

If you just need a caching engine, memcached is the way to go. It's actively developed, it's stable, and it's bleedin' fast. There's a reason LiveJournal made it and Facebook develops it. It's in use at some of the largest sites out there, to great effect, and it scales extremely well.
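
For the Python side, a minimal python-memcached session looks something like this (assuming a memcached daemon on the default 127.0.0.1:11211):

```python
import memcache  # the python-memcached client: pip install python-memcached

mc = memcache.Client(['127.0.0.1:11211'])

mc.set('session:42', {'user': 'mike', 'logged_in': True}, time=300)  # expires in 300s
print(mc.get('session:42'))   # returns None once it expires or memcached restarts
```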

Xorlev
In Ruby, the easier option is memcache-client. It integrates with Rails.
shingara
+4  A: 

At this year's PyCon, Jeremy Edberg of Reddit gave a talk:

http://pycon.blip.tv/file/3257303/

He said that Reddit uses Postgres as a key-value store, presumably with a simple two-column table; according to his talk, it benchmarked faster than any other key-value store they had tried. And, of course, it's very mature.

Ultimately, OverClocked is right; your use case determines the best store. But RDBMSs have long been (ab)used as key-value stores, and they can be very fast, too.
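
A rough sketch of that two-column approach in Python (psycopg2 and the table/connection details here are placeholders, not anything from the talk):

```python
import psycopg2  # any Postgres driver works the same way

conn = psycopg2.connect("dbname=kvstore")  # placeholder connection string
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS data (name varchar(40) PRIMARY KEY, value text)")
conn.commit()

def kv_set(name, value):
    # Naive upsert: try an update first, insert if the key wasn't there.
    cur.execute("UPDATE data SET value = %s WHERE name = %s", (value, name))
    if cur.rowcount == 0:
        cur.execute("INSERT INTO data (name, value) VALUES (%s, %s)", (name, value))
    conn.commit()

def kv_get(name):
    cur.execute("SELECT value FROM data WHERE name = %s", (name,))
    row = cur.fetchone()
    return row[0] if row else None

kv_set("greeting", "hello")
print(kv_get("greeting"))  # hello
```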

AdamKG
I saw that talk when it was posted on Reddit, but couldn't find any solid examples of using Postgres as a KVS in the way Reddit does.
Mike Trpcic
Start with `CREATE TABLE data (name varchar(40), value text);` and see how far you can go...
Kylotan
+1  A: 

Just to make the list complete: there's Dreamcache, too. It's compatible with Memcached (in terms of protocol, so you can use any client library written for Memcached); it's just faster.

grokk
+5  A: 

I've been playing with MongoDB, and it has one thing that makes it perfect for my application: the ability to store complex maps/lists in the database directly. I have a large map where each value is a list, and I don't have to do anything special to write and retrieve it without knowing all the different keys and list values. I don't know much about the other options, but the speed and that ability make Mongo perfect for my application. Plus, the Java driver is very simple to use.
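
The question asks about Python, so here is the same idea sketched with PyMongo rather than the Java driver (database, collection, and field names are made up):

```python
from pymongo import MongoClient  # pip install pymongo

client = MongoClient()             # assumes mongod on the default localhost:27017
coll = client.testdb.profiles      # hypothetical database/collection

# A nested map-of-lists goes in directly, no schema or special handling needed
coll.insert_one({
    "_id": "user:42",
    "favorites": {
        "books": ["Dune", "Accelerando"],
        "tags": ["nosql", "python"],
    },
})

doc = coll.find_one({"_id": "user:42"})
print(doc["favorites"]["books"])   # ['Dune', 'Accelerando']
```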

MattGrommes
Are there any posts comparing MongoDB to CouchDB? Couch also allows you to have complex maps/lists in the database (as JavaScript functions), and I'm wondering which of the two is faster/more stable.
Mike Trpcic
From a couple of benchmarks I've seen recently, Mongo is much, much faster than Couch; Mongo even beat out MySQL under certain conditions.
Redbeard 0x0A
Reddit has had a couple of links about this: http://www.reddit.com/r/programming/comments/atnpb/what_are_the_merits_of_couchdb_over_mongodb_and/ and http://jayant7k.blogspot.com/2009/08/document-oriented-data-stores.html
MattGrommes
@MattGrommes: Maybe you can clarify what you mean? Storing lists is trivial in any document database, not just MongoDB, no?
+2  A: 

There is also ZODB.

mikerobi
+4  A: 

One distinction you have to make is what you will use the DB for. Don't jump on board just because it's trendy. Do you need a key-value store, or do you need a document-based store? What is your memory footprint requirement? Will you run it on a small VM or a separate one?

I recommend listing your requirements first and then seeing which ones overlap with your requirements.

With that said, I have used CouchDB and MongoDB, and I prefer MongoDB for its ease of setup and for offering the easiest transition from MySQL-style queries. I chose MongoDB over SQL because of dynamic schemas (no migration files!) and better data modeling (arrays, hashes). I did not evaluate based on scalability.

MongoMapper is a great MongoDB ORM for Ruby, and there's already a working Rails 3 fork.

I listed some more details about why I preferred MongoDB in my Scribd slides: http://tommy.chheng.com/index.php/2010/02/mongodb-for-natural-development/

tommy chheng
There are no requirements; I just want to LEARN, and getting opinions from people who have already learned is the best way to find a solid footing as a jumping-off point.
Mike Trpcic
I think learning by trying is a good start. I would suggest thinking of a sample app, say a Twitter app, and trying to model the data architecture and queries in each of the respective languages. You don't even need to code; just see what the queries are like for "followers of followers", etc. This will give you insight into how easy each one will be to use.
tommy chheng
+4  A: 

I notice that everyone is confusing memcached with MemcacheDB. They are two different systems. The OP asked about MemcacheDB.

memcached is in-memory storage. MemcacheDB uses Berkeley DB as its datastore.

drr
This is true. I've had minor experience with memcached, but I'm looking to familiarize myself with MemcacheDB or another KVS.
Mike Trpcic
Your observation would be better as a comment.
+2  A: 

I only have experience with Berkeley DB, so I'll mention what I like about it.

  • It is fast
  • It is very mature and stable
  • It has outstanding documentation
  • It has C, C++, Java, and C# bindings out of the box. Other language bindings are available. I believe Python comes with bindings as part of its "batteries" (see the sketch below).

The only downside I've run into is that the C# bindings are new and don't seem to support every feature.
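
The Python interface is dict-like; here's a tiny sketch using the bsddb3 package (older Python 2 installs bundle an equivalent bsddb module in the standard library):

```python
from bsddb3 import hashopen  # pip install bsddb3; mirrors the old stdlib bsddb API

db = hashopen("example.db", "c")   # "c" creates the file if it doesn't already exist
db[b"greeting"] = b"hello"         # keys and values are byte strings
print(db[b"greeting"])             # b'hello'
db.close()
```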

Ferruccio
+1 for BDB. It scales well, is fast enough for 1 MB/sec kinds of transactions, and is very robust.
Jack
And it runs in memory...
Joel
+5  A: 

You need to understand what the modern NoSQL phenomenon is about.
It is not about key-value storage. That has been available for decades (Berkeley DB, for example). So why all the fuss now?
It is not about fancy document- or object-oriented schemas and overcoming the "impedance mismatch". Proponents of those features have been touting them for years, and they got nowhere.
It is simply about addressing three technical problems: automatic (for maintainers) and transparent (for application developers) failover, sharding, and replication. So you should ignore any trendy products that do not deliver on this front (these include Redis, MongoDB, CouchDB, etc.) and concentrate on truly distributed solutions like Cassandra, Riak, etc.

Otherwise you'll lose all the good stuff SQL gives you (ad-hoc queries, Crystal Reports for your boss, third-party tools and libraries) and get nothing in return.

Vagif Verdi
As mentioned in my comment on OverClocked's post, BigCouch brings automatic sharding, failover and replication to the CouchDB world. I believe MongoDB has sharding and replication, as well.
+1  A: 

Cassandra seems to be popular.

Cassandra is in use at Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco, SimpleGeo, Ooyala, OpenX, and more companies that have large, active data sets. The largest production cluster has over 100 TB of data across more than 150 machines.

Justice
Definitely a *lot* of momentum behind this project, but some fairly severe design decisions may make it difficult to use for certain tasks. It's unclear how this will play out in the long run, particularly with regard to its relevance and usability for smaller (i.e. non-worldwide-scale) users.