views: 721
answers: 4

So I'm looking at various key:value stores (where the value is either strictly a single value or possibly an object) for use with Python, and have found a few promising ones. I have no specific requirements yet because I'm still in the evaluation phase: I'm looking for what's good, what's bad, what corner cases these things handle well or poorly, etc. I'm sure some of you have already tried them out, so I'd love to hear your findings/problems/etc. with the various key:value stores and Python. I'm looking primarily at:

memcached - http://www.danga.com/memcached/ python clients: http://pypi.python.org/pypi/python-memcached/1.40 http://www.tummy.com/Community/software/python-memcached/

CouchDB - http://couchdb.apache.org/ python clients: http://code.google.com/p/couchdb-python/

Tokyo Tyrant - http://1978th.net/tokyotyrant/ python clients: http://code.google.com/p/pytyrant/

Lightcloud - http://opensource.plurk.com/LightCloud/ Based on Tokyo Tyrant, written in Python

Redis - http://code.google.com/p/redis/ python clients: http://pypi.python.org/pypi/txredis/0.1.1

MemcacheDB - http://memcachedb.org/

So I started benchmarking (simply inserting keys and reading them back) using a simple counter to generate numeric keys and a value of "A short string of text":

memcached: CentOS 5.3/python-2.4.3-24.el5_3.6, libevent 1.4.12-stable, memcached 1.4.2 with default settings, 1 GB memory: 14,000 inserts per second, 16,000 reads per second. No real optimization; nice.

memcachedb claims on the order of 17,000 to 23,000 inserts per second, 44,000 to 64,000 reads per second.

I'm also wondering how the others stack up speed wise.
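
The insert/read loop was, in sketch form, something like this (python-memcached client; the server address and key count are placeholders):

    import time
    import memcache  # python-memcached client

    mc = memcache.Client(['127.0.0.1:11211'])  # placeholder server address
    N = 100000
    value = "A short string of text"

    # inserts
    start = time.time()
    for i in range(N):
        mc.set(str(i), value)
    print("inserts/sec: %d" % (N / (time.time() - start)))

    # reads
    start = time.time()
    for i in range(N):
        mc.get(str(i))
    print("reads/sec: %d" % (N / (time.time() - start)))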

A: 

How about Amazon's SimpleDB?

There is an open-source Python library called boto for interfacing with Amazon Web Services from Python.
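
A minimal sketch of treating SimpleDB as a key:value store through boto (assuming AWS credentials are already configured for boto; the domain name is made up):

    import boto

    sdb = boto.connect_sdb()               # picks up AWS credentials from the environment
    domain = sdb.create_domain('kv_test')  # made-up domain name

    item = domain.new_item('key1')
    item['value'] = 'A short string of text'
    item.save()

    print(domain.get_item('key1')['value'])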

DanJ
Efficiency (mostly latency; remote = slow) and cost; I'd rather run it locally.
Kurt
+3  A: 

That mostly depends on your needs.

Read Caveats of Evaluating Databases to understand how to evaluate them.

Anand Chitipothu
One of the best pages I've seen so far.
Kurt
+3  A: 

shelve (stores dictionaries in a file; standard Python module)

ZODB - persistent object database (stores native Python objects, no SQL)

More persistence tools: http://wiki.python.org/moin/PersistenceTools
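
For the shelve route, a minimal sketch (standard library only; the filename is arbitrary):

    import shelve

    db = shelve.open('mydata.shelf')   # arbitrary filename
    db['key'] = 'A short string of text'
    db['new_table'] = {'nested': 'values work too'}
    print(db['key'])
    db.close()                         # data is flushed to disk on close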

ralu
Nice, I wouldn't have thought to use "persistence" as a keyword but it makes sense.
Kurt
I also like this approach better. This way you have a database that is native to the Python language. What you need in the program is something like this (pseudocode warning, see the sketch below): a = load_database(database); newtable = {}; newtable['key'] = 'value'; a['new_table'] = newtable; a.save()
ralu
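
That pseudocode maps onto ZODB roughly like this (a sketch; the filename and keys are made up):

    from ZODB import FileStorage, DB
    import transaction

    storage = FileStorage.FileStorage('data.fs')  # made-up filename
    db = DB(storage)
    root = db.open().root()                       # the persistent root mapping

    newtable = {}
    newtable['key'] = 'value'
    root['new_table'] = newtable
    transaction.commit()                          # the "a.save" step

    db.close()
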
+1  A: 

My 5 cents:

Do you need distributed systems with terabyte-sized data or massive write performance?

Well, then you need one of the big key:value/BigTable/Dynamo type things. That would be Cassandra, Tokyo Tyrant, Redis, etc. You need to make sure that the client library supports sharding so you can have multiple databases to write to. Which one to use here can only be decided by you, after testing with data that looks like what you think you need.
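
As one example, the python-memcached client will spread keys across several servers if you give it a list (a sketch; the addresses are placeholders, and other clients handle sharding differently):

    import memcache

    # keys are hashed to pick one of the listed servers
    mc = memcache.Client(['10.0.0.1:11211', '10.0.0.2:11211', '10.0.0.3:11211'])
    mc.set('user:42', 'A short string of text')
    print(mc.get('user:42'))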

Do you need the data to be accessible from other systems/languages than Python?

Since these databases have no structure to their data at all, whether the data is accessible from languages/clients other than yours depends on what you store in it. But if you need this, CouchDB is a good choice, as it stores its data as JSON documents, so you get interoperability. How well CouchDB handles really massive data and sharding is unclear, though.
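
A minimal sketch with couchdb-python (assuming a CouchDB server on localhost; the database name is made up):

    import couchdb

    server = couchdb.Server('http://localhost:5984/')
    db = server.create('kv_test')                     # made-up database name

    doc_id, rev = db.save({'key': 'A short string of text'})
    print(db[doc_id]['key'])                          # same document, as JSON, readable from any language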

Do you need neither interoperability with languages other than Python nor distributed multi-server storage?

Use ZODB.

Lennart Regebro
Best part: use ZODB. If you need to scale it, there's a server for that (see ZEO).
Troy J. Farrell