I am looking for a (possibly) pure Python library for a persistent hash table (B-tree or B+tree) which would provide the following features:

  1. Large file support (possibly in terabytes)
  2. Fast enough, with a low memory footprint (looking for a decent balance between speed and memory)
  3. Low cost of management
  4. Reliability, i.e. it doesn't corrupt the file once the content is written through the file system
  5. Lastly, a pure Python implementation. I am OK if it has a C library, but I am looking for a cross-platform solution

I have looked into solutions like Redis, shelve, and Tokyo Cabinet. Tokyo Cabinet is impressive and has a Python binding in the making at http://code.google.com/p/python-tokyocabinet/, but its Windows port is a work in progress.

Thanks for the good suggestions. I am currently exploring SQLite3 with Python. I got suggestions to use a database engine, but I am more inclined towards a lean and mean persistent B+tree implementation.

+2  A: 

ZODB
http://pypi.python.org/pypi/ZODB3

Like Lennart says, use the latest version, of course.
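
A minimal sketch of how ZODB might address the question, pairing its FileStorage with a persistent B-tree from the accompanying BTrees package (the 'Data.fs' file name and the 'index' key here are just placeholders):

    # ZODB FileStorage holding a persistent OOBTree
    import transaction
    from ZODB import FileStorage, DB
    from BTrees.OOBTree import OOBTree

    storage = FileStorage.FileStorage('Data.fs')  # single on-disk storage file
    db = DB(storage)
    connection = db.open()
    root = connection.root()

    if 'index' not in root:
        root['index'] = OOBTree()                 # persistent B-tree mapping
    root['index']['some-key'] = 'some-value'
    transaction.commit()                          # changes reach disk on commit
    db.close()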

gnibbler
3.2 is like five years old or so. I took the liberty of linking to the latest versions.
Lennart Regebro
+1  A: 

ZODB is indeed a powerful tool, but maybe it's overkill.

You can hack your own solution in a few Python lines: simply code a dictionary-like object as a database adapter. Try using a snippet like the one below, replacing the SQLite calls with MySQL, and you should be done.
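
As a rough idea of what such an adapter could look like (a hedged sketch: the SQLiteDict name, the kv table, and the string-only values are all invented for illustration):

    # a hypothetical dict-like adapter over sqlite3; names are illustrative
    import sqlite3
    from collections.abc import MutableMapping

    class SQLiteDict(MutableMapping):
        """Persistent string-to-string mapping backed by one SQLite table."""

        def __init__(self, path):
            self.conn = sqlite3.connect(path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
            self.conn.commit()

        def __getitem__(self, key):
            row = self.conn.execute(
                "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
            if row is None:
                raise KeyError(key)
            return row[0]

        def __setitem__(self, key, value):
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value))
            self.conn.commit()

        def __delitem__(self, key):
            cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
            self.conn.commit()
            if cur.rowcount == 0:
                raise KeyError(key)

        def __iter__(self):
            for (key,) in self.conn.execute("SELECT key FROM kv"):
                yield key

        def __len__(self):
            return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

Usage is then just dictionary syntax:

    d = SQLiteDict('data.db')
    d['alpha'] = 'one'
    print(d['alpha'])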

e-satis
Thanks for your suggestions. I am currently exploring sqlite3 with Python. Do you know if there are any limits associated with sqlite3 with regards to the number of rows in a table, other than the disk space?
volatilevoid
You will be interested in the review of all SQLite implementation limits: http://www.evolane.com/support/manuals/shared/manuals/tcltk/sqlite/limits.html
e-satis
Not sure MySQL is the database of choice if reliability is one of the key requirements, particularly if MyISAM is in use.
Charles Duffy
Well, this snippet can adapt to any DB, since Python has the same API (the DB-API) to access them all. Anyway, reliability can have a lot of different meanings these days. Generally, MySQL is "reliable" enough if you are not coding, say, a banking system.
e-satis
+1  A: 

Use a relational database.

  • Really fast when retrieving data based on a key, if you put an index on the key.
  • Good scaling
  • Doesn't get corrupted easily
  • Tools already available for:
    • Backups
    • Replication
    • Clustering
  • Cross-platform
  • Works over the network
  • Allows really fast JOINs, grouping, aggregation, and other complex queries, in case you need them

You can easily create a class that works like a dict or hash table, but uses the database as storage. You can make it cache as much as you want in memory.
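
One hedged way to sketch that caching layer (the CachedStore name and max_items parameter are invented; the backend can be any dict-like store, such as the SQLiteDict sketched in the other answer):

    # a hypothetical write-through cache in front of a dict-like backend
    class CachedStore:
        """Keeps hot keys in a plain dict in front of the database."""

        def __init__(self, backend, max_items=100000):
            self.backend = backend     # e.g. a dict-like SQLite or shelve store
            self.cache = {}
            self.max_items = max_items

        def __getitem__(self, key):
            if key not in self.cache:
                if len(self.cache) >= self.max_items:
                    self.cache.clear()  # crude eviction; use an LRU for real loads
                self.cache[key] = self.backend[key]
            return self.cache[key]

        def __setitem__(self, key, value):
            self.backend[key] = value   # write-through: the database stays authoritative
            self.cache[key] = value

Write-through keeps the database as the source of truth, so a crash at worst loses nothing but the cache.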

nosklo