I am looking for a (possibly) pure Python library for a persistent hash table (B-tree or B+tree) which would provide the following features:

  1. Large file support (possibly in terabytes)
  2. Fast enough, with a low memory footprint (looking for a decent balance between speed and memory)
  3. Low cost of management
  4. Reliability, i.e. it doesn't corrupt the file once the content is written through the file system
  5. Lastly, a pure Python implementation. I am OK if it has a C library, but I am looking for a cross-platform solution

I have looked into solutions like Redis, shelve, and Tokyo Cabinet. Tokyo Cabinet is impressive and has a Python binding in the making at http://code.google.com/p/python-tokyocabinet/, but its Windows port is a work in progress.

Thanks for the good suggestions. I am currently exploring SQLite3 with Python. I got suggestions to use a database engine, but I am more inclined towards a lean and mean persistent B+tree implementation.

+2  A: 

ZODB
http://pypi.python.org/pypi/ZODB3

Like Lennart says, use the latest version, of course.
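
A minimal sketch of how ZODB might address the question, pairing its FileStorage with a persistent B-tree from the accompanying BTrees package (the 'Data.fs' file name and the 'index' key here are just placeholders):

    # ZODB FileStorage holding a persistent OOBTree
    import transaction
    from ZODB import FileStorage, DB
    from BTrees.OOBTree import OOBTree

    storage = FileStorage.FileStorage('Data.fs')  # single on-disk storage file
    db = DB(storage)
    connection = db.open()
    root = connection.root()

    if 'index' not in root:
        root['index'] = OOBTree()                 # persistent B-tree mapping
    root['index']['some-key'] = 'some-value'
    transaction.commit()                          # changes reach disk on commit
    db.close()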

gnibbler
3.2 is like five years old or so. I took the liberty of linking to the latest versions.
Lennart Regebro
+1  A: 

ZODB is indeed a powerful tool, but maybe it's overkill.

You can hack your own solution in a few Python lines: simply code a dictionary-like object as a database adapter. Try using a snippet like the one below, replacing the SQLite calls with MySQL, and you should be done.
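
As a rough idea of what such an adapter could look like (a hedged sketch: the SQLiteDict name, the kv table, and the string-only values are all invented for illustration):

    # a hypothetical dict-like adapter over sqlite3; names are illustrative
    import sqlite3
    from collections.abc import MutableMapping

    class SQLiteDict(MutableMapping):
        """Persistent string-to-string mapping backed by one SQLite table."""

        def __init__(self, path):
            self.conn = sqlite3.connect(path)
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
            self.conn.commit()

        def __getitem__(self, key):
            row = self.conn.execute(
                "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
            if row is None:
                raise KeyError(key)
            return row[0]

        def __setitem__(self, key, value):
            self.conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value))
            self.conn.commit()

        def __delitem__(self, key):
            cur = self.conn.execute("DELETE FROM kv WHERE key = ?", (key,))
            self.conn.commit()
            if cur.rowcount == 0:
                raise KeyError(key)

        def __iter__(self):
            for (key,) in self.conn.execute("SELECT key FROM kv"):
                yield key

        def __len__(self):
            return self.conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

Usage is then just dictionary syntax:

    d = SQLiteDict('data.db')
    d['alpha'] = 'one'
    print(d['alpha'])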

e-satis
Thanks for your suggestions. I am currently exploring sqlite3 with Python. Do you know if there are any limits associated with sqlite3 with regards to the number of rows in a table, other than the disk space?
volatilevoid
You will be interested in the review of all SQLite implementation limits: http://www.evolane.com/support/manuals/shared/manuals/tcltk/sqlite/limits.html
e-satis
Not sure MySQL is the database of choice if reliability is one of the key requirements, particularly if MyISAM is in use.
Charles Duffy
Well, this snippet can adapt to any DB, since Python has the same API (the DB-API) to access them all. Anyway, reliability can have a lot of different meanings these days. Generally, MySQL is "reliable" enough if you are not coding, say, a banking system.
e-satis
+1  A: 

Use a relational database.

  • Really fast when retrieving data based on a key, if you put an index on the key.
  • Good scaling
  • Doesn't get corrupted easily
  • Tools already available for:
    • Backups
    • Replication
    • Clustering
  • Cross-platform
  • Works over the network
  • Allows really fast JOINs, grouping, aggregation, and other complex queries, in case you need them

You can easily create a class that works like a dict or hash table, but uses the database as storage. You can make it cache as much as you want in memory.
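
One hedged way to sketch that caching layer (the CachedStore name and max_items parameter are invented; the backend can be any dict-like store, such as the SQLiteDict sketched in the other answer):

    # a hypothetical write-through cache in front of a dict-like backend
    class CachedStore:
        """Keeps hot keys in a plain dict in front of the database."""

        def __init__(self, backend, max_items=100000):
            self.backend = backend     # e.g. a dict-like SQLite or shelve store
            self.cache = {}
            self.max_items = max_items

        def __getitem__(self, key):
            if key not in self.cache:
                if len(self.cache) >= self.max_items:
                    self.cache.clear()  # crude eviction; use an LRU for real loads
                self.cache[key] = self.backend[key]
            return self.cache[key]

        def __setitem__(self, key, value):
            self.backend[key] = value   # write-through: the database stays authoritative
            self.cache[key] = value

Write-through keeps the database as the source of truth, so a crash at worst loses nothing but the cache.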

nosklo