I am building an application to distribute to fellow academics. The application will take three parameters that the user submits and output a list of dates and codes related to the events those parameters describe. I have been building this around a dictionary, with the intention of loading the dictionary from a pickle file when the application needs it. The parameters supplied by the user will be used to look up the needed output.

I selected this structure because I have gotten pretty comfortable with dictionaries and pickle files, and I see this going out the door with the smallest learning curve on my part. There might be as many as two million keys in the dictionary. I have been satisfied with the performance on my machine with a reasonable subset, and I have already thought about how to break the dictionary apart if there are performance concerns when the whole thing is put together. I am not really worried about the amount of disk space on their machines, as we are working with terabytes of storage.

Having said all of that, I have been poking around in the docs and am wondering whether I need to invest some time to learn and implement an alternative storage format. The only reason I can think of is if there is an alternative that could increase the lookup speed by a factor of three to five or more.

A: 

Here are three things you can try:

  1. Compress the pickled dictionary with zlib, e.g. zlib.compress(pickle.dumps(d)) (sketched below).
  2. Make your own serialization format (shouldn't be too hard).
  3. Load the data into a sqlite database (also sketched below).
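
Rough sketches of options 1 and 3, under assumptions of my own (the file names, keys, and values are stand-ins, not from the question). Option 1 just wraps the existing pickle round-trip in zlib:

import pickle
import zlib

# tiny stand-in for the real date/code dictionary, keyed on the three parameters
lookup = {('a', 'b', 'c'): [('2009-06-01', 'X41')],
          ('a', 'b', 'd'): [('2009-07-15', 'Q07')]}

# write: pickle, then compress the byte string before it hits disk
open('lookup.pkl.z', 'wb').write(
    zlib.compress(pickle.dumps(lookup, pickle.HIGHEST_PROTOCOL)))

# read: decompress, then unpickle
lookup = pickle.loads(zlib.decompress(open('lookup.pkl.z', 'rb').read()))
print lookup[('a', 'b', 'c')]

For option 3, the same data can go into a sqlite table keyed on the three parameters; sqlite keeps an index on the primary key, so single-row lookups stay fast even at two million rows (the table and column names here are invented for the example):

import sqlite3

conn = sqlite3.connect('lookup.db')
conn.execute('CREATE TABLE IF NOT EXISTS events '
             '(p1 TEXT, p2 TEXT, p3 TEXT, dates_and_codes TEXT, '
             'PRIMARY KEY (p1, p2, p3))')
conn.execute('INSERT OR REPLACE INTO events VALUES (?, ?, ?, ?)',
             ('a', 'b', 'c', '2009-06-01|X41'))
conn.commit()

row = conn.execute('SELECT dates_and_codes FROM events '
                   'WHERE p1=? AND p2=? AND p3=?', ('a', 'b', 'c')).fetchone()
print row[0]

Note that option 1 only shrinks the file on disk; the whole dictionary still has to be decompressed and unpickled into memory before the first lookup, whereas sqlite reads single rows on demand.
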
Unknown
+2  A: 

To get fast lookups, use the standard Python dbm module (see http://docs.python.org/library/dbm.html) to build your database file, and do lookups in it. The dbm file format may not be cross-platform, so you may want to distribute your data in pickle or repr or JSON or YAML or XML format, and build the dbm database the first time the user runs your program.
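
A rough sketch of that build-then-lookup flow, using Python 2's anydbm wrapper over the dbm family (the file names and the composite key format are assumptions for the example, not something from the question):

import anydbm
import json

# build step: run once on the user's machine from the distributed data file
source = json.load(open('events.json'))          # hypothetical shipped data file
db = anydbm.open('events.db', 'c')               # create the dbm file
for key, value in source.items():
    db[key.encode('utf-8')] = json.dumps(value)  # dbm keys and values must be strings
db.close()

# lookup step inside the application: open read-only and fetch by key
db = anydbm.open('events.db', 'r')
dates_and_codes = json.loads(db['2009|dept41|conference'])  # hypothetical composite key
db.close()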

pts
+6  A: 

The standard shelve module will give you a persistent dictionary stored in a dbm-style database. Provided that your keys are strings and your values are picklable (since you're using pickle already, this must be true), this could be a better solution than simply storing the entire dictionary in a single pickle.

Example:

>>> import shelve
>>> d = shelve.open('mydb')
>>> d['key1'] = 12345
>>> d['key2'] = ['any', 'picklable', 'value']
>>> print d['key1']
12345
>>> d.close()

I'd also recommend Durus, but that requires some extra learning on your part. It'll let you create a PersistentDictionary. From memory, keys can be any pickleable object.

mhawke
FYI: http://docs.python.org/library/shelve.html
+2  A: 

How much memory can your application reasonably use? Is this going to be running on each user's desktop, or will there just be one deployment somewhere?

A Python dictionary in memory can certainly cope with two million keys. You say that you've got a subset of the data; do you have the whole lot? Maybe you should throw the full dataset at it and see whether it copes.

I just tested creating a two million record dictionary; the total memory usage for the process came in at about 200MB. If speed is your primary concern and you've got the RAM to spare, you're probably not going to do better than an in-memory python dictionary.
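
If you want to reproduce that check on your own machine, a throwaway test along these lines (with synthetic keys and values roughly the shape of yours) shows both the memory footprint and the raw lookup speed:

import random
import time

# two million synthetic entries, roughly the shape of the real data
d = dict(('key%07d' % i, ('2009-06-01', 'CODE%d' % i)) for i in xrange(2000000))

# time 100,000 random lookups
keys = ['key%07d' % random.randrange(2000000) for _ in xrange(100000)]
start = time.time()
for k in keys:
    value = d[k]
print '100,000 lookups took %.3f seconds' % (time.time() - start)

Watch the process in your system monitor while it runs to see the total memory usage.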

John Fouhy
+1  A: 

See this solution at SourceForge, esp. the "endnotes" documentation:

y_serial.py module :: warehouse Python objects with SQLite

"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

code43