Best way to save complex Python data structures across program sessions (pickle, json, xml, database, other)

+4 Q:

Looking for advice on the best technique for saving complex Python data structures across program sessions.

Here's a list of techniques I've come up with so far:

pickle/cPickle
json
jsonpickle
xml
database (like SQLite)

Pickle is the easiest and fastest technique, but my understanding is that there is no guarantee that pickle output will work across different versions of Python 2.x/3.x or across 32-bit and 64-bit implementations of Python.

JSON only works for simple data structures. jsonpickle seems to correct this AND seems to be written to work across different versions of Python.
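To make the "simple data structures" limit concrete, here is a minimal sketch using only the standard json module. The Point class, the "__point__" tag and the two helper functions are made up for illustration; this is not how jsonpickle works internally, just one way to get a custom object through plain JSON.

```python
import json

class Point(object):  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y

def encode_point(obj):
    """Fallback for json.dumps: turn a Point into a tagged dict."""
    if isinstance(obj, Point):
        return {"__point__": True, "x": obj.x, "y": obj.y}
    raise TypeError("cannot serialize %r" % (obj,))

def decode_point(d):
    """object_hook for json.loads: rebuild a Point from a tagged dict."""
    if d.get("__point__"):
        return Point(d["x"], d["y"])
    return d

# Plain json.dumps rejects objects it does not know about.
try:
    json.dumps({"p": Point(1, 2)})
    plain_json_failed = False
except TypeError:
    plain_json_failed = True

# With the two hooks, the object survives a round trip.
text = json.dumps({"p": Point(1, 2)}, default=encode_point)
restored = json.loads(text, object_hook=decode_point)
```

The price is that every custom type needs its own encode/decode pair, which is the manual effort that pickle and jsonpickle save you.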

Serializing to XML or to a database is possible, but represents extra effort since we would have to do the serialization ourselves manually.

Thank you, Malcolm

+9 A:

You have a misconception about pickles: they are guaranteed to work across Python versions. You simply have to choose a protocol version that is supported by all the Python versions you care about.
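A minimal sketch of pinning the protocol, assuming protocol 2 is the newest one that every interpreter you care about can read (it was added in Python 2.3 and is still readable by Python 3). The data dict is a made-up example.

```python
import pickle

# Hypothetical nested structure, used only for illustration.
data = {"name": "example", "scores": [1, 2.5, 3], "nested": {"flags": (True, None)}}

# Assumption: protocol 2 is supported by all Python versions involved.
PROTOCOL = 2

blob = pickle.dumps(data, protocol=PROTOCOL)
restored = pickle.loads(blob)
```

The same protocol argument is accepted by pickle.dump() when writing straight to a file opened in binary mode.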

The technique you left out is marshal, which is not guaranteed to work across Python versions (and btw, is how .pyc files are written).

Ned Batchelder 2010-01-05 01:59:19

+1... useful info!

jldupont 2010-01-05 02:00:47

Ned: Thank you for pointing out my confusion between pickling and marshalling.

Malcolm 2010-01-05 03:01:44

+2 A:

You left out the marshal and shelve modules.
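For reference, a minimal shelve sketch: it behaves like a dict that lives on disk, with values pickled behind the scenes. The file name and the stored values are made up, and a temporary directory is used only so the example cleans up easily.

```python
import os
import shelve
import tempfile

# Hypothetical location for the shelf file.
path = os.path.join(tempfile.mkdtemp(), "session_data")

# First "session": store values under string keys.
db = shelve.open(path)
db["config"] = {"retries": 3, "hosts": ["a", "b"]}
db["history"] = [(1, "start"), (2, "stop")]
db.close()

# Later "session": reopen the shelf and read the values back.
db = shelve.open(path)
config = db["config"]
history = db["history"]
db.close()
```

Keys must be strings; values can be anything pickle can handle.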

Also, this Python docs page covers persistence.

SpliFF 2010-01-05 02:01:34

SpliFF: Thanks for the link to the Python Persistence web page.

Malcolm 2010-01-05 03:02:29

+2 A:

Have you looked at PySyck or pyYAML?

rnicholson 2010-01-05 02:03:03

rnicholson: I had forgotten about pyYAML. Looks like an interesting compromise between JSON (doesn't work with complex data structures) and pickle. Have you looked at the jsonpickle project? Very impressive as well.

Malcolm 2010-01-05 03:04:06

No, jsonpickle was new to me. Thanks for the pointer!

rnicholson 2010-01-05 03:05:15

What are your criteria for "best" ?

pickle can do most Python structures, deeply nested ones too
sqlite dbs can be easily queried (if you know sql :)
speed / memory ? trust no benchmarks that you haven't faked yourself.
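To illustrate the "easily queried" point, a small sketch with the standard sqlite3 module. The table name, columns and rows are invented for the example, and an in-memory database stands in for a real file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path to persist across sessions
conn.execute("CREATE TABLE items (name TEXT, category TEXT, value INTEGER)")

# Made-up rows, for illustration only.
rows = [("alpha", "X", 10), ("beta", "Y", 25), ("gamma", "X", 40)]
conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
conn.commit()

# Filter without loading everything into Python first.
in_x = conn.execute(
    "SELECT name, value FROM items WHERE category = ? ORDER BY name", ("X",)
).fetchall()

# Aggregate per category.
totals = conn.execute(
    "SELECT category, SUM(value) FROM items GROUP BY category ORDER BY category"
).fetchall()
```

With a pickle you would have to load the whole structure and filter it in Python; here the database does the selection.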

(Fine print:
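cPickle.dump(obj, f, protocol=-1) selects the highest binary protocol, which is much more compact (in one case a 15M pickle vs. 60M in sqlite), but can break, e.g. when the file is later read by an older Python that lacks that protocol.
Strings that occur many times, e.g. country names, may take more memory than you expect; see the builtin intern() (sys.intern() in Python 3).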
)
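A quick way to see the size difference between protocols on your own data, in the spirit of faking your own benchmarks. The data below is made up; the actual ratio depends entirely on what you store.

```python
import pickle

# Hypothetical data: many small integers and one string repeated many times.
data = {"ids": list(range(1000)), "countries": ["France"] * 1000}

text_pickle = pickle.dumps(data, protocol=0)     # oldest, ASCII-based protocol
binary_pickle = pickle.dumps(data, protocol=-1)  # highest protocol this interpreter supports

print(len(text_pickle), len(binary_pickle))
```

Both pickles load back to the same structure; only the on-disk size and the set of Python versions that can read them differ.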

Denis 2010-02-01 13:22:03

Denis: Thanks for your warning about protocol=-1 and sense of humor (re: trust no benchmarks you haven't faked yourself ... LMAO!)

Malcolm 2010-02-02 15:58:33

+1 A:

I would consult the internal documentation of the y_serial module, which discusses the options you list and offers a ready-to-go implementation: y_serial.py module :: warehouse Python objects with SQLite

"Serialization + persistance :: in a few lines of code, compress and annotate Python objects into SQLite; then later retrieve them chronologically by keywords without any SQL. Most useful "standard" module for a database to store schema-less data."

http://yserial.sourceforge.net

There's a comparative review of pickle versus other methods, especially on speed and security issues. @Denis, note that compression is seamlessly integrated into yserial which is nice because that dramatically reduces the size of BLOBs.
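The general idea (pickle, compress, store as a BLOB in SQLite) can be sketched with the standard library alone. To be clear, this is not y_serial's API: the store() and load() helpers, the objects table and its columns are all hypothetical names chosen for this example.

```python
import pickle
import sqlite3
import zlib

def store(conn, key, obj):
    """Pickle and compress obj, then save it under key (hypothetical helper)."""
    blob = zlib.compress(pickle.dumps(obj, protocol=2))
    conn.execute(
        "INSERT OR REPLACE INTO objects (key, data) VALUES (?, ?)",
        (key, sqlite3.Binary(blob)),
    )
    conn.commit()

def load(conn, key):
    """Fetch, decompress and unpickle the object stored under key."""
    row = conn.execute("SELECT data FROM objects WHERE key = ?", (key,)).fetchone()
    if row is None:
        raise KeyError(key)
    return pickle.loads(zlib.decompress(bytes(row[0])))

conn = sqlite3.connect(":memory:")  # use a file path to persist across sessions
conn.execute("CREATE TABLE objects (key TEXT PRIMARY KEY, data BLOB)")

settings = {"theme": "dark", "recent": ["a.txt", "b.txt"], "window": (800, 600)}
store(conn, "settings", settings)
loaded = load(conn, "settings")
```

As with any pickle-based storage, only load data you wrote yourself: unpickling untrusted bytes can execute arbitrary code.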

code43 2010-05-01 18:39:54
