
Hi!

I have to deserialize a dictionary in PHP that was serialized using cPickle in Python.

In this specific case I could probably just use a regexp to extract the information I want, but is there a better way? Are there any PHP extensions that would let me deserialize the whole dictionary more natively?

Apparently it is serialized in Python like this:

import cPickle as pickle

data = { 'user_id' : 5 }
pickled = pickle.dumps(data)
print pickled

The contents of such a serialization can't easily be pasted here, because they contain binary data.


Solution

Since the Python end is Django, I ended up writing my own JSON SessionStore.

+6  A: 

If you want to share data objects between programs written in different languages, it might be easier to serialize/deserialize using something like JSON instead. Most major programming languages have a JSON library.
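As a minimal sketch of the JSON route (assuming you control the producing side, which here would mean changing the Django app), the Python end writes plain JSON text instead of a pickle, and PHP can then read it with `json_decode($text, true)` into an associative array. Written for Python 3; the `json` module behaves the same way in 2.6+:

```python
import json

data = {'user_id': 5}

# Producer side (Python): serialize to a plain JSON string instead of a pickle.
encoded = json.dumps(data)
print(encoded)  # {"user_id": 5}

# Any JSON library can read it back; here Python's own json module does.
decoded = json.loads(encoded)
```

Because the result is ordinary text, it can also be pasted, logged, or stored without worrying about binary bytes.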

mipadi
Python 2.6+ has it built-in, and there's simplejson for earlier versions.
Ignacio Vazquez-Abrams
Though a good idea, the serialization part is not under my control.
Ciantic
At first I didn't want to hack the Django app, but then again it might be the faster solution. So here is my simple [JSON SessionStore for Django](http://gist.github.com/441132)
Ciantic
+2  A: 

Can you make a system call? You could use a Python script like this to convert the pickled data into JSON:

# pickle2json.py
import sys, optparse, cPickle
try:
    import json  # built in since Python 2.6
except ImportError:
    import simplejson as json  # fallback for older Pythons

# Set up the arguments this script accepts on the command line
parser = optparse.OptionParser()
parser.add_option('-p', '--pickled_data_path', dest="pickled_data_path", type="string",
                  help="Path to the file containing pickled data.")
parser.add_option('-j', '--json_data_path', dest="json_data_path", type="string",
                  help="Path to where the JSON data should be saved.")
opts, args = parser.parse_args()

# Load the pickled data from either a file (opened in binary mode) or standard input
if opts.pickled_data_path:
    with open(opts.pickled_data_path, 'rb') as f:
        unpickled_data = cPickle.load(f)
else:
    unpickled_data = cPickle.loads(sys.stdin.read())

# Write the JSON version of the data either to another file or to standard output
json_data = json.dumps(unpickled_data)
if opts.json_data_path:
    with open(opts.json_data_path, 'w') as f:
        f.write(json_data)
else:
    print json_data

This way, if you're getting the data from a file, you could do something like this:

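<?php
    // exec() fills $output by reference, one array element per printed line
    $output = array();
    exec("python pickle2json.py -p pickled_data.txt", $output);
    $data = json_decode(implode("\n", $output), true);
?>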

or, if you want to save it to a file, something like this:

<?php
    system("python pickle2json.py -p pickled_data.txt -j p_to_j.json");
?>

The code above probably isn't perfect (I'm not a PHP developer), but would something like this work for you?
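On Python 3, `cPickle` no longer exists (the `pickle` module uses its C implementation automatically), so the core of the script above might shrink to a function like the one below. This is only a sketch: `pickle_to_json` is a hypothetical name, and `encoding` is the existing `pickle.loads` argument that decodes Python 2 `str` objects found in pickles written by cPickle. As with the original script, only feed it pickles from a trusted source, since unpickling can execute arbitrary code.

```python
import json
import pickle


def pickle_to_json(pickled, encoding="ASCII"):
    """Convert pickled bytes into a JSON string.

    `encoding` is forwarded to pickle.loads and is only used to decode
    Python 2 `str` objects in pickles written by Python 2 / cPickle.
    """
    return json.dumps(pickle.loads(pickled, encoding=encoding))
```

The command-line wrapper would then just read `sys.stdin.buffer.read()` (bytes, not text) and print the returned string.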

Eric Palakovich Carr
A: 

If the pickle is created by the code you showed, it won't contain binary data, unless you count newlines as "binary data". See the Python docs. The following code was run with Python 2.6:

>>> import cPickle
>>> data = {'user_id': 5}
>>> for protocol in (0, 1, 2): # protocol 0 is the default
...     print protocol, repr(cPickle.dumps(data, protocol))
...
0 "(dp1\nS'user_id'\np2\nI5\ns."
1 '}q\x01U\x07user_idq\x02K\x05s.'
2 '\x80\x02}q\x01U\x07user_idq\x02K\x05s.'
>>>
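For this specific case, the "regexp the wanted information" idea from the question could look like the sketch below. It assumes pickles written by Python 2 exactly as in the session above (a string key `user_id` and a small integer value); `extract_user_id` is a hypothetical helper, and the binary pattern only covers values 0 to 255, which cPickle stores with the one-byte `BININT1` opcode (`K`). Anything else falls through to `None`, which is why a real unpickler is more robust.

```python
import re

# Protocol 0 (text): S'user_id' \n p<memo> \n I<digits> \n
PROTO0_USER_ID = re.compile(rb"S'user_id'\np\d+\nI(-?\d+)\n")
# Protocols 1 and 2 (binary): U \x07 user_id, then q <memo byte>, then K <one-byte int>
PROTO12_USER_ID = re.compile(rb"U\x07user_idq.K(.)", re.DOTALL)


def extract_user_id(pickled):
    """Pull user_id out of a Python 2 pickled dict with a regex; None if no match."""
    m = PROTO0_USER_ID.search(pickled)
    if m:
        return int(m.group(1))
    m = PROTO12_USER_ID.search(pickled)
    if m:
        # BININT1 stores an unsigned value in a single byte
        return m.group(1)[0]
    return None
```

Note that Python 3 writes string keys with different opcodes (`V`, `X`, `\x8c`), so these patterns would not match its output.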

Which of the above looks most like what you are seeing? Can you post the pickled file contents as displayed by a hex editor/dumper or whatever is the PHP equivalent of Python's repr()? How many items in a typical dictionary? What data types other than "integer" and "string of 8-bit bytes" (what encoding?)?
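To produce the kind of pasteable dump asked for here, Python can print both `repr()` and a hex string of the raw bytes; on the PHP side, `bin2hex()` gives the same hex string. A small sketch (the helper name `printable_dump` is hypothetical), using the protocol 1 bytes from the session above:

```python
import binascii
import pickle


def printable_dump(pickled):
    """Return two pasteable views of raw pickle bytes: repr() and a hex string."""
    return repr(pickled), binascii.hexlify(pickled).decode("ascii")


raw = b'}q\x01U\x07user_idq\x02K\x05s.'  # protocol 1 output shown above
as_repr, as_hex = printable_dump(raw)
print(as_repr)
print(as_hex)

# The hex form round-trips back to the original bytes and the original dict.
restored = pickle.loads(binascii.unhexlify(as_hex))
```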

John Machin