Hey,

I have to check whether a dictionary has changed since yesterday.

In PHP I could have serialized an array and compared the resulting strings from yesterday and today. However, I don't know how to do it in Python. I've read a little about pickle, and maybe it could be done with md5 somehow?

So basically I need a way to turn a dict into a comparable, storable value that is not a file and can be hardcoded into a .py file.

Thanks, A.R.

A: 

pickle is definitely what you're looking for.

Amber
A: 
pickle.dumps(set(yourdict.items()))
Javier
Why convert it to a set first?
Nick Johnson
After testing it, I found it isn't strictly needed, since dicts are comparable and pickle seems to enforce key order. Still, a different implementation could in theory create different serializations for identical dictionaries. In contrast, comparability is a set's purpose in life, so I feel it's 'safer'.
Javier
A: 

By using pickle you could create a new DictProperty allowing you to put the dict in the datastore and retrieve it later for comparison.

Here is one implementation: http://stackoverflow.com/questions/1953784/can-i-store-a-python-dictionary-in-googles-bigtable-datastore-without-serializin/1953956#1953956

Franck
+2  A: 

The problem with dictionaries is their undefined order. You must make sure you always get the same result for equal dictionaries (if you want to compare them as strings).

You could do it in multiple ways:

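1) Python hash (only for checking equality within a single run; the hash implementation is specific to the Python version, and since Python 3.3 string hashes are randomized per process unless PYTHONHASHSEED is set, so don't store the value)

print(hash(str(sorted({1 : 2, 3 : 4}.items()))))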

2) MD5 (best if you only want to check equality)

import hashlib
# md5() needs bytes, so encode the string form of the sorted items first
print(hashlib.md5(str(sorted({1 : 2, 3 : 4}.items())).encode("utf-8")).hexdigest())

3) Pickling (serialization)

import pickle
serializedString = pickle.dumps({1 : 2, 3 : 4})

The pickle module has different protocols (and I think it doesn't sort the dictionary items), so you can't rely on string comparison. Instead, unpickle the data back into a dictionary and compare the old and new dictionaries directly (d = pickle.loads(serializedString)).

4) Item tuple representation (serialization)

According to your comment, you want something embeddable into Python source code. As S.Lott suggested, you can use the object representation of someDictionary.items(), which holds all (key, value) combinations as tuples. The pairs are not guaranteed to be sorted, so sort them first:

>>> repr(sorted({1 : 2, 3 : 4}.items()))
'[(1, 2), (3, 4)]'

You can copy-and-paste this representation into your source code if you want the object serialized as a string.
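A minimal sketch of how such a pasted representation could be read back and compared, assuming the keys and values are plain Python literals (numbers, strings, tuples and so on). The names `YESTERDAY` and `has_changed` are made up for illustration; `ast.literal_eval` is the standard-library function for parsing literal expressions.

```python
import ast

# Hypothetical constant: the repr() of yesterday's sorted items, pasted into the source.
YESTERDAY = "[(1, 2), (3, 4)]"

def has_changed(current, stored_repr=YESTERDAY):
    """Return True if `current` differs from the dict encoded in `stored_repr`."""
    # literal_eval only accepts Python literals, so it does not execute arbitrary code.
    previous = dict(ast.literal_eval(stored_repr))
    return previous != current

print(has_changed({1: 2, 3: 4}))
print(has_changed({1: 2, 3: 5}))
```

Comparing the rebuilt dictionaries, rather than the strings, keeps the check independent of how the items happened to be ordered when they were written out.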

AndiDog
What's wrong with (#4) create a tuple of `dictionary.items()`?
S.Lott
@S.Lott: The question reads "storable value", so I thought the OP wants the dict to be serialized as a string. Creating a sorted tuple of `d.items()` would still have to be serialized - but that's hidden in my option #1.
AndiDog
@AndiDog: Hashes are complementary to pickling. You can do both, independently. Indeed, people often do both. They serialize, and then hash the serialized result to simplify change detection. You only provide one serialization technique and two hashing techniques. Why not add other serialization techniques?
S.Lott
@S.Lott: JSON was already named (although it's not in the standard library), and `str(d.items())` is in my answer. Which serialization technique would you like me to add?
AndiDog
@AndiDog: I'll repeat my earlier comment: `dictionary.items()`
S.Lott
@S.Lott: The `items()` tuple is not serialized because it's still an object. For serializing to a file, for example, it must be converted to storable bytes, and therefore I mentioned `str()` and pickling. Or did I get you wrong here?
AndiDog
@AndiDog: From a comment: "Storable, as in hard-codable into the script file, as a string".
S.Lott
@S.Lott: Oh, hadn't seen that comment. I'll add it.
AndiDog
+1  A: 

You could use JSON to get what you are asking for.

import json  # standard library since Python 2.6; older setups used simplejson (e.g. django.utils.simplejson)

dict_1 = {'a': 1, 'b': 2, 'c': 3}
dict_2 = {'a': 2, 'b': 7, 'd': 9}

dict_1_str = json.dumps(dict_1, sort_keys=True)
dict_2_str = json.dumps(dict_2, sort_keys=True)

if dict_1_str == dict_2_str:
   # do something with the new dict...
   pass

The dict_X_str variables will contain a serialized version of the dict. You can store it in memcache or the datastore for later comparison.
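If storing the whole string is too bulky, the serialized form can also be hashed, as suggested in the comments above. This is only a sketch, assuming string keys and JSON-serializable values; the helper name `dict_fingerprint` is made up for illustration, and md5 is used purely for change detection, not for security.

```python
import hashlib
import json

def dict_fingerprint(d):
    """Return a hex digest that is identical for equal JSON-serializable dicts."""
    # sort_keys gives a stable key order; fixed separators avoid whitespace differences.
    canonical = json.dumps(d, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()

# Hypothetical stored value, computed from yesterday's dict.
yesterday = dict_fingerprint({'a': 1, 'b': 2, 'c': 3})

# Same content, different insertion order.
today = {'c': 3, 'b': 2, 'a': 1}
print(dict_fingerprint(today) == yesterday)
```

The resulting digest is a short hex string, so it can be pasted into a .py file as a constant.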

Robert Kluin
A: 

If you do this, you get an object which is consistent and comparable. It has a well-defined and predictable order. You can use difflib to find differences. Further, you can trivially rebuild the high-performance dictionary from it.

static = sorted(some_dict.items())  # sorted() already returns a list

a_dict = dict(static)
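A rough sketch of the difflib idea, assuming the keys can be sorted against each other. The function name `diff_dicts` and the "yesterday"/"today" labels are made up for illustration; `difflib.unified_diff` is the standard-library call.

```python
import difflib

def diff_dicts(old, new):
    """Return unified-diff lines between the sorted items of two dicts."""
    # One repr() line per (key, value) pair, in sorted key order.
    old_lines = [repr(item) for item in sorted(old.items())]
    new_lines = [repr(item) for item in sorted(new.items())]
    return list(difflib.unified_diff(old_lines, new_lines,
                                     fromfile="yesterday", tofile="today",
                                     lineterm=""))

for line in diff_dicts({'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'd': 9}):
    print(line)
```

An empty result means nothing changed; otherwise the lines starting with `-` and `+` show which pairs were removed or added.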
S.Lott