Hello, I need to compute the sizes of some Python objects so that I can split them up and store them in memcache without hitting its size limits.

`__sizeof__()` doesn't seem to be present on Python objects in the GAE environment, and `sys.getsizeof()` is also unavailable.

GAE itself is clearly checking sizes behind the scenes to enforce the limits. Any ideas for how to accomplish this? Thanks.

+3  A: 

memcache internally and invariably uses pickle and stores the resulting string, so you can check the stored size with `len(pickle.dumps(yourobject, -1))`. Note that `sys.getsizeof` (which requires Python 2.6 or later, which is why it's missing on GAE) would not really help you at all:

>>> import sys
>>> sys.getsizeof(23)
12
>>> import pickle
>>> len(pickle.dumps(23, -1))
5

since the size of a serialized pickle of the object can be quite different from the size of the object in memory, as you can see (so I guess you should feel grateful to GAE for not offering sizeof, which would have led you astray;-).

Alex Martelli
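A minimal sketch of how the `len(pickle.dumps(obj, -1))` check from the answer above could be used to split an object for storage. The helper names (`pickled_size`, `split_pickle`, `join_pickle`) and the `CHUNK_LIMIT` value are assumptions for illustration, not part of any memcache API; check the actual per-value limit for your environment, and note that this cuts up the pickled bytes rather than the object itself.

```python
import pickle

# Assumed per-value limit in bytes; verify the real figure for your memcache.
CHUNK_LIMIT = 1000000


def pickled_size(obj):
    """Return the length in bytes of obj pickled with the highest protocol."""
    return len(pickle.dumps(obj, -1))


def split_pickle(obj, limit=CHUNK_LIMIT):
    """Pickle obj once and cut the bytes into pieces of at most `limit` bytes."""
    data = pickle.dumps(obj, -1)
    return [data[i:i + limit] for i in range(0, len(data), limit)]


def join_pickle(chunks):
    """Reassemble chunks produced by split_pickle and unpickle the result."""
    return pickle.loads(b"".join(chunks))
```

Each chunk is a plain byte string, so it could be stored under its own key (for example `mykey:0`, `mykey:1`, and so on, a hypothetical naming scheme) and joined again after fetching.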
Thanks Alex for the great answer and explanation. Makes a lot of sense.
Dane
On a related note, is it secure to use pickle on user-supplied data?
Dane
@Dane, `pickle.dump` (and `.dumps`) are secure, as is `load`ing back the resulting file or string. The only potentially insecure part is `load`ing a user-supplied _string of bytes_, i.e., a string (whether taken from a file or not) that you haven't generated yourself with `pickle` in the first place (and `memcache` doesn't do that;-).
Alex Martelli
Nice, good to know. One more question: what protocol is the -1 arg for pickle.dumps?
Dane
@Dane, -1 means "the best available (no compatibility constraints with older versions of Python)" -- clearly memcache need not stay compatible with older versions, so it can use the "latest and greatest" in whatever version it's running on (saving space and/or time, and possibly allowing the serialization of some object types that older protocols did not support).
Alex Martelli
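To illustrate the point about protocol `-1`, here is a small sketch that pickles the same value with every protocol the running interpreter supports and records the sizes. The sample `value` is made up for the example; the actual numbers depend on the Python version, so none are quoted here.

```python
import pickle

# Hypothetical sample value, chosen only for illustration.
value = {"name": "example", "scores": list(range(50))}

# Size of the pickle under each protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    sizes[protocol] = len(pickle.dumps(value, protocol))
    print(protocol, sizes[protocol])

# A negative protocol selects pickle.HIGHEST_PROTOCOL.
same = pickle.dumps(value, -1) == pickle.dumps(value, pickle.HIGHEST_PROTOCOL)
print(same)
```

For a value like this, the text-based protocol 0 typically produces a noticeably larger string than the binary protocols.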
It's worth pointing out that pickling is an expensive operation, and doing it just to determine the length is probably a waste. You'd be better off optimistically storing to memcache and catching the exception you get if the value is too big.
Nick Johnson
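A rough sketch of the optimistic approach described in the comment above. `FakeCache` is a hypothetical stand-in written for this example, not the GAE client; the `ValueError` it raises and its size limit are assumptions, so check which exception your real memcache client raises for oversized values before relying on this pattern.

```python
import pickle


class FakeCache(object):
    """Hypothetical stand-in for a memcache client, for illustration only."""

    MAX_VALUE_SIZE = 1000000  # assumed limit in bytes

    def __init__(self):
        self.store = {}

    def set(self, key, value):
        # Mimic a client that pickles the value and rejects oversized results.
        data = pickle.dumps(value, -1)
        if len(data) > self.MAX_VALUE_SIZE:
            raise ValueError("value too large: %d bytes" % len(data))
        self.store[key] = data
        return True


def store_optimistically(cache, key, value, on_too_big):
    """Try to store the value as-is; call on_too_big only if the store fails."""
    try:
        return cache.set(key, value)
    except ValueError:  # assumed exception type for an oversized value
        return on_too_big(cache, key, value)
```

With this shape, the common case of a small value costs a single store attempt, and the fallback (splitting, skipping the cache, and so on) runs only when the store is actually rejected.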
Ah, good point Nick.
Dane