If I have a pair of floats, is it any more efficient (computationally or storage-wise) to store them as a GeoPtProperty than it would be to pickle the tuple and store it as a BlobProperty?

If GeoPt is doing something more clever to keep multiple values in a single property, can it be leveraged for arbitrary data? Can I store the tuple ("Johnny", 5) in a single entity property in a similarly efficient manner?

A: 

GeoPt itself is limited to a latitude in [-90, 90] and a longitude in [-180, 180]; it can't be used to store any data that won't fit this model.

However, a custom tuple property shouldn't be too difficult to create yourself; take a look at how SetProperty and ArrayProperty are designed in aetycoon.
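As a rough illustration of the serialization half of such a property, here is a minimal sketch that packs a (string, int) tuple like ("Johnny", 5) into a byte string suitable for a blob. The function names and the byte layout are hypothetical, chosen only for this example; they are not taken from aetycoon, and the datastore property class that would wrap them is left out.

```python
import struct

def pack_name_count(name, count):
    """Encode (name, count) as: 2-byte length, UTF-8 name, 4-byte signed int."""
    raw = name.encode("utf-8")
    return struct.pack(">H", len(raw)) + raw + struct.pack(">i", count)

def unpack_name_count(blob):
    """Inverse of pack_name_count (hypothetical helper for this example)."""
    (name_len,) = struct.unpack_from(">H", blob, 0)
    name = blob[2:2 + name_len].decode("utf-8")
    (count,) = struct.unpack_from(">i", blob, 2 + name_len)
    return name, count

blob = pack_name_count("Johnny", 5)
print(len(blob), unpack_name_count(blob))
```

A custom property would call something like these two functions when converting the value to and from its stored form.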

Wooble
+3  A: 

Here are some empirical answers:

GeoPtProperty uses 31B of storage space.

Using BlobProperty varies based on what exactly you store:

  • struct.pack('>2f', lat, lon) => 21B.
  • Using pickle (v2) to pack a 2-tuple containing floats => 37B.
  • Using pickle (v0) to pack a 2-tuple containing floats => about 30B-32B (v0 uses a variable-length ASCII encoding for floats).
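To get a feel for where these differences come from, the raw payload sizes can be compared outside the datastore. This is only a sketch: it measures the serialized bytes themselves, not the per-property storage reported above, and the coordinates are made-up example values. Note that `'>2f'` stores single-precision floats, so it trades precision for space; `'>2d'` keeps full doubles.

```python
import pickle
import struct

lat, lon = 37.4219, -122.0841  # example values, not from the measurements above

packed_f = struct.pack(">2f", lat, lon)  # two 4-byte single-precision floats
packed_d = struct.pack(">2d", lat, lon)  # two 8-byte double-precision floats
pickled_v2 = pickle.dumps((lat, lon), protocol=2)
pickled_v0 = pickle.dumps((lat, lon), protocol=0)

for label, data in [("struct >2f", packed_f), ("struct >2d", packed_d),
                    ("pickle v2", pickled_v2), ("pickle v0", pickled_v0)]:
    print(label, len(data))

# Doubles round-trip exactly; single-precision floats only approximately.
print(struct.unpack(">2d", packed_d) == (lat, lon))
print(struct.unpack(">2f", packed_f))
```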

In short, it doesn't look like GeoPt is doing anything particularly clever. If you are going to be storing a lot of these, then you could use struct to pack your floats. The CPU cost of packing and unpacking them with struct will probably not be noticeably different from the cost of serializing/deserializing GeoPt.

If you plan on storing lots of floats per entity and space is really important, then you might consider leveraging the CompressedBlobProperty in aetycoon.
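The general idea of compressing a packed blob can be sketched with the standard `zlib` module. This is an assumption made for illustration and does not reproduce aetycoon's actual implementation; the helper names are hypothetical. How much space it saves depends heavily on the data: repetitive values compress well, while noisy measurements may barely shrink.

```python
import struct
import zlib

def pack_floats(values):
    """Pack a sequence of floats as big-endian doubles, then compress."""
    raw = struct.pack(">%dd" % len(values), *values)
    return zlib.compress(raw)

def unpack_floats(blob):
    """Inverse of pack_floats (hypothetical helper for this example)."""
    raw = zlib.decompress(blob)
    return list(struct.unpack(">%dd" % (len(raw) // 8), raw))

values = [1.5] * 1000           # highly repetitive, so it compresses well
blob = pack_floats(values)
print(len(blob), "bytes compressed vs", 8 * len(values), "bytes raw")
```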

Disclaimer: This is the minimum space required. Actual space will be slightly larger per property based on the length of the property's name. The model itself also adds overhead (for its name and key).

David Underhill
For fun ... `GeoPt` seems to be stored as a "point" value. Specifically, the data itself seems to take 18B: two 1-byte integers and two 8-byte doubles (see [here](http://code.google.com/p/googleappengine/source/browse/trunk/python/google/appengine/datastore/entity_pb.py#215)). The `struct` solution packs each float into 4 bytes for a total of 8B. Both therefore have 13B of "overhead" from the rest of the property info in my tests above (the overhead is expected to be the same, since the only thing that differs between my tests is the property type).
David Underhill
Your analysis doesn't take into account CPU time: pickling and unpickling are typically very computationally expensive. Using a struct is a good idea, though. The newly added ArrayProperty in AETycoon is also worth a look.
Nick Johnson