views:

133

answers:

2

Hi all,

I'm building a directory for finding athletic tournaments on GAE with web2py and a Flex front end. The user selects a location, a radius, and a maximum date from a set of choices. I have a basic version of this query implemented, but it's inefficient and slow. One way I know I can improve it is by condensing the many individual queries I'm using to assemble the objects into bulk queries. I just learned that was possible. But I'm also thinking about a more extensive redesign that utilizes memcache.

The main problem is that I can't query the datastore by location because GAE won't allow multiple numerical comparison statements (<,<=,>=,>) in one query. I'm already using one for date, and I'd need TWO to check both latitude and longitude, so it's a no go. Currently, my algorithm looks like this:

1.) Query by date and select

2.) Use destination function from geopy's distance module to find the max and min latitude and longitudes for supplied distance

3.) Loop through results and remove all with lat/lng outside max/min

4.) Loop through again and use distance function to check exact distance, because step 2 will include some areas outside the radius. Remove results outside supplied distance (is this 2/3/4 combination inefficent?)

5.) Assemble many-to-many lists and attach to objects (this is where I need to switch to bulk operations)

6.) Return to client

Here's my plan for using memcache.. let me know if I'm way out in left field on this as I have no prior experience with memcache or server caching in general.

-Keep a list in the cache filled with "geo objects" that represent all my data. These have five properties: latitude, longitude, event_id, event_type (in anticipation of expanding beyond tournaments), and start_date. This list will be sorted by date.

-Also keep a dict of pointers in the cache which represent the start and end indices in the cache for all the date ranges my app uses (next week, 2 weeks, month, 3 months, 6 months, year, 2 years).

-Have a scheduled task that updates the pointers daily at 12am.

-Add new inserts to the cache as well as the datastore; update pointers.

Using this design, the algorithm would now look like:

1.) Use pointers to slice off appropriate chunk of list based on supplied date.

2-4.) Same as above algorithm, except with geo objects

5.) Use bulk operation to select full tournaments using remaining geo objects' event_ids

6.) Assemble many-to-manys

7.) Return to client

Thoughts on this approach? Many thanks for reading and any advice you can give.

-Dane

+1  A: 

GeoModel is the best I found. You may look how my GAE app return geospatial queries. For instance India http query is with optional cc (country code) using geomodel library lat=20.2095231&lon=79.560344&cc=IN

LarsOn
Thank you LarsOn. I will look into GeoModel for guidance as well.
Dane
+1  A: 

You might be interested by geohash, which enables you to do an inequality query like this:

SELECT latitude, longitude, title FROM myMarkers WHERE geohash >= :sw_geohash AND geohash <= :ne_geohash

Have a look at this fine article which was featured in this month's Google App Engine App Engine Community Update blog post.

As a note on your proposed design, don't forget that entities in Memcache have no guarantee of staying in memory, and that you can not have them "sorted by date".

Emilien
Great, I will check out geohash. Looks like just what I need. Also good to know that about memcache. Any idea why they can't be sorted? I was hoping that memcache entities would be able to hold basic data structures like lists, which could be sorted, but guess it's not that simple.
Dane
About memcache[1]: "In general, an application should not expect a cached value to always be available."-- If you store a pickle'd version of your list, then the content of what's stored in this one memcache instance is sorted. But you can not "sort" various memcache entities, as you would do with a datastore query. However, you can give your memcache entitites names that will help you to retrieve them. Don't know if that can help you in your plans.[1]: http://code.google.com/intl/fr/appengine/docs/python/memcache/overview.html#How_Cached_Data_Expires
Emilien