views:

1086

answers:

3

I use the following data:

date                     latitude         route      name longitude
2009-04-11 00:50:31.640000  40.80708 White Loop 86 -77.85891
2009-04-11 00:50:27.718000  40.80708 White Loop 86 -77.85891
2009-04-11 00:50:01.562000  40.80708 White Loop 86 -77.85891
2009-04-11 00:49:48.765000  40.80708 White Loop 86 -77.85891
2009-04-11 00:49:34.796000  40.802338 White Loop 86 -77.85073
2009-04-11 00:49:22.468000  40.802338 White Loop 86 -77.85073
2009-04-11 00:48:35.671000  40.802338 White Loop 86 -77.85073
2009-04-11 00:48:29.125000  40.802338 White Loop 86 -77.85073
2009-04-11 00:47:19.906000  40.79889 White Loop 86 -77.85299
2009-04-11 00:47:03.609000  40.79889 White Loop 86 -77.85299
2009-04-11 00:46:54.437000  40.79889 White Loop 86 -77.85299
2009-04-11 00:46:52.687000  40.79889 White Loop 86 -77.85299
2009-04-11 00:46:51.125000  40.79889 White Loop 86 -77.85299
2009-04-11 00:46:48.578000  40.79889 White Loop 86 -77.85299
2009-04-11 00:46:41.406000  40.79889 White Loop 86 -77.85299
2009-04-11 00:50:31.687000  40.792194 White Loop 82 -77.863235
2009-04-11 00:50:27.781000  40.792194 White Loop 82 -77.863235
2009-04-11 00:50:01.640000  40.792194 White Loop 82 -77.863235
2009-04-11 00:49:48.812000  40.792194 White Loop 82 -77.863235
2009-04-11 00:49:34.843000  40.794914 White Loop 82 -77.866844
2009-04-11 00:49:22.531000  40.794914 White Loop 82 -77.866844
2009-04-11 00:48:35.718000  40.794914 White Loop 82 -77.866844
2009-04-11 00:48:29.156000  40.79738 White Loop 82 -77.86755
2009-04-11 00:47:19.984000  40.79738 White Loop 82 -77.86755
2009-04-11 00:47:03.656000  40.79738 White Loop 82 -77.86755
2009-04-11 00:46:54.484000  40.79738 White Loop 82 -77.86755
2009-04-11 00:46:52.734000  40.79738 White Loop 82 -77.86755
2009-04-11 00:46:51.156000  40.79738 White Loop 82 -77.86755
2009-04-11 00:46:48.640000  40.79738 White Loop 82 -77.86755
2009-04-11 00:46:41.453000  40.79738 White Loop 82 -77.86755
2009-04-11 00:50:31.656000  40.776066 White Loop 81 -77.88552
2009-04-11 00:50:27.750000  40.776066 White Loop 81 -77.88552
2009-04-11 00:50:01.593000  40.776066 White Loop 81 -77.88552
2009-04-11 00:49:48.796000  40.776066 White Loop 81 -77.88552
2009-04-11 00:49:34.812000  40.764687 White Loop 81 -77.88271
2009-04-11 00:49:22.515000  40.764687 White Loop 81 -77.88271
2009-04-11 00:48:35.703000  40.764687 White Loop 81 -77.88271
2009-04-11 00:48:29.140000  40.764687 White Loop 81 -77.88271
2009-04-11 00:47:19.937000  40.76335 White Loop 81 -77.876755
2009-04-11 00:47:03.640000  40.76335 White Loop 81 -77.876755
2009-04-11 00:46:54.468000  40.76335 White Loop 81 -77.876755
2009-04-11 00:46:52.718000  40.76335 White Loop 81 -77.876755
2009-04-11 00:46:51.156000  40.76335 White Loop 81 -77.876755
2009-04-11 00:46:48.609000  40.76335 White Loop 81 -77.876755
2009-04-11 00:46:41.437000  40.76335 White Loop 81 -77.876755

How can I refine the query to get only the most recent rows for each "name"? For example, I only want to end up with:

2009-04-11 00:50:31.640000  40.80708 White Loop 86 -77.85891
2009-04-11 00:50:31.687000  40.792194 White Loop 82 -77.863235
2009-04-11 00:50:31.656000  40.776066 White Loop 81 -77.88552

And I want all results to have date values that are no more than 1 minute old. Keep in mind that the date values are Python datetime properties.

Thanks

+1  A: 

In SQL you could do all sorts of fancy things, but Google API is rather limited.

Given that you want all records to be no more than 1 minute old, I'd just ask the database for all records less than 1 minute old, and then have python collate the results and reject the duplicate rows.

From the data you show here, it looks like you're getting a couple of rows per 'name' per minute or so, so that approach should be sufficient even though its inelegant.

The alternative would be to keep a second table with only the latest entry for each 'name' in it ... and cull that table every now and then to remove records over a minute old.

Sharkey
+1  A: 

I think I figured out a decent solution. The problem was in my model:

date = db.DateTimeProperty(auto_now_add=True)

This meant that for every instance of that model, the datetimes would all be slightly different. This makes grouping my data very difficult. So in my cron function, I made sure that every api request had the exact same time stamp.

The next change was to create a Current table. Every time the cron runs, it deletes everything in the Current table (only one row), and adds a new row. This new row is then added to a Log table which semi-permanently stores the results.

+1  A: 

Surely this would work:

query = db.GqlQuery("SELECT * FROM [table] ORDER BY date DESC LIMIT BY [num of rows]")

Alternatively, you could use an inequality, like "date > 2009-04-11 00:50", which would return all the results after that time.

pyguy