views:

400

answers:

1

Hi there,

I'm gathering some statistics from a web service, and storing it in a collection. The data looks similar to this (but with more fields):

{"downloads": 30, "dt": "2010-02-17T16:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T17:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T18:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T19:56:34.163000"}
{"downloads": 30, "dt": "2010-02-17T20:56:34.163000"}
{…}
{"downloads": 30, "dt": "2010-02-18T17:56:34.163000"}
{"downloads": 30, "dt": "2010-02-18T18:56:34.163000"}
{"downloads": 30, "dt": "2010-02-18T19:56:34.163000"}
{"downloads": 30, "dt": "2010-02-18T20:56:34.163000"}

If someone requests the daily numbers for the last thirty days, that would mean the max amount of (in this example) 'downloads' pr. day. Which is the last record of the day.

By using collection.find({"dt": {"$gt": datetime_obj_30_days_ago}}), I of course get all the rows, which is not very suitable. So I'm looking for a way to only return the last of the day for the given period.

I was told that group() might be the way to go, but I can't quite understand how to get it working in this instance.

Any tips, pointers would be very appreciated!

+1  A: 

You can do this using group. In your example you'd need to supply a javascript function to compute the key (as well the reduce function), because you want only the date component of the datetime field. This should work:

db.coll.group(
    key='function(doc) { return {"dt": doc.dt.toDateString()} }',
    condition={'dt': {'$gt': datetime_obj_30_days_ago}},
    initial={'downloads': 0},
    reduce='function(curr, prev) { prev.downloads = Math.max(curr.downloads, prev.downloads) }'
)

Keep in mind that still does a linear scan of the past month, just on the server instead of the client. It's possible that simply selecting the max value of each day individually is faster.

Coady
Thanks a bunch, Coady – you just widened my understanding of `group`. :-)
Henrik Lied