I have a MongoDB collection with a created_at field in each document. These are stored as MongoDB date objects, e.g.

{ "_id" : "4cacda7eed607e095201df00", "created_at" : "Wed Oct 06 2010 21:22:23 GMT+0100 (BST)", text: "something" }
{ "_id" : "4cacdf31ed607e0952031b70", "created_at" : "Wed Oct 06 2010 21:23:42 GMT+0100     (BST)", text: "something" }
....

I would like to count the number of items created in each minute, so I can pass the data into Google Charts to generate something like this:

(example chart image)

How do I do this with a map/reduce function, or is there a fancy MongoDB aggregate function which I could use instead?

+3  A: 

The map should emit a timestamp rounded down to the minute and a count of 1. The reduce should sum all the counts.

map = function() {
    // Truncate created_at to the minute and emit a count of 1 for it.
    var created_at_minute = new Date(this.created_at.getFullYear(),
                                     this.created_at.getMonth(),
                                     this.created_at.getDate(),
                                     this.created_at.getHours(),
                                     this.created_at.getMinutes());
    emit(created_at_minute, {count: 1});
}

reduce = function(key, values) {
    // Sum the per-minute counts emitted by map.
    var total = 0;
    for (var i = 0; i < values.length; i++) { total += values[i].count; }
    return {count: total};
}
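
For completeness, a rough sketch of how you might run these from the mongo shell. The collection name `items` and the output collection `counts_per_minute` are placeholders, not names from the question:

// Run the map/reduce above; results land in the counts_per_minute collection.
// The collection names here are assumptions - substitute your own.
var res = db.items.mapReduce(map, reduce, {out: "counts_per_minute"});

// Each result document looks like {_id: <minute>, value: {count: N}}.
// Sorting by _id gives a time-ordered series to feed into a chart.
db.counts_per_minute.find().sort({_id: 1}).forEach(function(doc) {
    print(doc._id + "\t" + doc.value.count);
});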
rubayeet
Should that last line read `return {count: total};`?
gnarf
@gnarf - thanks. Modified the answer.
rubayeet
http://pastebin.me/51e1e0f24cb174991ebd9072f1d9bbec -- Tested with some rough test data; seems to be doing what the poster intended... +1
gnarf
This works a treat, thanks!
Tom
+1  A: 

Hi Tom! You can also try the group() function.


db.stat.group({key: {created_at_minute: true},
               initial: {count: 0},
               reduce: function(doc, out) { out.count++; }})

where created_at_minute is your created_at field rounded down to the minute.
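
For the group() call to work, that rounded value has to exist as a field on each document. A rough sketch of pre-computing it in the shell, using the same example collection name (stat) as above; note that this rewrites every document:

// Materialise a per-minute key on each document so group() can use it.
db.stat.find().forEach(function(doc) {
    var d = doc.created_at;
    doc.created_at_minute = new Date(d.getFullYear(), d.getMonth(), d.getDate(),
                                     d.getHours(), d.getMinutes());
    db.stat.save(doc);
});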

walla
Where does created_at_minute come from? Does MongoDB figure it out when I run that query?
Tom
.group() is an easier way to achieve aggregation; however, it has limitations: the BSON object returned must be small, with fewer than 10K keys, otherwise you get an exception.
rubayeet
10K minutes is about a week ...
walla
Tom, I fibbed a little: the "key" in .group() must be an existing field in the collection, so the solution by rubayeet is more correct and general.
walla
Ah, if the limit is 10K then I'm out of luck; my dataset is already at 38K. Thanks, group looks interesting for other datasets.
Tom