The output from MongoDB's map/reduce includes something like 'counts': {'input': I, 'emit': E, 'output': O}. I thought I clearly understood what those mean, until I hit a weird case which I can't explain.

According to my understanding, counts.input is the number of rows that match the condition (as specified in query). If so, how is it possible that the following two queries have different results?

db.mycollection.find({MY_CONDITION}).count()

db.mycollection.mapReduce(SOME_MAP, SOME_REDUCE, {'query': {MY_CONDITION}}).counts.input

I thought the two should always give the same result, independent of the map and reduce functions, as long as the same condition is used.

A: 

The map/reduce pattern is like a GROUP BY in SQL: several results are grouped into one row, so you can't expect the same number of results.

The count returned by the mapReduce() method is the number of results after the map and reduce functions have run.

For example, say you have 2 rows:

{'id':3,'num':5}
{'id':4,'num':5}

And you apply this map function:

function(){
  emit(this.num, 1);
}

After this map function you get 2 rows:

{5, 1}
{5, 1}

And now you apply your reduce function:

function(k, vals) {
  // Sum every value emitted for key k
  var sum = 0;
  for (var i = 0; i < vals.length; i++) sum += vals[i];
  return sum;
}

You now have only 1 row returned:

2
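
To make the three members of 'counts' concrete, here is a minimal in-memory sketch of the example above in plain JavaScript. It is not MongoDB's implementation: simulateMapReduce is a hypothetical helper, the map function receives emit as an argument instead of using the shell's global emit, and it assumes 'input' means the documents that match the query and are handed to map, which is the reading proposed in the question.

```javascript
// Hypothetical helper for illustration only; not part of MongoDB's API.
function simulateMapReduce(docs, matches, map, reduce) {
  var counts = { input: 0, emit: 0, output: 0 };
  var groups = new Map(); // key -> array of emitted values

  // Stand-in for the shell's global emit(); passed to map explicitly here.
  function emit(key, value) {
    counts.emit += 1;
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }

  docs.forEach(function (doc) {
    if (!matches(doc)) return;  // documents filtered out by the query
    counts.input += 1;          // assumption: input = documents passed to map
    map.call(doc, emit);        // `this` is the current document, as in MongoDB
  });

  var results = [];
  groups.forEach(function (vals, key) {
    // Like MongoDB, skip reduce when a key has only one value.
    var value = vals.length === 1 ? vals[0] : reduce(key, vals);
    results.push({ _id: key, value: value });
  });
  counts.output = results.length;

  return { results: results, counts: counts };
}

// The two rows and the map/reduce functions from the example above.
var rows = [{ id: 3, num: 5 }, { id: 4, num: 5 }];

var outcome = simulateMapReduce(
  rows,
  function (doc) { return true; },             // match everything
  function (emit) { emit(this.num, 1); },      // map
  function (k, vals) {                         // reduce
    var sum = 0;
    for (var i = 0; i < vals.length; i++) sum += vals[i];
    return sum;
  }
);

console.log(JSON.stringify(outcome.counts));
// prints {"input":2,"emit":2,"output":1}
```

In this sketch only 'output' depends on the grouping; 'input' is fixed by the query alone, which is why the question expects it to match find(...).count().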
shingara
I know what map/reduce is. As I said, 'counts' is not a number, but a dictionary, which, among others, contains a member called 'input'. According to the MongoDB docs, this is the "number of objects scanned". Now my question is - is this equal to the number of objects that match the condition, or is there something else I need to take into account? Please reread my question and let me know how to improve it if it's not clear enough. :)
ionut bizau
A: 

Is your server steady-state in between the two calls?

mdirolf
Yes. :) And the two numbers I get are *very* different, looks like it wouldn't be the same query...
ionut bizau
hmm sounds like it might be an issue then - can you send an email w/ your test case to mongodb-user on google groups?
mdirolf
I was under a tight deadline - and I was just doing some reports, so code quality was not a concern, so I just simulated the map/reduce in Python (turned out to be fast and accurate). Later I discovered the database was actually corrupt (perhaps due to killing the server multiple times), so that might explain the weird behavior. Don't think I'd be able to reproduce it anymore. Thanks anyway. :)
ionut bizau