Hi,

So using the regular MongoDB library in Ruby I have the following query to find average filesize across a set of 5001 documents:

    avg = 0
    total = collection.count()
    Rails.logger.info "#{total} asset creation stats in the system"
    collection.find().each {|row| avg += (row["filesize"] * (1/total.to_f)) if row["filesize"]}

It's pretty simple, so I'm trying to do the same using map/reduce as a learning exercise. This is what I came up with:

    map = 'function(){emit("filesizes", {size: this.filesize, num: 1});}'
    reduce = 'function(k, vals){
        var result = {size: 0, num: 0};
        for(var x in vals) {
          var new_total = result.num + vals[x].num;
          result.num = new_total;
          result.size = result.size + (vals[x].size * (vals[x].num / new_total));
        }
        return result;
    }'
    @results = collection.map_reduce(map, reduce)

However the two queries come back with two different results!

What am I doing wrong?

A:

You're weighting the results by doing the division inside the reduce function, once for every value it processes.

Say you had [{size : 5, num : 1}, {size : 5, num : 1}, {size : 5, num : 1}]. Your reduce would calculate:

result.size = 0 + (5*(1/1)) = 5
result.size = 5 + (5*(1/2)) = 7.5
result.size = 7.5 + (5*(1/3)) = 9.17

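As you can see, this weights the result towards the earliest elements, because their contributions are never scaled back down as the count grows. Three documents of size 5 should average exactly 5.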
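
If you want to see this without touching the database, here is a minimal sketch in plain JavaScript that runs the reduce from the question over those three values. The names `reduceFromQuestion` and `sample` are made up for this illustration; the body of the function is the one from the question.

```javascript
// The reduce function from the question, unchanged apart from its name.
function reduceFromQuestion(k, vals) {
  var result = {size: 0, num: 0};
  for (var x in vals) {
    var new_total = result.num + vals[x].num;
    result.num = new_total;
    // Only the incoming value is divided by the running count;
    // the size accumulated so far is never rescaled.
    result.size = result.size + (vals[x].size * (vals[x].num / new_total));
  }
  return result;
}

// Hypothetical sample: three documents, each with a filesize of 5.
var sample = [{size: 5, num: 1}, {size: 5, num: 1}, {size: 5, num: 1}];
var buggy = reduceFromQuestion("filesizes", sample);

console.log(buggy.num);  // 3
console.log(buggy.size); // roughly 9.17, although the true average is 5
```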

Fortunately, there's a simple solution. Have reduce only add up the sizes and the counts, and add a finalize function to do the division. Finalize runs once, after the reduce step is finished.
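
Something along these lines should work. This is only a sketch: the `avg` field and the little two-stage check at the bottom are my own additions for illustration, and the check runs in plain JavaScript rather than inside MongoDB.

```javascript
// Reduce: only add up sizes and counts. Sums can be re-reduced
// any number of times without changing the result.
function reduce(key, vals) {
  var result = {size: 0, num: 0};
  for (var i = 0; i < vals.length; i++) {
    result.size += vals[i].size;
    result.num += vals[i].num;
  }
  return result;
}

// Finalize: called once per key after reducing is done, so the
// division happens exactly once. The avg field name is an assumption.
function finalize(key, value) {
  value.avg = value.num > 0 ? value.size / value.num : 0;
  return value;
}

// Local check with three documents of size 5.
var emitted = [{size: 5, num: 1}, {size: 5, num: 1}, {size: 5, num: 1}];

// Reduce in two stages to mimic partial results being reduced again.
var partial = reduce("filesizes", emitted.slice(0, 2));
var combined = reduce("filesizes", [partial, emitted[2]]);
var finalResult = finalize("filesizes", combined);

console.log(finalResult.avg); // 5
```

From Ruby you would pass the finalize source as a string next to map and reduce. As far as I know the driver's `map_reduce` takes it as a `:finalize` option, but check the docs for the driver version you are on.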

kristina