All of the MongoDB MapReduce examples I have seen have dealt with counting/adding numbers. I need to combine strings, and it looks like MapReduce is the best tool for the job. I have a large MongoDB collection in this format:

{name: userone, type: typeone}
{name: usertwo, type: typetwo}
{name: userthree, type: typeone}

Each name has only one type, but names are not necessarily unique. I want to end up with a collection that lists all names for a particular type, either in a comma-separated list or an array, like this:

 {type: typeone, names: userone, userthree}
 {type: typetwo, names: usertwo}

I was trying to use MapReduce to accomplish this. My function works correctly when there is only one user for a type. However, when there is more than one user, 'undefined' is stored in the names field.

I'm not very good at JavaScript, and I'm still learning MongoDB, so it's probably a simple data type or scope error.

Here are my map and reduce functions. What's wrong with them?

map = function() {
emit(this.user,{type:this.type});
}

reduce = function(key, values) {
var all="";
for(var i in values) {
all+=values[i]['type']+",";
}
return all;
}
A:

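It looks to me like you're trying to do a group-by on type. If so, you should emit type as the key. Note also that your map emits this.user, but the field in your documents is called name. From there, it's pretty much the same as your code, but I took the liberty of cleaning it up a bit.

Beware: the reduce function can be called more than once for the same key, with the output of one call passed back in as an input value to the next. The value it returns must therefore have the same shape as the values emitted by map. Your original reduce returns a plain string, so on a second pass values[i]['type'] is undefined; that is a likely source of the 'undefined' entries and of extra trailing commas. This can happen on any large collection, not only in a sharded environment. See Reduce Function for more information.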
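To make the re-reduce problem concrete, here is a minimal sketch in plain JavaScript that runs outside MongoDB. It is only an illustration: the function names originalReduce and fixedReduce, the key "k", and the extra user userfour are made up for the example, and the two-step call merely imitates what the server may do when it reduces a key in batches.

```javascript
// The reduce from the question: it returns a string but reads values[i]["type"],
// so it only works while every value is still a raw {type: ...} object from map.
function originalReduce(key, values) {
  var all = "";
  for (var i in values) {
    all += values[i]["type"] + ",";
  }
  return all;
}

// A reduce whose output has the same shape as its input values: {names: "..."}.
function fixedReduce(key, values) {
  var all = [];
  values.forEach(function (x) { all.push(x.names); });
  return { names: all.join(", ") };
}

// Simulated re-reduce: a partial result is fed back in next to a fresh value.
var partialOld = originalReduce("k", [{ type: "typeone" }, { type: "typeone" }]);
var finalOld = originalReduce("k", [partialOld, { type: "typeone" }]);

var partialNew = fixedReduce("typeone", [{ names: "userone" }, { names: "userthree" }]);
var finalNew = fixedReduce("typeone", [partialNew, { names: "userfour" }]);

console.log(finalOld);       // the partial string has no "type" property
console.log(finalNew.names); // names stay intact across both passes
```

In the first case the partial result is a string, so looking up its "type" property yields undefined. In the second case the partial result is just another {names: ...} value, so a further pass is harmless.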

Map:

m = function(){ emit(this.type, {names:this.name}); }

Reduce:

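r = function(key, values){
  var all = [];
  // Each value is {names: "..."}: either a single name from map or an already-joined partial result.
  values.forEach(function(x){
    all.push(x.names);
  });
  // Return the same shape that map emits, so the result can safely be reduced again.
  return {"names": all.join(", ")};
}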

Usage:

res = db.users.mapReduce(m,r); db[res.result].find()

Alternate:

Per OP request, here is a version that returns an array for names instead of a comma-separated string:

m = function () {
    // Wrap the name in an array so that map output and reduce output have the same shape.
    emit(this.type, {names: [this.name]});
}

r = function (key, values) {
    var all = [];
    // Concatenate instead of push, so re-reducing partial results never nests arrays.
    values.forEach(function (x) { all = all.concat(x.names); });
    return {names: all};
}

res = db.users.mapReduce(m,r); db[res.result].find()
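The nested arrays described in the comment below appear when a partial result, which is already an array, is pushed into a new array during a re-reduce. A minimal sketch in plain JavaScript, run outside MongoDB, shows the difference. The names pushReduce and concatReduce and the extra user userfour are hypothetical, and the sketch assumes map emits {names: [this.name]} so that every value already holds an array:

```javascript
// Pushes x.names as a single element: nests as soon as x.names is already an array.
function pushReduce(key, values) {
  var all = [];
  values.forEach(function (x) { all.push(x.names); });
  return { names: all };
}

// Concatenates x.names: the result stays flat however often it is re-reduced.
function concatReduce(key, values) {
  var all = [];
  values.forEach(function (x) { all = all.concat(x.names); });
  return { names: all };
}

var first = [{ names: ["userone"] }, { names: ["userthree"] }];
var extra = { names: ["userfour"] };

// Reduce a first batch, then reduce that partial result together with one more value.
var nested = pushReduce("typeone", [pushReduce("typeone", first), extra]);
var flat = concatReduce("typeone", [concatReduce("typeone", first), extra]);

console.log(JSON.stringify(nested.names));
console.log(JSON.stringify(flat.names));
```

With concat, the output of one pass is a valid input to the next, so no finalize step is needed to unwrap anything.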

Cheers!

Van Nguyen
That works great. Now, how would I modify that code to have names in an array instead of a comma-separated list (then I can build the comma-separated list on the client if needed)? Simply putting return {"names": all}; in the reduce function works, but it gets me a bunch of ugly nested arrays like [0] => Array ( [0] => Array ( [0] => Array ( [0] => Array ( [0] => Array ( [0] => Array (
Jonathan Knight
Ahh... yeah. I wasn't sure what your intent was, so I guessed based on your code. I'll modify my answer to include an array example.
Van Nguyen