Hi all,

I'm using a Mongo MapReduce to perform a word-count operation on a bunch of documents. The documents are very simple (just an ID and a hash of words):

{ "_id" : 6714078, "words" : { "my" : 1, "cat" : 1, "john" : 1, "likes" : 1, "cakes" : 1 } }
{ "_id" : 6715298, "words" : { "jeremy" : 1, "kicked" : 1, "the" : 1, "ball" : 1 } }
{ "_id" : 6717695, "words" : { "dogs" : 1, "can't" : 1, "look" : 1, "up" : 1 } }

The database is called "words" in my environment, and the collections in question are named "wordsX", where X is a category number (I know, don't ask). The field in each document that holds the hash of words is also named "words". Gah.

The problem I'm having is that under certain conditions in my PHP app, the MapReduce doesn't return any data. Annoyingly, running the same commands from the Mongo shell gives perfect results. I'm trying to pin down where this bug is occurring but I'm really stumped, so I'm hoping someone might be able to shed some light on this. The lead-up to this question does go on a bit because the environment is a bit complicated, but please bear with me.

The commands I've tried running from the Mongo shell to replicate the PHP-based operations are as follows:

m = function () {
    if (this.words) {
        // declare the loop variable with var so it doesn't leak as a global
        for (var index in this.words) {
            emit(index, this.words[index]);
        }
    }
}
r = function (key, values) {
    var total = 0;
    for (var i in values) {
        total += values[i];
    }
    return total;
}
res = db.words.mapReduce(m, r, { query : { _id : { $in : [6714078,6715298,6717695] } } });

This results in a temporary collection being created containing the word count data. All OK so far.
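
To make the expected output concrete, here is a minimal sketch in plain JavaScript that mimics the job outside of Mongo, using the three sample documents above. The `emit` function and the `emitted` grouping object are hypothetical stand-ins for what the server does internally, not part of any Mongo API; the map and reduce bodies are the same as in the shell commands.

```javascript
// The three sample documents from the question.
var docs = [
    { _id: 6714078, words: { my: 1, cat: 1, john: 1, likes: 1, cakes: 1 } },
    { _id: 6715298, words: { jeremy: 1, kicked: 1, the: 1, ball: 1 } },
    { _id: 6717695, words: { dogs: 1, "can't": 1, look: 1, up: 1 } }
];

// Hypothetical stand-in for the server side: group emitted values by key.
var emitted = Object.create(null);
function emit(key, value) {
    if (!(key in emitted)) {
        emitted[key] = [];
    }
    emitted[key].push(value);
}

// Same map and reduce logic as the shell version.
var m = function () {
    if (this.words) {
        for (var index in this.words) {
            emit(index, this.words[index]);
        }
    }
};
var r = function (key, values) {
    var total = 0;
    for (var i in values) {
        total += values[i];
    }
    return total;
};

// Run map with each document bound to `this`, then reduce every key.
// (This sketch always calls reduce, even for keys with a single value.)
docs.forEach(function (doc) { m.call(doc); });
var counts = {};
Object.keys(emitted).forEach(function (key) {
    counts[key] = r(key, emitted[key]);
});

console.log(counts);
```

Every word in the three sample documents is distinct, so each of the 13 keys ends up with a count of 1; a word shared between documents would have its values summed by the reduce step.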

However, if I run the same commands from PHP (using the standard Mongo library), I end up with no data under certain conditions. It's tricky to describe without boring you with the details of the application and environment beyond Mongo, but basically I'm using Sphinx to filter some records, then supplying the resulting list of content IDs to Mongo, which performs the MapReduce on them. If I filter back into the data set by only 2 or 3 days, I get results back from Mongo; if I don't filter, I get an empty dataset back.

I've not included the Sphinx-based parts as I don't think they're relevant (just know that we get a list of IDs back): I've tried supplying exactly the same list to Mongo on the command line and got the right results, whereas I don't from within PHP. Hope that makes sense.

The PHP code I'm using looks like this:

$objMongo = new Mongo();
$objDB = $objMongo->words;

$arrWordList = array();

$strMap = '
    function() {
        if (this.words) {
            // declare the loop variable with var so it does not leak as a global
            for (var index in this.words) {
                emit(index, this.words[index]);
            }
        }
    }
';

$strReduce = '
    function(key, values) {
        var total = 0;
        for (var i in values) {
            total += values[i];
        }
        return total;
    }
';

$objMapFunc = new MongoCode($strMap);
$objReduceFunc = new MongoCode($strReduce);
$arrQuery = array(
    '_id' => array('$in' => $arrIDs) // <--- list of IDs from Sphinx
);
$arrCommand = array(
    'mapreduce' => 'wordsX',
    'map' => $objMapFunc,
    'reduce' => $objReduceFunc,
    'query' => $arrQuery
);

MongoCursor::$timeout = -1; // -1 means wait indefinitely for the database to respond

$arrStatsInfo = $objDB->command($arrCommand);

var_dump($arrStatsInfo);

The contents of the result-info array ($arrStatsInfo) under working and non-working conditions (the filtering as specified above) are as follows.

Working results:

array(4) {
  ["result"]=>
  string(31) "tmp.mr.mapreduce_1279637336_227"
  ["timeMillis"]=>
  int(171)
  ["counts"]=>
  array(3) {
    ["input"]=>
    int(54)
    ["emit"]=>
    int(2517)
    ["output"]=>
    int(1526)
  }
  ["ok"]=>
  float(1)
}

Empty results:

array(4) {
  ["result"]=>
  string(31) "tmp.mr.mapreduce_1279637381_228"
  ["timeMillis"]=>
  int(21)
  ["counts"]=>
  array(3) {
    ["input"]=>
    int(0)
    ["emit"]=>
    int(0)
    ["output"]=>
    int(0)
  }
  ["ok"]=>
  float(1)
}

So it looks like under the broken condition, no records even make it into the MapReduce. I've spent ages trying to work out what on earth is going on here but I've had no insights thus far. As I've said, running the same commands (as above) directly in the Mongo command line using exactly the same set of IDs returns the right results.
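
One possibility that may be worth ruling out (an assumption, since nothing above confirms it): MongoDB compares `$in` values by type as well as by value, so a list of IDs that reaches the query as strings will not match `_id` values stored as numbers, and the job would see an input count of 0. The sketch below simulates that comparison in plain JavaScript; `matchesIn`, `countInput`, and the string ID list are hypothetical illustrations, not Mongo or driver functions.

```javascript
// Simplified stand-in for an $in filter on _id: a string never equals a
// number, which is modelled here with strict equality (===).
function matchesIn(docId, ids) {
    return ids.some(function (id) { return id === docId; });
}

// Count how many stored _id values the filter would let through.
function countInput(storedIds, ids) {
    return storedIds.filter(function (docId) {
        return matchesIn(docId, ids);
    }).length;
}

// _id values as stored in the sample documents (numbers).
var storedIds = [6714078, 6715298, 6717695];

// Hypothetical: the same IDs arriving as strings from another system.
var idsAsStrings = ["6714078", "6715298", "6717695"];

// Casting each ID to a number before building the query.
var idsAsNumbers = idsAsStrings.map(Number);

console.log(countInput(storedIds, idsAsStrings)); // 0
console.log(countInput(storedIds, idsAsNumbers)); // 3
```

If this turns out to be the cause, dumping the types in the ID list before building the query should show it, and casting each ID to an integer on the PHP side would be the corresponding fix.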

After all that, I guess my question is: is there anything obviously wrong with the PHP-Mongo interaction I'm doing above? Are there other steps I can take to try to debug this?

Please let me know if supplying any further information would be helpful. I appreciate this is a somewhat expansive and ill-defined question but I've tried my best to communicate the issue! Really hope someone can suggest a way out of this.

Many thanks for reading!

A: 

This is not a direct answer to your question, but I would suggest you try asking on the mongodb-user list -- it might be a bug in the bindings of the PHP library, and the MongoDB staff generally responds quickly to questions & bug reports. That list is more likely to give you assistance than posting on this site, IMHO.

Jason S
Thanks Jason, I'll give that a go.
BigglesZX