views:

828

answers:

1

Hello guys,

First time Map/Reduce user here, and using MongoDB. I have a lot of page visit data which I'd like to make some sense of by using Map/Reduce. Below is basically what I want to do, but as a total beginner a Map/Reduce, I think this is above my knowledge!

  1. Go through all the pages with visits in the last 30 days, and where external = true.
  2. Then for each page, find all visits
  3. Group all visits by referral location
  4. For each referral location, calculate how many then went to visit a page which has a certain "type" and also has a certain word in the "tags".

The database and collection are organised as

$mongo->dbname->visits

A sample document is:

{"url": "www.example.com", "type": "a", "refer": {"external": true, "domain": "twitter.com", "url": "http://www.twitter.com/page"}, "page": "1235", "user": "1232", "time": 1234567890}

And then I want to find documents of type B with a certain tag.

{"url": "www.example.com", "type": "b", "page": "745", "user": "1232", "time": 1234567890, "tags": {"a", "b", "c"}}

I'm using the normal Mongo PHP extension if that has an impact.

+3  A: 

Ok, I've come up with something that I think may do what you want. Note, that this may not work exactly since I'm not 100% sure of your schema (considering your examples show refer available in type a, but not b (I'm not sure if that's an omission, or what considering you want to view by referer)... Anyway, here's what I've come up with:

The map function:

function() {
    var obj = {
        "types": {},
        "tags": {},
    }
    obj.types[this.type] = 1;
    if (this.tags) {
        for (var tag in this.tags) {
            obj.tags[this.tags[tag]] = 1;
        }
    }
    emit(this.refer.url, obj);
}

The Reduce function:

function(key, values) {
    var obj = {
        "types": {},
        "tags": {},
    }
    for (var i = 0; i < values.length; i++) {
        for (var type in values[i].types) {
            if (!type in obj.types) {
                obj.types[type] = 0;
            }
            obj.types[type] += values[i].types[type];
        }
        for (var tag in values[i].tags) {
            if (!tag in obj.tags) {
                obj.tags[tag] = 0;
            }
            obj.tags[tag] += values[i].tags[tag];
        }
    }
    return obj;
}

So basically, how it works is this. The Map function uses a key of refer.url (what I guessed based on your description). So the end result will look like an array with _id equal to refer.url (It groups based on url). It then creates an object that has two objects under it (types and tags). The reason for the object is so that map and reduce can emit the same format object. Other than that, I THINK that it should be relatively self explanatory (If you don't understand, I can try to explain more)...

So let's implement this in PHP (Assuming that $map and $reduce are strings with the above contained with them for terseness):

$mapFunc = new MongoCode($map);
$reduceFunc = new MongoCode($reduce);
$query = array(
    'time' => array('$gte' => time() - (60*60*60*24*30)),
    'refer.external' => true
);
$collection = 'visits';
$command = array(
    'mapreduce' => $collection,
    'map' => $mapFunc,
    'reduce' => $reduceFunc,
    'query' => $query,
);

$statsInfo = $db->command($command);

$statsCollection = $db->selectCollection($sales['result']);

$stats = $statsCollection->find();

foreach ($stats as $stat) {
    echo $stats['_id'] .' Visited ';
    foreach ($stats['value']['types'] as $type => $times) {
        echo "Type $type $times Times, ";
    }
    foreach ($stats['value']['tags'] as $tag => $times) {
        echo "Tag $tag $times Times, ";
    }
    echo "\n";
}

Note, I haven't tested this. This is just what I've come up with based on my understanding of your schema, and from my understanding of Mongo and its Map-Reduce implementation...

ircmaxell