I have a bunch of posts which have category tags in them. I am trying to find out how many times each category has been used.

I'm using Rails with MongoDB, but I don't think I need to get the category counts from the database, so the Mongo part shouldn't matter.

This is what I have so far:

@recent_posts = current_user.recent_posts # returns the 10 most recent posts
@categories_hash = {'tech' => 0, 'world' => 0, 'entertainment' => 0, 'sports' => 0}
@recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash.increment(category) # obviously this is where I'm having problems
  end
end

The structure of a post is:

{"_id" : ObjectId("idnumber"), "created_at" : "Tue Aug 03...", "categories" :["world", "sports"], "message" : "the text of the post", "poster_id" : ObjectId("idOfUserPoster"), "voters" : []}

I'm open to suggestions on other ways to get the category counts. Eventually I will also want the count of voters, so it seems to me the best way is to increment the values in categories_hash and then add voters.length. But one thing at a time: for now I'm just trying to figure out how to increment values in the hash.

+1  A: 

If you're using MongoDB, an elegant way to aggregate tag usage is a map/reduce operation. MongoDB supports map/reduce operations written in JavaScript. Map/reduce runs on the database server(s), so your application does not have to retrieve and analyze every document itself (which wouldn't scale well for large collections).

As an example, here are the map and reduce functions I use on the articles collection of my blog to aggregate tag usage (the result is used to build the tag cloud in the sidebar). Documents in the articles collection have a key named 'tags' which holds an array of strings (the tags).

The map function simply emits 1 on every used tag to count it:

function () {
  if (this.tags) {
    this.tags.forEach(function (tag) {
      emit(tag, 1);
    });
  }
}

The reduce function sums up the counts:

function (key, values) {
  var total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
}

As a result, the database returns a hash that has a key for every tag and its usage count as a value. E.g.:

{ 'rails' => 5, 'ruby' => 12, 'linux' => 3 }
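To make the map and reduce steps above more concrete, here is a rough in-memory Ruby sketch of the same logic. It does not talk to MongoDB at all; the `articles` array and its contents are made-up sample data, and the sketch only imitates what the two JavaScript functions do on the server (it assumes Ruby 2.4 or later for `transform_values` and `sum`).

```ruby
# Hypothetical sample documents standing in for the articles collection.
articles = [
  { 'tags' => %w[ruby rails] },
  { 'tags' => %w[ruby linux] },
  { 'title' => 'untagged' } # no 'tags' key, skipped like the `if (this.tags)` guard
]

# Map step: emit a [tag, 1] pair for every tag on every document.
emitted = articles.flat_map do |doc|
  (doc['tags'] || []).map { |tag| [tag, 1] }
end

# Group the emitted values by key, then reduce each group by summing.
tag_counts = emitted
  .group_by { |tag, _| tag }
  .transform_values { |pairs| pairs.sum { |_, count| count } }

p tag_counts
```

The difference in practice is where the work happens: this sketch has to load every document into the application first, which is exactly what running map/reduce on the database server avoids.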
Andreas
Well, I've been tempted to learn and use map/reduce anyway. I thought there might have been an easier way to do this with Ruby, but I'll give your way a shot and report back.
pedalpete
It sure is easier to do in Ruby, but also less efficient, though it still might be sufficient for small sites. I posted another answer with a variation of your original code.
Andreas
+1  A: 

If you aren't familiar with map/reduce and you don't care about scaling up, this is not as elegant as map/reduce, but should be sufficient for small sites:

@categories_hash = Hash.new(0)
current_user.recent_posts.each do |post|
  post.categories.each do |category|
    @categories_hash[category] += 1
  end
end
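The question also mentions wanting voter counts later. One possible reading of that is a second hash that adds up `voters.length` per category alongside the usage count. The sketch below is only an illustration of that reading: `Post` is a hypothetical `Struct` standing in for the real Mongo-backed model, and the sample posts are invented.

```ruby
# Hypothetical stand-in for the Mongo-backed post model.
Post = Struct.new(:categories, :voters)

recent_posts = [
  Post.new(%w[world sports], %w[alice bob]),
  Post.new(%w[tech], []),
  Post.new(%w[sports], %w[carol])
]

category_counts = Hash.new(0) # times each category was used
voter_counts    = Hash.new(0) # voters on posts in each category

recent_posts.each do |post|
  post.categories.each do |category|
    category_counts[category] += 1
    voter_counts[category]    += post.voters.length
  end
end

p category_counts
p voter_counts
```

Keeping the two tallies in separate hashes avoids mixing "number of posts" and "number of voters" into a single number that is hard to interpret later.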
Andreas
You can also remove the `@categories_hash[category] ||= 0` if you change the first line to `@categories_hash = Hash.new(0)` or initialize it like it was in the original question.
Ben Alpert
You're right, thanks. I always forget about hash default values. Editing...
Andreas
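For readers wondering what the comments above are comparing, here is a small sketch of the two initialization styles side by side. The `words` array is made-up sample input; both loops produce the same counts.

```ruby
words = %w[tech world tech]

# Plain hash: a missing key returns nil, so it must be initialised first.
explicit = {}
words.each do |word|
  explicit[word] ||= 0
  explicit[word] += 1
end

# Hash with a default of 0: a missing key reads as 0, so += works directly.
defaulted = Hash.new(0)
words.each { |word| defaulted[word] += 1 }

p explicit == defaulted
p defaulted['sports']
p defaulted.key?('sports')
```

Note that merely reading a missing key from a `Hash.new(0)` hash returns the default without storing the key, so categories that never occur do not show up in the result unless they are pre-seeded as in the original question.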