Superfeedr is an on-demand feed-parsing service. We want to provide analytics to our users, and we're investigating what the best strategy would be.

In a nutshell, we want to track the number of operations in our system (events, such as a new entry in a given feed) as well as aggregated data (such as the number of subscribers per feed).

Of course, the aggregated data can be computed from the events: the number of subscribers to a feed is the sum of subscriptions minus the sum of unsubscriptions. Yet since we want to study it over time (the number of subscribers on a daily basis), a purely event-based approach may be suboptimal, because we would recompute the same thing over and over.
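
To make the trade-off concrete, here is a minimal Python sketch of deriving daily subscriber counts from events. The event format, feed name, and function names are assumptions made up for illustration, not part of our system. The idea is to reduce events to per-day deltas once, then build the running totals from those deltas rather than replaying every event for every day.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Hypothetical event format: (day, feed_id, kind),
# where kind is "subscribe" or "unsubscribe".
events = [
    (date(2010, 1, 1), "feed-a", "subscribe"),
    (date(2010, 1, 1), "feed-a", "subscribe"),
    (date(2010, 1, 2), "feed-a", "unsubscribe"),
    (date(2010, 1, 3), "feed-a", "subscribe"),
]

def daily_deltas(events, feed_id):
    """Net subscriber change per day for one feed (+1 subscribe, -1 unsubscribe)."""
    deltas = defaultdict(int)
    for day, feed, kind in events:
        if feed == feed_id:
            deltas[day] += 1 if kind == "subscribe" else -1
    # Sort by day so the running sum below is chronological.
    return dict(sorted(deltas.items()))

def daily_totals(deltas):
    """Running subscriber count per day, built from the per-day deltas."""
    days = list(deltas)
    return dict(zip(days, accumulate(deltas[d] for d in days)))

totals = daily_totals(daily_deltas(events, "feed-a"))
```

If the per-day deltas are stored, a new day only needs its own delta added to the previous total. Note that this sketch only emits days that have at least one event; days without events would need to be filled in separately.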

How would you build such a component in your app? What information flow, what data stores, what graphing solution, and so on?

I know this is quite an open question, but I am sure we're not the first ones with such a need!

[UPDATE] Infrastructure: we have a set of workers that are XMPP clients and interact with one another. They are built on EventMachine, which means they do not block on I/O. Desired target: we must be able to collect massive amounts of data. We are currently at about 200-300 msg/sec and aim for 10x-100x that.

+2  A: 

It's tough to say without more information about your infrastructure and desired scaling targets. You may find the slide deck "How Twitter Uses Hadoop" instructive. Kevin Weil presented it at the recent NoSQL East conference.

Borrowing ideas from what Twitter is doing, you could consider an architecture split into collection, analysis, and render phases.

Collection Phase: Scribe. Super low latency, very scalable, bindings for many languages. Developed at Facebook.

Processing Node Log Event -> Scribe -> HDFS

Analysis Phase: Pig. It offers a SQL-like query language that also lets you run exploratory ad-hoc queries.

HDFS -> Pig -> MySQL

Render Phase: Implemented in your current web framework

MySQL -> JSON -> Memcached -> Flash Charting
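
The MySQL -> JSON -> Memcached part of that flow can be sketched as a cache-aside lookup. This is only an illustration in Python: the dict stands in for Memcached, `query_daily_totals` is a hypothetical stand-in for a MySQL query over pre-aggregated rows, and the key format and sample rows are made up.

```python
import json

# Stand-in for Memcached; a real deployment would use a Memcached client here.
cache = {}

def query_daily_totals(feed_id):
    """Hypothetical stand-in for a MySQL query over pre-aggregated rows."""
    rows = {"feed-a": [("2010-01-01", 2), ("2010-01-02", 1)]}
    return rows.get(feed_id, [])

def chart_json(feed_id):
    """Return the chart payload as JSON, hitting the database only on a cache miss."""
    key = "chart:" + feed_id
    payload = cache.get(key)
    if payload is None:
        rows = query_daily_totals(feed_id)
        payload = json.dumps([{"day": d, "subscribers": n} for d, n in rows])
        cache[key] = payload
    return payload
```

The charting component then only ever fetches the cached JSON. A real setup would also need an expiry or invalidation policy so that the cached payload picks up newly aggregated rows.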

There have been some posts here on SO about the choice of Flash charting components for the web. I have personally had good success with AmCharts.

Ryan Cox
Interesting thoughts. Thanks. I've updated the question with the information you asked about :)
Julien Genestoux