views: 76
answers: 3
I'm looking into ways to track events in a Django application (events would generally be clicks tied to a specific unique user id).

These events would essentially contain an event type like "click"; each click event would be assigned to a unique id (many events can map to one id), and each event would carry a data set including items like the referrer, etc.

I have tried Mixpanel, but for now the data API they offer seems too limited, as I can't find a way to get all of my data out by a unique id (apart from the event itself).

I'm looking into using django-eventracker, but I'm curious about others' thoughts on the best way to do this. Mongo or CouchDB seem like a great choice here, and the Celery/RabbitMQ route also looks really attractive with Mongo. Pumping these events into the existing application's DB seems limiting at this point.

Anyway, this is just a thread to see what others' thoughts are on this and how they have implemented something like this...

shoot

+3  A: 

I am not familiar with the pre-packaged solutions you mention. Were I to design this from scratch, I'd have a simple JS snippet collecting info on clicks and posting it back to the server via Ajax (using whatever JS framework you're already using), and on the server side I'd simply append that info to a log file for later "offline" processing -- so that would be essentially independent of Django or any other server-side framework.

Appending to a log file is a very light-weight action, while DBs for web use are generally optimized for read-intensive (not write-intensive) operation, so I agree with you that force-fitting that info (as it trickles in) into the existing app's DB is unlikely to offer good performance.
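A minimal, framework-agnostic sketch of that idea -- the Ajax view would just call `append_event`, and an offline job would call `read_events`. The log path and field names are illustrative assumptions:

```python
import json
import time

LOG_PATH = "click_events.log"  # assumed location

def append_event(user_id, event_type="click", referrer="", path=LOG_PATH):
    """Append one JSON-encoded event per line -- a cheap, append-only write."""
    event = {
        "timestamp": time.time(),
        "user_id": user_id,
        "event_type": event_type,
        "referrer": referrer,
    }
    with open(path, "a") as log:
        log.write(json.dumps(event) + "\n")

def read_events(path=LOG_PATH):
    """Offline pass: parse the log back into dicts for whatever analysis you need."""
    with open(path) as log:
        return [json.loads(line) for line in log]
```

One JSON object per line keeps the format flexible (fields can be added later without breaking old entries) and makes the log trivially re-parseable, more than once if need be.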

Alex Martelli
I will need the ability to do more analysis on the data than a log file will offer, but the log file is not a bad idea. The events are already processed through the server via Ajax calls, but I like the idea of a task queue at this point as well...
jmat
@jmat - there aren't really limitations on what you can and cannot put into log files... as @Alex mentioned, you can always parse that data "offline" into whatever type of structures you need for your real analysis.
Matthew J Morrison
@jmat, as @Matthew says, logging offers exactly the same possibilities for "analysis on the data" as you'd get by pumping the data directly into any program -- the log just _stays_ around a while, so it can be processed (more than once, if need be) whenever it's most convenient (e.g., one lightweight, fast pass done at once by a watching daemon for the simple stuff you need to know immediately, more thorough warehousing offline later -- whatever!).
Alex Martelli
+1  A: 

If by click, you mean a click on a link that loads a new page (or performs an AJAX request), then what you aim to do is fairly straightforward. Web servers tend to keep plain-text logs about requests - with information about the user, time/date, referrer, the page requested, etc. You could examine these logs and mine the statistics you need.
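As a sketch of that log-mining step, here is one way to pull those fields out of a line in the common Apache/Nginx "combined" log format (the regex and sample fields are illustrative assumptions):

```python
import re

# One request per line in the "combined" format:
#   ip ident user [time] "method path protocol" status bytes "referrer" "agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of request fields, or None if the line doesn't match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None
```

Iterating `parse_line` over the access log and grouping by whatever identifies the user (IP, a cookie value logged in a custom format, etc.) gets you most basic click statistics without touching the application at all.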

On the other hand, if you have a web application where clicks don't necessarily generate server requests, then collecting click information with JavaScript is your best bet.

advait
These clicks can come from multiple sources, internal and external domains, so generally speaking JS is the only answer here... that part is already working, though; I'm more interested in ways to store large amounts of this data without impacting click-throughs and page loads.
jmat
+1  A: 

You probably want to keep a flexible format for your logs to anticipate future needs or changes. In this sense, schema-less document-oriented databases are nice. One advantage is that the structure of your data will stay close to your application's needs for whatever analyses you perform later (so avoiding some of the inevitable parsing/data-munging work).

If you're thinking about using MySQL, PostgreSQL or the like, then you should look into something like rsyslog for buffering writes and avoiding the performance penalty of heavy logging. (I can't say much about Celery and other queueing mechanisms for this type of thing, but they sound promising.)
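A minimal sketch of handing events to the system logger from Python, so rsyslog (or any syslog daemon) handles the buffering and disk writes; the address, facility, and message fields are assumptions:

```python
import logging
from logging.handlers import SysLogHandler

# Fire-and-forget UDP datagrams to a local syslog daemon; the daemon,
# not the web process, takes care of batching writes to disk.
handler = SysLogHandler(address=("localhost", 514),
                        facility=SysLogHandler.LOG_LOCAL0)
event_log = logging.getLogger("click_events")
event_log.setLevel(logging.INFO)
event_log.addHandler(handler)

# In the Ajax view, logging an event is then a single non-blocking call:
event_log.info("user_id=%s event=%s referrer=%s",
               "u123", "click", "http://example.com")
```

An rsyslog rule matching the `local0` facility can then route these lines to a dedicated file (or onward to a database) without the application ever blocking on the write.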

MongoDB has some nice features that make it amenable to logging, such as capped collections. A summary can be found in this post.
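For illustration, a capped collection can be set up from pymongo roughly like this (a sketch assuming a local mongod; the database name, collection name, and size are all assumptions):

```python
from pymongo import MongoClient

def capped_events(client, size_bytes=100 * 1024 * 1024):
    """Get or create a capped 'events' collection.

    Capped collections are fixed-size, preserve insertion order, and
    automatically age out the oldest documents, which makes them a good
    fit for high-volume event logging.
    """
    db = client.analytics  # database name is an assumption
    if "events" not in db.list_collection_names():
        db.create_collection("events", capped=True, size=size_bytes)
    return db.events

# Usage (assumes a mongod running locally):
# events = capped_events(MongoClient("localhost", 27017))
# events.insert_one({"user_id": "u123", "event": "click",
#                    "referrer": "http://example.com"})
```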

ars
The last link you supplied is one of the prime reasons I'm looking at using Mongo for this purpose... thx.
jmat