Hello,

I have to implement a tracking system backed by a MySQL database. The system will track many apps, with at least 5 events tracked for each app (e.g. how many users clicked on link x, how many users visited page y). Some apps will have millions of users, so a few thousand updates per second is not a far-fetched assumption. Another component of the system will have to compute some statistical info that should be updated every minute. The system should also record past values of those statistics.

The approach a friend of mine suggested was to log every event in a log table and have a cron job that runs every minute, computes the desired info, and updates a stats table.
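
For concreteness, here is roughly what I picture that cron job doing. This is just a sketch under assumptions of mine: the table and column names (event_log, stats_minutely) are made up, and I'm using Python with MySQLdb only to frame the query.

```python
# Per-minute aggregation sketch, meant to be run from cron.
# All table and column names are placeholders, not a real schema.
import MySQLdb

db = MySQLdb.connect(host="localhost", user="tracker",
                     passwd="secret", db="tracking")
cur = db.cursor()

# Roll the last minute of raw events up into a stats table, keeping one
# row per (app, event, minute) so past values are preserved. A real job
# would remember the last processed timestamp instead of trusting NOW(),
# to avoid missing or double-counting rows around the minute boundary.
cur.execute("""
    INSERT INTO stats_minutely (app_id, event_type, minute, hits)
    SELECT app_id, event_type,
           DATE_FORMAT(created_at, '%Y-%m-%d %H:%i:00') AS minute,
           COUNT(*)
    FROM event_log
    WHERE created_at >= NOW() - INTERVAL 1 MINUTE
    GROUP BY app_id, event_type, minute
""")
db.commit()
```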

This sounds reasonable to me. Are there better alternatives?

Thanks.

+1  A: 

I've logged to a MySQL log table with a cron job that crunches it.

I generally use InnoDB tables in my apps, but for the log table I made it MyISAM and used INSERT DELAYED ... queries.

MyISAM doesn't provide all the goodies of InnoDB, but I believe it is slightly faster for precisely that reason.

The main thing to worry about is the table locking while your cron is running, but using INSERT DELAYED gets around that problem for the most part.
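
For example, something along these lines — a minimal sketch assuming a MyISAM event_log table like the one sketched in the question (note that INSERT DELAYED only works with MyISAM-type engines, and newer MySQL versions deprecate it and eventually treat it as a plain INSERT):

```python
# Sketch of logging one hit with INSERT DELAYED: the server queues the
# row in memory and returns to the client immediately, instead of
# making it wait for the actual write (or for the cron's table lock).
import MySQLdb

db = MySQLdb.connect(host="localhost", user="tracker",
                     passwd="secret", db="tracking")
cur = db.cursor()
cur.execute(
    "INSERT DELAYED INTO event_log (app_id, event_type) VALUES (%s, %s)",
    (42, "click"),
)
# No commit needed: MyISAM is not transactional.
```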

Gattster
From your experience, which of these would be better/faster: using memcached (or another form of memory cache), or having MySQL do something similar via INSERT DELAYED? (AFAIK MySQL queues unwritten rows in memory when using delayed inserts.)
Alex
Both sound like valid options. For ultimate performance, the answer about using an already existing log analyzer has a good point. I think you should consider convenience and not over-optimize. I've found that delayed inserts into MySQL work fine for my sites (logging up to a million inserts per day), and they are extremely easy to implement.
Gattster
Thanks for the input.
Alex
A: 

I would really recommend using an existing log analyzer on the logs your web server already produces. One example is Webalizer. Even better, in my opinion, is an external system such as Google Analytics. That works better because it keeps working with intermediate systems such as load balancers and caches in place.

disown
The problem is that the client apps run on different servers, so I don't have access to the log files.
Alex
Google Analytics works by adding some code to every page you want to track. It's very simple to just put it in a header.
disown
+1  A: 

If your hit rate is too high for even delayed inserts into a MyISAM table to handle, you may want to keep recent hits in memory (memcached can come in handy, or a custom daemon you can write) and periodically process the hits from memory into the database stats table, already aggregated.
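
A rough sketch of that idea using memcached's atomic counters — the key layout and table names are invented for illustration, and a real version would have to deal with memcached evicting counters under memory pressure (you trade a little durability for speed):

```python
# Sketch: count hits in memcached, flush aggregates to MySQL periodically.
# Key layout (hits:app:event:minute) and table names are made up.
import time

import memcache
import MySQLdb

mc = memcache.Client(["127.0.0.1:11211"])

def record_hit(app_id, event_type):
    """Called on every hit; never touches the database."""
    minute = int(time.time()) // 60
    key = "hits:%d:%s:%d" % (app_id, event_type, minute)
    mc.add(key, 0)   # create the counter if it doesn't exist yet
    mc.incr(key)     # atomic server-side increment

def flush_minute(db, app_id, event_type, minute):
    """Run from cron or a small daemon once the minute has closed."""
    key = "hits:%d:%s:%d" % (app_id, event_type, minute)
    hits = mc.get(key)
    if hits:
        cur = db.cursor()
        cur.execute(
            "INSERT INTO stats_minutely (app_id, event_type, minute, hits)"
            " VALUES (%s, %s, FROM_UNIXTIME(%s), %s)",
            (app_id, event_type, minute * 60, hits),
        )
        db.commit()
        mc.delete(key)
```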

Omry
That's one of the approaches I had in mind as well :).
Alex
I have thought of the custom daemon approach in the past, but I never wrote one because I thought `insert delayed` was basically the same thing. You could easily set up another copy of MySQL on a different server if you don't want to overload your main DB server.
Gattster
It's not quite the same, because INSERT DELAYED is going to hit the disk eventually. If you just want to aggregate some data and flush it every once in a while, you never need to have every single hit on disk.
Omry