I have a photo-hosting website, and I want to keep track of views to the photos. Due to the large volume of traffic I get, incrementing a column in MySQL on every hit incurs too much overhead.

I currently have a system implemented using Memcache, but it's pretty much just a hack.

Every time a photo is viewed, I increment its photo-hits_uuid key in Memcache. In addition, I add a row containing the uuid to an invalidation array also stored in Memcache. Every so often I fetch the invalidation array, and then cycle through the rows in it, pushing the photo hits to MySQL and decrementing their Memcache keys.
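
As a rough illustration of the scheme just described, here is a minimal Python sketch (not the site's actual code). A plain dict stands in for Memcache, the key name `photo-hits-invalidation` is hypothetical, and `save_hits` is a hypothetical callback representing the MySQL update. With a real Memcache client the counter updates would go through its atomic increment and decrement calls.

```python
INVALIDATION_KEY = "photo-hits-invalidation"  # hypothetical key name


def record_view(cache, uuid):
    """Count one view and remember that this photo needs flushing."""
    key = f"photo-hits_{uuid}"
    cache[key] = cache.get(key, 0) + 1
    cache.setdefault(INVALIDATION_KEY, []).append(uuid)


def flush(cache, save_hits):
    """Push pending hits to the database, then decrement the cached counters."""
    pending = set(cache.pop(INVALIDATION_KEY, []))
    for uuid in pending:
        key = f"photo-hits_{uuid}"
        hits = cache.get(key, 0)
        if hits:
            save_hits(uuid, hits)  # stand-in for the MySQL update
            # Decrement instead of deleting: with a real shared cache,
            # hits that arrive during the flush would then be kept.
            cache[key] -= hits
```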

This approach works and is significantly faster than directly using MySQL, but is there a better way?

A: 

Here is the approach I use.

Method 1: (Size of a file) Every time someone hits the page, I append one byte to a file. Every x seconds (I use 600), I count the bytes in the file, delete the file, and add that count to the MySQL database. This also scales if multiple servers append to a small file on a cache server. Use fwrite to append to the file and you never have to read the cache file.
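
A minimal sketch of Method 1, written in Python rather than PHP. The function names and the `.counting` suffix are made up for illustration, and renaming the file before counting is an added assumption so that hits arriving during the count land in a fresh file.

```python
import os


def record_hit(path):
    """Append one byte; append mode means the file is never read."""
    with open(path, "ab") as f:
        f.write(b".")


def collect_hits(path):
    """Return the number of hits recorded so far and reset the counter."""
    snapshot = path + ".counting"  # hypothetical temporary name
    try:
        # Move the file aside first so new hits start a fresh file.
        os.replace(path, snapshot)
    except FileNotFoundError:
        return 0  # no hits since the last collection
    hits = os.path.getsize(snapshot)  # one byte per hit
    os.remove(snapshot)
    return hits
```

The value returned by `collect_hits` is what would be added to the MySQL column every 600 seconds. A writer that opened the file just before the rename could still lose a hit, so treat this as approximate counting.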

Method 2: (Number stored in a file) Another method is to store the number of hits in a text file, but I recommend against this because if two processes update the file simultaneously, the count might be off (the same may be true of Method 1).
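
If Method 2 were used anyway, the simultaneous-update problem could be reduced with an advisory file lock. The following Python sketch assumes a Unix-like system (the `fcntl` module is not available on Windows), and `increment_counter` is a hypothetical name.

```python
import fcntl
import os


def increment_counter(path):
    """Read, increment, and rewrite the number in `path` under an exclusive lock."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until other writers are done
        text = f.read().strip()
        count = (int(text) if text else 0) + 1
        f.seek(0)
        f.truncate()
        f.write(str(count))
        f.flush()
        # The lock is released when the file is closed.
    return count
```

Every hit now waits on the lock, which is part of why this is likely slower than appending a byte.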

I would use Method 1 because, although the file is bigger, it is faster.

joshli
These are interesting ideas, but I don't think they'll improve the speed of my application. They'll both end up being slower due to file seek time and disk contention.
Justin
Hmm, I never took that factor into account. In the future there will be SSDs, but what matters is right now. I don't think there is any other way, unless, instead of writing to disk, you write to RAM on another server, but that's basically what Memcache is doing.
joshli
@Justin: How many pictures do you have, and how many visitors do you have to serve? I don't think disk seek time and disk contention would be an issue if you are running on an OS that writes file data back, say, every 60 seconds.
Bandi-T
How would this approach scale out to multiple HTTP servers?
Jay Paroline
A: 

I'm assuming you're keeping access logs on your server for this solution.

  1. Keep track of the last time you checked your logs.
  2. Every n seconds or so (where n is less than the time it takes for your logs to be rotated, if they are), scan through the latest log file, ignoring every hit until you find a timestamp after your last check time.
  3. Count how many times each image was accessed.
  4. Add each count to the count stored in the database.
  5. Store the timestamp of the last log entry you processed for next time.
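
The steps above could be sketched as follows in Python. This assumes the access log uses the Common Log Format and that photos live under a hypothetical `/photos/` path; `count_new_hits` is an illustrative name, and writing the counts to the database is left out.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log line shape (Common Log Format):
# 1.2.3.4 - - [10/Oct/2023:13:55:36 +0000] "GET /photos/a.jpg HTTP/1.1" 200 2326
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "GET (?P<path>\S+)')
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def count_new_hits(lines, last_seen, prefix="/photos/"):
    """Count hits per path for log entries newer than `last_seen`.

    Returns (counts, newest_timestamp); store newest_timestamp and pass it
    as `last_seen` on the next run.
    """
    counts = Counter()
    newest = last_seen
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a GET request or not in the assumed format
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        if last_seen is not None and ts <= last_seen:
            continue  # already processed on a previous run
        if m["path"].startswith(prefix):
            counts[m["path"]] += 1
        if newest is None or ts > newest:
            newest = ts
    return counts, newest
```

Because log timestamps have one-second resolution, hits logged in the same second as the stored timestamp but after the scan would be skipped, so the counts are approximate.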
pib
+1  A: 

I did some research and it looks like Redis might be my solution. It seems to be essentially Memcache with more functionality; the most valuable feature for me is its list type, which pretty much solves my problem.
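
One way this could look with Redis lists, as a Python sketch and not necessarily how the final system was built. It assumes `client` is a redis-py `redis.Redis` instance (only its `rpush` and `lpop` methods are used), and the key name `photo-views` is hypothetical.

```python
from collections import Counter

VIEW_LIST = "photo-views"  # hypothetical key name


def record_view(client, uuid):
    """Queue one view by appending the photo's uuid to a Redis list."""
    client.rpush(VIEW_LIST, uuid)


def drain_views(client):
    """Pop every queued view and return hits per photo, ready to write to MySQL."""
    counts = Counter()
    while True:
        uuid = client.lpop(VIEW_LIST)
        if uuid is None:  # list is empty
            break
        if isinstance(uuid, bytes):  # redis-py returns bytes unless decode_responses=True
            uuid = uuid.decode()
        counts[uuid] += 1
    return counts
```

A periodic job would call `drain_views` and add each count to the matching MySQL row, which replaces the separate counter keys and invalidation array with a single list.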

Justin