We have a pretty large website that handles a couple of million visitors a month, and we're currently developing an API for some of our partners. In the near future we hope to expose the API to all of our visitors.

As we are trying to limit the number of requests to, say, 100,000/day and 1,000/minute, I need to log the total number of requests per API key and verify that the API user doesn't exceed this total. We won't check in real time whether the limit is exceeded at this point, but afterwards, in the site's control panel. We also need to display a timeline per user in the control panel, so we have to be able to get a quick per-day or per-hour overview when we need it.

My first idea was to build the following app:

API user => web server => posts a message with the API key to a message queue => a service picks up the message => posts to the database, where there is one row for each user-hour combination (key|hour|count). This should be quite fast, but we'd throw away quite a bit of useful information (individual queries, requests per minute, etc.). Saving each and every request as a separate record in the database would generate millions of records a day and would (I guess; I'm not much of a DBA) be quite slow when generating charts, even with the correct indexes.
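For the aggregate table, the consumer side of that pipeline boils down to an upsert keyed on (ApiKey, Hour). A minimal sketch, assuming SQL Server 2008's MERGE (on older versions, an UPDATE followed by a conditional INSERT does the same job) and a hypothetical ApiRequestsPerHour table; all names here are illustrative:

    using System;
    using System.Data.SqlClient;

    public static class HourlyCounter
    {
        // Placeholder; read from config in practice.
        static readonly string ConnectionString = "...";

        // Assumed table: ApiRequestsPerHour(ApiKey, [Hour], [Count]),
        // primary key (ApiKey, [Hour]). MERGE needs SQL Server 2008+.
        const string UpsertSql = @"
            MERGE ApiRequestsPerHour AS target
            USING (SELECT @ApiKey AS ApiKey, @Hour AS [Hour]) AS source
            ON target.ApiKey = source.ApiKey AND target.[Hour] = source.[Hour]
            WHEN MATCHED THEN
                UPDATE SET target.[Count] = target.[Count] + @Count
            WHEN NOT MATCHED THEN
                INSERT (ApiKey, [Hour], [Count]) VALUES (@ApiKey, @Hour, @Count);";

        public static void Add(string apiKey, DateTime requestUtc, int count)
        {
            // Truncate to the hour so every request in the same hour hits the same row.
            DateTime hour = new DateTime(requestUtc.Year, requestUtc.Month, requestUtc.Day,
                                         requestUtc.Hour, 0, 0, DateTimeKind.Utc);
            using (SqlConnection conn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand(UpsertSql, conn))
            {
                cmd.Parameters.AddWithValue("@ApiKey", apiKey);
                cmd.Parameters.AddWithValue("@Hour", hour);
                cmd.Parameters.AddWithValue("@Count", count);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }

The daily-limit check and the control-panel timeline then become plain aggregations over this table: SUM of Count grouped by day, or the hourly rows read as-is.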

Our platform consists of around ten web servers, ten front-end SQL servers, a stats server, and some other servers for processing large tasks. Everything runs Windows (except our EMC) with MS SQL. The dev platform is ASP.NET with WCF.

+1  A: 

My advice would be to log everything (a plain append-only text file is simplest) and have a background task periodically read log segments and summarize them into the database; a sketch follows the list below. This has several advantages over more 'sophisticated' approaches:

  • It's simpler.
  • It's really easy to debug.
  • You can keep individual log segments around until you need to free the disk space, so you retain information on individual requests for debugging and accounting purposes.
  • You can easily extend it to collect more information, or improve and change your summarizer, because the components are loosely coupled.
  • It's easy to shard - just have each server keep its own logs.
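A minimal sketch of the two halves, under the assumption of one log segment per server per hour and tab-separated lines (all names and paths here are illustrative, not a fixed design):

    using System;
    using System.Collections.Generic;
    using System.IO;

    public static class RequestLog
    {
        static readonly object Gate = new object();

        // Append one line per request: timestamp, API key, request path.
        // One segment per hour; a segment whose hour has passed is closed
        // and safe for the summarizer to read.
        public static void Append(string apiKey, string path)
        {
            DateTime now = DateTime.UtcNow;
            string file = string.Format(@"D:\apilogs\apilog-{0:yyyyMMdd-HH}.log", now);
            string line = string.Format("{0:o}\t{1}\t{2}", now, apiKey, path);
            lock (Gate)   // one writer per process; each server keeps its own logs
            {
                File.AppendAllText(file, line + Environment.NewLine);
            }
        }
    }

    public static class Summarizer
    {
        // Background task: fold a closed segment into per-key counts, then
        // push each count into the database (e.g. one upsert per key-hour).
        public static Dictionary<string, int> Summarize(string segmentPath)
        {
            var counts = new Dictionary<string, int>();
            foreach (string line in File.ReadAllLines(segmentPath))
            {
                string apiKey = line.Split('\t')[1];
                int n;
                counts.TryGetValue(apiKey, out n);
                counts[apiKey] = n + 1;
            }
            return counts;
        }
    }

Because the summarizer is a separate, loosely coupled piece, you can re-run it over old segments whenever you change what you aggregate, without touching the request path.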
Nick Johnson
Isn't this likely to be a performance problem? Why not choose a message queue?
Jan Jongboom
How can logging to a plain text file be higher overhead than using a message queue? If you're thinking of something in-memory only, you can always log to /dev/shm.
Nick Johnson
A: 

I'd start with logging and leave enforcement for later. Your logging may show you that you don't need enforcement, or it may show that you need a different kind of enforcement.

I'd just start off by creating a simple logging API: ApiLogger.Log(apiKey). I'd have the logger take authentication information, etc., from HttpContext. I'd start by just dumping it into a database table and only get fancier if performance requires it.
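A sketch of that starting point; the source only names ApiLogger.Log(apiKey), so the table and column names below are assumptions:

    using System;
    using System.Data.SqlClient;
    using System.Web;

    public static class ApiLogger
    {
        // Placeholder; read from config in practice.
        static readonly string ConnectionString = "...";

        // One row per API call; get fancier only if this becomes a bottleneck.
        // Note: in WCF, HttpContext.Current is only populated when ASP.NET
        // compatibility mode is enabled; otherwise pull the request details
        // from OperationContext instead.
        public static void Log(string apiKey)
        {
            HttpContext ctx = HttpContext.Current;
            string path = ctx != null ? ctx.Request.RawUrl : "";
            string user = (ctx != null && ctx.User != null) ? ctx.User.Identity.Name : "";

            using (SqlConnection conn = new SqlConnection(ConnectionString))
            using (SqlCommand cmd = new SqlCommand(
                "INSERT INTO ApiCalls (ApiKey, UserName, Path, CalledAtUtc) " +
                "VALUES (@ApiKey, @UserName, @Path, @CalledAtUtc)", conn))
            {
                cmd.Parameters.AddWithValue("@ApiKey", apiKey);
                cmd.Parameters.AddWithValue("@UserName", user);
                cmd.Parameters.AddWithValue("@Path", path);
                cmd.Parameters.AddWithValue("@CalledAtUtc", DateTime.UtcNow);
                conn.Open();
                cmd.ExecuteNonQuery();
            }
        }
    }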

Later analysis could determine who is making how many calls, whether you want multiple tiers charging different amounts per tier, etc. But for the moment, just store the data that your business people will need.

John Saunders
A: 

As we are trying to limit the number of requests to, say, 100,000/day and 1,000/minute, I need to log the total number of requests per API key, and verify that the API user doesn't exceed this total.

A feature like this will be part of WCF (if it isn't already) in the very near future. I am currently racking my brain over where I heard it so I can point you in the right direction.

EDIT: FOUND IT! This week, on a podcast called "The Thirsty Developer", this very topic came up. Download the podcast here; the topic comes up at 39:40. For those who do not want to listen: there is a REST toolkit that has this feature in it. I think the toolkit can be found here.
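If the toolkit doesn't pan out, the check itself is small. A hypothetical per-server fixed-window counter is sketched below; note that with ten web servers this only bounds each server, so exact global limits would need a shared store (e.g. the database) instead:

    using System;
    using System.Collections.Generic;

    // Hypothetical throttle: counts requests per API key within the current
    // minute and rejects once the per-minute limit is reached. Counters
    // reset when the minute rolls over.
    public class MinuteThrottle
    {
        readonly int _limit;
        readonly object _gate = new object();
        readonly Dictionary<string, int> _counts = new Dictionary<string, int>();
        DateTime _window;

        public MinuteThrottle(int limitPerMinute)
        {
            _limit = limitPerMinute;
            _window = CurrentWindow();
        }

        public bool Allow(string apiKey)
        {
            lock (_gate)
            {
                DateTime now = CurrentWindow();
                if (now != _window)            // new minute: reset all counters
                {
                    _window = now;
                    _counts.Clear();
                }
                int n;
                _counts.TryGetValue(apiKey, out n);
                if (n >= _limit) return false; // over the per-minute limit
                _counts[apiKey] = n + 1;
                return true;
            }
        }

        static DateTime CurrentWindow()
        {
            DateTime t = DateTime.UtcNow;
            return new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0, DateTimeKind.Utc);
        }
    }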

Tony
Very useful; I'll listen to this over the weekend. We also have around 4 million unique visitors a month, so it might be interesting for other things than just this question. A great advantage is that we already use the REST Starter Kit.
Jan Jongboom