Now there is a subject that could be taken many ways. Hopefully I can untangle it as I describe my problem and start getting suggestions.

I am developing a site that will be replacing an existing one. Historically, one of the problems we have had is spider bots coming in and sucking down all our content. We don't actually mind the content being downloaded; in fact, we're glad for it. However, some of the bulk downloaders and download accelerators have proved problematic with the current site.

What I am looking for is something that sits at the start of my PHP and runs before pretty much anything else. It takes a fingerprint of the page request (IP, referrer, request URI, cookies, session ID, whatever) and passes it to ...something. That something then compares the fingerprint against the fingerprints seen in the last second or three, and tells me, based on some pre-configured thresholds, what to do with the request.

Some thresholds are:

  • The user has requested > x pages in the last 0.n seconds.
  • The user has requested the same page in < 0.n seconds.
  • The user has submitted the identical data to a form in the last n seconds.
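Roughly, I picture something like this minimal sketch. APCu is used here purely as an example of a shared in-memory counter store (with the older APC extension the calls would be apc_add/apc_inc), and the key layout, window, and threshold values are made up; including the request URI in the fingerprint counts per-page hits, while dropping it would give a site-wide count:

    <?php
    // Rough sketch: fingerprint the request and count hits inside a short window.
    $fingerprint = sha1(implode('|', array(
        $_SERVER['REMOTE_ADDR'],
        isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '',
        $_SERVER['REQUEST_URI'],
        session_id(),
    )));

    $window    = 2;   // seconds a counter is kept around (example value)
    $threshold = 10;  // max requests per fingerprint inside that window (example value)

    $key = 'throttle:' . $fingerprint;

    // Create the counter with a short TTL if it does not exist yet, then bump it.
    apcu_add($key, 0, $window);
    $hits = apcu_inc($key);

    if ($hits !== false && $hits > $threshold) {
        // Over the limit: hand off to whatever "back off" response gets decided on
        // (see the question about HTTP responses further down).
    }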

So you can see I am looking at some pretty tight windows. Is detecting such things even feasible? Could I do it with some sort of file or database data source? Whatever I use to store the fingerprints between page loads is going to see a lot of churn, since most of the data will only be held for a second or two. Should I just have something that parses the Apache logs and checks them against the thresholds? Should I be looking for some sort of external daemon that holds the data in memory for a second or two and that I can call from the script? Is there something in Apache that can handle this, and should I just punt it to the server guy?
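For the "external daemon that holds the data in memory for a second or two" option, something like memcached already behaves that way: keys with a one- or two-second expiry, atomic add/increment, and no churn on disk. A sketch, assuming the pecl Memcached extension and a memcached server on localhost (the key name and limits are examples, and $fingerprint is built as in the earlier sketch):

    <?php
    // Sketch: memcached as the short-lived, in-memory fingerprint store.
    $mc = new Memcached();
    $mc->addServer('127.0.0.1', 11211);

    $key = 'throttle:' . $fingerprint;

    // add() only succeeds when the key does not exist yet, so the first request
    // in a window creates the counter; the 2-second expiry cleans it up for us.
    if ($mc->add($key, 1, 2)) {
        $hits = 1;
    } else {
        $hits = $mc->increment($key);
    }

    $overLimit = ($hits !== false && $hits > 10);  // example threshold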

Assuming this is something I can do in PHP, or in some external daemon that I call, how do I respond to behaviour outside the thresholds? My gut says HTTP responses, something like 408 or 503, but my gut is often wrong. What can I do to tell the client to back off a bit? Some sort of "Whoa there" page?
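Something like this is what I imagine the "Whoa there" response looking like, if a 503 plus a Retry-After header is a reasonable choice (the values here are only examples):

    <?php
    // Possible "back off" response: a 503 with Retry-After and a short explanation.
    header('HTTP/1.1 503 Service Unavailable');
    header('Retry-After: 2');  // seconds; would match the throttle window
    header('Content-Type: text/html; charset=utf-8');
    echo '<html><body><h1>Whoa there</h1>'
       . '<p>You are requesting pages faster than we can serve them. '
       . 'Please wait a moment and try again.</p></body></html>';
    exit;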

+3  A: 

If you don't have to have a software solution, why not program your router/firewall to handle this for you? Filtering out DoS attacks (or their equivalent) is part of what it's there for.
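If the firewall happens to be a Linux box, the hashlimit match is the usual tool for this kind of per-source rate limit; the rate below is only an example and the exact options vary between iptables versions:

    # Drop new connections to port 80 from any single source IP that goes
    # above roughly 20 requests per second (example rate, tune to taste).
    iptables -A INPUT -p tcp --dport 80 -m state --state NEW \
        -m hashlimit --hashlimit-name http-flood --hashlimit-mode srcip \
        --hashlimit-above 20/sec -j DROP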

Russell Steen
+2  A: 

Try mod_evasive
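A typical configuration looks something like this (directive names are from mod_evasive20; the counts and intervals are only examples, so check the module's documentation for your build):

    <IfModule mod_evasive20.c>
        # Block a client that requests the same page more than 5 times in 1 second,
        # or more than 50 pages site-wide in 1 second; keep blocking for 10 seconds.
        DOSHashTableSize    3097
        DOSPageCount        5
        DOSPageInterval     1
        DOSSiteCount        50
        DOSSiteInterval     1
        DOSBlockingPeriod   10
    </IfModule>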

Azeem.Butt
+2  A: 

Try PEAR::HTTP_FloodControl, mod_security, and fail2ban.

powtac
Also, http://www.bad-behavior.ioerror.us/
Frank Farmer