views:

526

answers:

8

I have a nice and lovely Django site up and running, but have noticed that my error.log file was getting huge, over 150 MB after a couple of months of being live. Turns out a bunch of spambots are looking for well known URL vulnerabilities (or something) and hitting a bunch of sub-directories like http://mysite.com/ie or http://mysite.com/~admin.php etc.

Since Django uses URL rewriting, it is looking for templates to fit these requests, which raises a TemplateDoesNotExist exception, and then a 500 message (Django does this, not me). I have debug turned off, so they only get the generic 500 message, but it's filling up my logs very quickly.

Is there a way to turn this behavior off? Or perhaps just block the IP's doing this?

A: 

A programming solution would be to :

  • open the log file
  • read the lines in a buffer
  • replace the lines that match the errors the bots caused
  • seek to the beginning of the file
  • write the new buffer
  • truncate the file to current pointer position
  • close

Voila ! It's done !

Geo
+6  A: 

Um, perhaps, use logrotate to rotate and compress the logs periodically, if it isn't being done already.

ayaz
It is not being run. That might be the best way, though I'd rather do something about the 500 errors.
swilliams
I would definitely get logrotate up and running. That aside, the part about 500 confuses me. If there is no view attached to a URL, it should always give a 404, debug or not, no?
ayaz
@ayaz - Yeah, a 404 makes sense to me, I'm guessing it's a Django thing.
swilliams
+3  A: 

"Is there a way to turn this behavior off?" - the 500 is absolutely mandatory. The log entry is also mandatory.

"Or perhaps just block the IP's doing this?" - don't we wish.

Everyone has this problem. Just about everyone uses Apache log rotation. Everyone else either uses an OS rotation or rolls their own.

S.Lott
+3  A: 

If you can find a pattern in UserAgent string, you may use DISALLOWED_USER_AGENT setting. Mine is:

DISALLOWED_USER_AGENTS = (
    re.compile(r'Java'),
    re.compile(r'gigamega'),
    re.compile(r'litefinder'),
)

See the description in Django docs.

zgoda
But this won't avoid the log lines, just change what status code is returned.
Ned Batchelder
A: 

How about setting up a catch-all pattern as the last item in your urls file and directing it to a generic "no such page" or even your homepage? In other words, turn 500's into requests for your homepage.

Parand
A: 

Why not fix those "bugs"? If a url pattern is not matched, then a proper error message should be shown. By adding those templates you will help the user and yourself :-)

Armandas
+2  A: 

Django should be throwing a 404, not a 500, if the URL doesn't match any entries in your URLConf.

http://docs.djangoproject.com/en/dev/topics/http/urls/#handler404

You need to provide a 404 template:

If you don't define your own 404 view -- and simply use the default, which is recommended -- you still have one obligation: To create a 404.html template in the root of your template directory. The default 404 view will use that template for all 404 errors.

Michael Warkentin
A: 
  1. Yes, it should be a 404, not a 500. 500 indicates something is trying to deal with the URL and is failing in the process. You need to find and fix that.

  2. We have a similar problem. Since we are running Apache/mod_python, I chose to deal with it in .htaccess with mod_rewrite rules. I periodically look at the logs and add a few patterns to my "go to hell" list. These all rewrite to deliver a 1x1 pixel gif file. There is no tsunami of 404s to clutter up my log analysis and it puts minimal load on Django and Apache.

You can't make these a**holes go away, so all you can do is minimize their impact on your system and get on with your life.

Peter Rowell