Hi,

I am trying to block requests being made to our pagination parameter by multiple robots (evil ones, it seems).

Hundreds of these types of requests are showing up:

http://www.ourdomain.com/search.php?q=search+query&page=366100876

Is there a way, using regular expressions in .htaccess, to send away any request that asks for a page larger than 1000, or has more than 4 digits in the 'page' parameter?

The 'q' parameter is, of course, always different.

Thank you.

A: 

I derived most of this from a really cool article called Ultimate .htaccess file sample. Very handy.

 Redirect 500 /error500.html

 RedirectMatch 500 ^.{1001,}$

That would send away any request whose URL path is longer than 1000 characters.

 LimitRequestBody 102400

That would reject any request whose body is larger than 100 KB (102400 bytes).

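To target the GET variable page specifically:

 RedirectMatch 500 ^.+page=[0-9]{4}.+$

Note that RedirectMatch only tests the URL path, not the query string, so this pattern will not see a page value passed after the `?`.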
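
A minimal sketch of one way to test the `page` parameter in the query string itself, assuming mod_rewrite is enabled and `.htaccess` overrides are allowed (neither is confirmed in the question). `%{QUERY_STRING}` holds everything after the `?`. The condition below looks for a `page` parameter with four or more digits and answers with 403 Forbidden. Using 403 rather than 500 is an assumption.

```apache
# Sketch only: assumes mod_rewrite is available in this context
RewriteEngine On
# page=NNNN... with 4 or more digits, matched as a whole parameter
RewriteCond %{QUERY_STRING} (^|&)page=[0-9]{4,}(&|$)
# "-" leaves the URL untouched; [F] returns 403 Forbidden
RewriteRule ^ - [F]
```

Changing `{4,}` to `{5,}` would let pages 1000 through 9999 pass and reject only values with more than four digits.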
Anthony
Simon
Is `=` part of regex syntax? I'm not sure, honestly. 1) You don't need the Rewrite Engine on. This rule lets you use a regex without it. 2) Try escaping the `=`, like so: `RedirectMatch 500 ^.+page\=[0-9]{4}.?$`. The `?` at the end lets the URL end right after a number between 1000 and 9999.
Anthony
A: 

I tried this and it works. I added it to some other checks I had:

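# Block out any request with a 'page' value of four or more digits
RewriteCond %{QUERY_STRING} page=[0-9]{4} [OR]
# Block out any script trying to set a mosConfig value through the URL
RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|\%3D) [OR]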
# Block out any script trying to base64_encode crap to send via URL
RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [OR]
# Block out any script that includes a <script> tag in URL
RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
# Block out any script trying to set a PHP GLOBALS variable via URL
RewriteCond %{QUERY_STRING} GLOBALS(=|\[|\%[0-9A-Z]{0,2}) [OR]
# Block out any script trying to modify a _REQUEST variable via URL
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Send all blocked requests to homepage with 403 Forbidden error!
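
The list above stops at its last comment, and conditions alone do nothing until a `RewriteRule` follows them. Here is a shortened sketch of how such a chain could be closed, assuming mod_rewrite is enabled. The rule line is an illustration, not necessarily the one from the original file.

```apache
RewriteEngine On
RewriteCond %{QUERY_STRING} page=[0-9]{4} [OR]
RewriteCond %{QUERY_STRING} _REQUEST(=|\[|\%[0-9A-Z]{0,2})
# Runs when any condition above matched; [F] returns 403 Forbidden
RewriteRule ^ - [F]
```

Every condition except the last carries `[OR]`. Without that flag the conditions would be combined with AND, and a request would have to match all of them to be blocked.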
Simon
Simon, welcome to SO. You can mask code using the "code" button in the editor, otherwise the hashes get interpreted as `<h1>` titles :)
Pekka