tags:
views: 203
answers: 3
I read that spammers may be downloading a specific registration page on my site using curl. Is there any way to block that specific page from being CURLed, either through htaccess or other means?

A: 

I don't think it's possible to block curl, as curl can send arbitrary user agents, cookies, etc. As far as I understand, it can completely emulate a normal browser.

If you are worried about protecting a form, you can generate a random token that is embedded in the form and submitted along with it. That way, anyone who tries to write a script to automate registration will have to scrape the token first.
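A minimal sketch of that random-token idea in PHP (the function and field names here are illustrative, not from the thread; in a real page the store would be `$_SESSION` and the submitted value would come from `$_POST`):

```php
<?php
// Sketch of the random-token approach described above (illustrative names).
// issue_token() is called when rendering the form; the token is kept
// server-side and embedded in the form as a hidden field.
function issue_token(array &$store): string {
    $token = bin2hex(random_bytes(16)); // 32 hex chars from a CSPRNG
    $store['form_token'] = $token;
    return $token;
}

// check_token() is called on submission; hash_equals() does a
// constant-time comparison to avoid timing side channels.
function check_token(array $store, ?string $submitted): bool {
    return $submitted !== null
        && isset($store['form_token'])
        && hash_equals($store['form_token'], $submitted);
}

// Simulated round trip: render the form, then verify the submission.
$session = [];
$token = issue_token($session);
var_dump(check_token($session, $token));   // genuine submission: true
var_dump(check_token($session, 'forged')); // scripted guess: false
```

A spammer can still scrape the token with curl (fetch the page, parse out the hidden field, resubmit), so as the thread notes this raises the bar rather than blocking curl outright.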

Oren
there is already a random token generated. thanks, though.
55skidoo
A: 

As Oren says, spammers can forge user agents, so you can't just block the curl user-agent string. The typical solution here is some kind of CAPTCHA: usually a jumbled image (though non-visual forms exist) that sites (including Stack Overflow) have you transcribe to prove you're human.

Matthew Flaschen
Aware of CAPTCHA but trying to avoid it on principle - spammers shouldn't get to make things harder for users. Idealistic, I know! Some overheard wisdom on admin sites (potentially wrong) is that you can block curl with an htaccess statement that matches "curl" at the beginning of the user agent. Perhaps like this bit from webmasterworld.com: RewriteCond %{HTTP_USER_AGENT} ^(curl|Dart.?Communications|Enfish|htdig|Java|larbin) [NC,OR] - I don't speak htaccess - is this statement adaptable somehow?
55skidoo
55skidoo, as I said, I think blocking known user agents is a waste of time. This is a trivial configuration option for the spammer to change.
Matthew Flaschen
ok, sorry for misunderstanding. so there is no way to write an htaccess statement that essentially says "If the HTTP request begins with 'curl...' and the registration URL is included, then don't allow access."? This question probably betrays my ignorance of htaccess.
55skidoo
You can look for curl in the headers, but this is very unreliable. The spammer can (for example) make their request seem to be coming from Internet Explorer.
Matthew Flaschen
they are crafty, no doubt. How would I go about looking for curl in the headers?
55skidoo
You would check the User-Agent header. That is what the webmasterworld snippet is doing.
Matthew Flaschen
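For completeness, the webmasterworld-style rule adapted to a single page might look like the following untested sketch (`register.php` is a placeholder for the actual registration URL; as Matthew says above, this only catches clients that truthfully identify themselves as curl):

```apache
# .htaccess - deny the registration page to any client whose
# User-Agent header begins with "curl" (case-insensitive).
# Trivially defeated by a spoofed user agent, per the discussion above.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^curl [NC]
RewriteRule ^register\.php$ - [F,L]
```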
See this, for example: curl_setopt($curl, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; MSIE 5.01; Windows NT 5.0)"); - one can set curl to "fake" any user agent one wants, so, as Matthew says, trying to look for curl in the headers really won't help.
Oren
OK, sounds like the consensus here is that blocking curl for a specific URL would not be very effective at stopping spammers from harvesting registration pages.
55skidoo
+1  A: 

There is one weakness in curl you can exploit: it cannot run JavaScript like a browser can. So take advantage of that. On the first landing on the registration page, have your server-side code check for a cookie. If it isn't there, send some JavaScript to the client that sets the cookie and then does a redirect/reload. After the reload, the server side checks for the cookie again: a browser will have it, while with curl the cookie setting and reload never happen in the first place.

I hope I made some sense. Bottom line: utilize JavaScript to differentiate between curl and a browser.
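A rough sketch of that gate in PHP (the cookie name and return values are illustrative; in a real page the cookie array would be `$_COOKIE` and the script would be emitted in the HTML response). A browser executes the script, sets the cookie, and reloads; curl fetches the page once, never runs the script, and so never presents the cookie:

```php
<?php
// Sketch of the JS-cookie check described above (illustrative names).
// Returns what the server would send for a given set of request cookies.
function gate(array $cookies): string {
    if (isset($cookies['js_check'])) {
        // Cookie present: the client ran our JavaScript, serve the form.
        return 'REGISTRATION_FORM';
    }
    // No cookie yet: send JavaScript that sets it and reloads the page.
    // curl stops here, since it never executes this script.
    return <<<'HTML'
<script>
  document.cookie = "js_check=1; path=/";
  location.reload();
</script>
HTML;
}

echo gate([]), "\n";                  // first visit: emits the script
echo gate(['js_check' => '1']), "\n"; // reload with cookie: emits the form
```

Note that a determined spammer can still defeat this by setting the cookie manually in curl (`-b js_check=1`), so like the token it raises the cost rather than blocking curl outright.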

Sabeen Malik