Most sites employ at least server access log checking and banning, along with some kind of bot-prevention measure like a CAPTCHA (those messed-up text images).

The problem with CAPTCHAs is that they pose a threat to the user experience. Luckily they now come with user-friendly features like refresh buttons and audio versions.

Anyway, like Linux vs. Windows, it isn't worth a spammer's time to customize or build a script to handle a custom CAPTCHA that only pertains to one site. Therefore, I was wondering if there might be better ways to handle the whole CAPTCHA thing.

In A Better CAPTCHA, Peter Bromberg mentions that one way would be to convert the image to HTML and display it embedded in the page. On http://shiflett.org/ Chris simply asks users to type his name into an input. Examples like this simplify the CAPTCHA experience while decreasing its value to spammers. Does anyone know of more good examples I could use, or see any problem with the embedded-image idea?
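
For illustration, here is a rough sketch of what I understand the embedded-image idea to look like, assuming the Pillow library is available; the file name is just a placeholder for whatever generates the CAPTCHA image:

    # Render each pixel of a CAPTCHA image as a coloured table cell, so no
    # <img> tag is ever served. "captcha.png" is a placeholder.
    from PIL import Image

    def image_to_html_table(path, cell=3):
        img = Image.open(path).convert("RGB")
        rows = []
        for y in range(img.height):
            cells = []
            for x in range(img.width):
                r, g, b = img.getpixel((x, y))
                cells.append(f'<td style="width:{cell}px;height:{cell}px;'
                             f'background:#{r:02x}{g:02x}{b:02x}"></td>')
            rows.append("<tr>" + "".join(cells) + "</tr>")
        return ('<table cellspacing="0" cellpadding="0" '
                'style="border-collapse:collapse">' + "".join(rows) + "</table>")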

A: 

Although not a true image CAPTCHA, a good Turing test is asking users a random question – common options are: "Is ice hot or cold?", "What is 5 + 2?", etc.
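
A rough sketch of how such a question pool could work server-side; the questions and names here are only placeholders:

    # Keep the question index in the session; show only the text to the user.
    import random

    QUESTIONS = [
        ("Is ice hot or cold?", "cold"),
        ("What is 5 + 2?", "7"),
        ("What colour is grass?", "green"),
    ]

    def pick_question():
        index = random.randrange(len(QUESTIONS))
        return index, QUESTIONS[index][0]

    def check_answer(index, answer):
        return answer.strip().lower() == QUESTIONS[index][1]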

dusoft
The problem with this approach is that you'll need much more than a few thousand such questions, or the spammers will just build a "dictionary" of them.
schnaader
Not really – I see lots of these sites filtering out spam without any problem.
dusoft
Dinah
+7  A: 

An image presented as an HTML table is just a technical speed bump. There's no difficulty in extracting the pixels from such a document.
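
For example, a handful of lines is enough to turn table markup like that straight back into pixels (the patterns below assume inline background:#rrggbb styles, as in the sketch in the question):

    # Recover (r, g, b) pixels from a table-rendered "image".
    import re

    def table_to_pixels(html):
        pixels = []
        for row in re.findall(r"<tr>(.*?)</tr>", html, re.S):
            hexes = re.findall(r"background:#([0-9a-fA-F]{6})", row)
            pixels.append([tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))
                           for h in hexes])
        return pixels  # ready for whatever OCR step breaks the image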

IMHO CAPTCHA puts the focus on the wrong thing – you're not really interested in whether there's a human on the other side. You wouldn't want a human to spam you either. So take a step back and focus on the spam:

  • Analyze the text (look for spammy keywords, use Bayesian filtering)
  • Analyze the links (blacklist spammy domains – SURBL, LinkSleeve)
  • Look at traffic patterns and block floods

There's no single perfectly accurate method, but you can use a few of them and weight the results to get pretty close – see the sketch after this list.
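
A rough sketch of such weighting; every list, weight, and threshold below is a placeholder for properly trained scores and real SURBL lookups:

    # Combine several weak spam signals into one score.
    SPAMMY_WORDS = {"viagra", "casino", "cheap"}      # placeholder list
    BAD_DOMAINS = {"spam.example"}                    # placeholder blacklist

    def spam_score(text, link_domains, posts_per_minute):
        signals = [
            (0.5, any(w in SPAMMY_WORDS for w in text.lower().split())),
            (0.3, any(d in BAD_DOMAINS for d in link_domains)),
            (0.2, posts_per_minute > 5),              # crude flood heuristic
        ]
        return sum(weight for weight, fired in signals if fired)

    def looks_like_spam(text, link_domains, posts_per_minute):
        return spam_score(text, link_domains, posts_per_minute) >= 0.5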

Have a look at the source code of Sblam! (it's a completely transparent server-side comment spam filter).

porneL
Yes, but human spam does not normally come in flood fashion (unless it's from Diggers).
Stefano Borini
And the human CAPTCHA-solving market works just as well as bots...
Zifre
+1  A: 

Alternatives to CAPTCHAs require considering the problem from other angles. The reason is that CAPTCHAs are built around the idea that a human actor and a computer actor can be distinguished. As artificial intelligence progresses, this will only become a more difficult problem, because the gap between computer and human users shrinks.

The technique used here and on Slashdot is for other users of the site to act as gatekeepers, marking abuse and removing offending posts before they become noticeable to a wide audience.
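
A toy sketch of that gatekeeping, assuming a simple flag threshold (the limit and storage here are illustrative):

    # Hide a post once enough distinct users have flagged it as abuse.
    FLAG_LIMIT = 3
    flags = {}   # post_id -> set of user_ids that flagged it

    def flag(post_id, user_id):
        flags.setdefault(post_id, set()).add(user_id)

    def visible(post_id):
        return len(flags.get(post_id, set())) < FLAG_LIMIT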

Another technique is to detect spam-like posts directly, using the same technology used to filter spam from email. Obviously it isn't 100% effective for email, and won't be for other uses either, but if you can filter out 75% of the spam with very few false positives, then other techniques only have to deal with the remaining 25%.
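
A toy version of that email-style filtering, using add-one smoothing and equal priors; the two training samples are placeholders for word counts gathered from your own moderated posts:

    import math
    from collections import Counter

    spam_counts = Counter("buy cheap pills now cheap".split())     # placeholder
    ham_counts = Counter("thanks for the interesting post".split())

    def spam_probability(text):
        log_ratio = 0.0
        for word in text.lower().split():
            p_spam = (spam_counts[word] + 1) / (sum(spam_counts.values()) + 2)
            p_ham = (ham_counts[word] + 1) / (sum(ham_counts.values()) + 2)
            log_ratio += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_ratio))   # 0..1, higher = more spammy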

Keep a log of spam-related activity so that you can track trends in offending IP addresses, post content, claimed user agents, and so forth, and block abusive users at the routing level.
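
A rough sketch of mining such a log for repeat offenders; the tab-separated log format, the threshold, and the iptables hand-off are all assumptions:

    # Tally offending IPs (assumed log format: "ip<TAB>user-agent" per line)
    # and print firewall rules for the worst of them.
    from collections import Counter

    def ips_to_ban(logfile="spam.log", threshold=3):
        hits = Counter()
        with open(logfile) as log:
            for line in log:
                hits[line.split("\t", 1)[0].strip()] += 1
        return [ip for ip, count in hits.items() if count >= threshold]

    for ip in ips_to_ban():
        print(f"iptables -A INPUT -s {ip} -j DROP")   # block at routing level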

In nearly all cases, your users would rather put up with the slight inconvenience of abuse prevention than the huge inconvenience of a major spam problem.

Ultimately, the arms race between you and spammers is one of cost-benefit. Initially, it will cost spammers close to nothing to spam your site, but you can change that to make it very difficult. Even if they continue to spam your site, the benefit they receive will never grow beyond a few innocent users falling for their schemes. Once the cost of spamming rises sharply above the benefit, the spammers will go away.

Another way to benefit from that is to allow advertising on your site. Make it inexpensive (but not free, of course) and easy for legitimate advertisers to post responsible marketing material for your users to see. Would-be spammers may find that it is a better deal to just pay you a few dollars and get their offering seen than to pursue clandestine methods.

Obviously most spammers won't fit in this category, since spamming is often more about getting your users to fall victim to malware exploits. You can do your part by encouraging users to use modern, up-to-date browsers and plugins so that they become less vulnerable to those same exploits.

TokenMacGuy
A: 

I really think that Dinah hit the nail on the head. The beauty of the whole CAPTCHA setup seems to be that there is no standard; standardizing would only make the spammers' market more profitable.

Therefore it seems that the best way to handle the CAPTCHA problem is to come up with a system that is fairly hard for bots to crack and that is NOT used by anyone else on the planet. It could be a question system, a very custom image creator, or even a mix of JS calls that only real browsers respect.

By the time your site is big enough for spammers to care, you should have the budget to rethink your CAPTCHA setup and optimize it much more. In the meantime we should be monitoring our server logs and banning bad agents, referrers, and IPs.

In my case I created a CAPTCHA image that I believe is very different from any other CAPTCHA I have seen. This should do fine for now alongside my Apache logs + .htaccess banning and Akismet checking. Maybe I should spend time on a reporting feature as well.
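
For what it's worth, the Akismet check is a single HTTP POST; here's a standard-library sketch with the parameters trimmed to the minimum (the key and blog URL are placeholders – see the Akismet API docs for the full parameter list):

    # Akismet answers with the literal body "true" or "false".
    from urllib.parse import urlencode
    from urllib.request import urlopen

    def akismet_is_spam(api_key, blog, user_ip, user_agent, comment_content):
        data = urlencode({
            "blog": blog,
            "user_ip": user_ip,
            "user_agent": user_agent,
            "comment_content": comment_content,
        }).encode()
        url = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
        with urlopen(url, data) as response:
            return response.read() == b"true"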

Xeoncross
+2  A: 

This article describes a technique based on hashed field names (changing with each page view) with some of them being honeypot fields (i.e. the request is rejected if they're filled) that are hidden from human users via various techniques.

Basically, it relies on spam scripts not being sophisticated enough to determine which form fields are actually visible. In a way, that is a CAPTCHA, since in order to solve it reliably, not only would they have to implement HTML, CSS and JavaScript fully, they'd also have to recognize when a field is too small to see, colored the same as the background, hidden behind another field, placed outside the browser's viewport, etc.
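
A rough sketch of the idea, with made-up field names and a single CSS trick standing in for the article's full bag of hiding techniques:

    # Per-view field names derived from a secret and a nonce, plus a hidden
    # honeypot; reject any submission where the honeypot was filled in.
    import hashlib
    import os

    SECRET = b"long random server-side secret"   # placeholder

    def field_names(nonce):
        real = "f" + hashlib.sha256(SECRET + nonce + b"real").hexdigest()[:12]
        trap = "f" + hashlib.sha256(SECRET + nonce + b"trap").hexdigest()[:12]
        return real, trap

    def render_form():
        nonce = os.urandom(8).hex()
        real, trap = field_names(nonce.encode())
        return (f'<input type="hidden" name="nonce" value="{nonce}">'
                f'<input type="text" name="{real}">'
                f'<input type="text" name="{trap}" style="display:none">')

    def accept(post):
        real, trap = field_names(post.get("nonce", "").encode())
        return post.get(trap, "") == "" and real in post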

It's the same basic problem that makes Web Standards a farce: there is no algorithm to determine whether a webpage "looks right" - only a human can decide that.

Michael Borgwardt
Yes, that is another great idea. There are always ways to solve a problem like this without needing the user's input. I remember one that used JS to add an extra form token on submit, so only clients that actually run JS would send it.
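
A rough sketch of that trick (the token scheme and names are illustrative): the server embeds a snippet that fills a hidden field on submit, so a bot that never runs the handler posts an empty token.

    import os

    def render_form(session):
        token = os.urandom(8).hex()
        session["js_token"] = token
        return f"""
        <form method="post"
              onsubmit="document.getElementById('t').value='{token}'">
          <input type="hidden" id="t" name="js_token" value="">
          <textarea name="comment"></textarea>
          <input type="submit" value="Post">
        </form>"""

    def accept(session, post):
        return post.get("js_token") == session.get("js_token")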
Xeoncross
+1  A: 

Seen this?

It's a system with cute pictures instead of a CAPTCHA ;)

But I still think honeypots are a better solution – they're so cheap, easy, and invisible.

naugtur