views: 542
answers: 10

With text recognition improving and CAPTCHA-breakers using Mechanical Turk-style human solvers to break otherwise unbreakable CAPTCHAs, what's the next technology to keep scripts from spam-botting a site that relies on user input?

+3  A: 

I am a fan of limiting signups by requiring a credit card or cell phone SMS verification (as Craigslist and Gmail do). These methods don't cost much (under $1), but can be highly effective in keeping spam accounts under control.

However, this is tricky on a site like SO, because one of the founding goals is to have minimum friction and allow anonymous users to contribute. I guess that's where the throttling and voting come into play.
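For illustration, a minimal Python sketch of the SMS-verification idea, assuming the server generates a one-time code at signup and the actual SMS delivery (carrier gateway or third-party provider) happens elsewhere; all names here are hypothetical:

    import secrets

    # Hypothetical in-memory store: phone number -> pending verification code.
    pending_codes = {}

    def start_verification(phone_number):
        """Generate a one-time code; actually sending the SMS is out of scope."""
        code = f"{secrets.randbelow(1_000_000):06d}"
        pending_codes[phone_number] = code
        return code  # hand this to whatever delivers the SMS

    def confirm_verification(phone_number, submitted_code):
        """Allow account creation only if the submitted code matches."""
        expected = pending_codes.pop(phone_number, None)
        return expected is not None and secrets.compare_digest(expected, submitted_code)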

Michael Haren
You give out your CC# just like that? Or your phone? Nice, free CC#'s and spam via txt. Not like there is an entire industry living off of those scams already.
Till
Of course, you could use a trusted third party like PayPal, Google, etc.
Michael Haren
+6  A: 

I like the concept of an 'Invisible Captcha'. Phil Haack details one implementation here.

This banks on the fact that most bots, spiders, and crawlers don't implement JavaScript engines. That, too, could change in the near future.
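For illustration only (this is not Phil Haack's actual code), a minimal Python sketch of the server side of such a check, assuming a hidden form field named js_token that a small inline script fills in on page load; bots without a JavaScript engine submit it empty:

    import hashlib
    import hmac

    SECRET_KEY = b"server-side secret"  # hypothetical

    def issue_challenge(session_id):
        """Embed this token in the page; inline JavaScript copies it into the
        hidden 'js_token' field when the form loads."""
        return hmac.new(SECRET_KEY, session_id.encode(), hashlib.sha256).hexdigest()

    def is_probably_human(session_id, form):
        """Reject posts where the JS-populated field is missing or wrong."""
        expected = issue_challenge(session_id)
        return hmac.compare_digest(form.get("js_token", ""), expected)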

NerdFury
Sounds like something that could be circumvented should it gain widespread adoption. We want something that is fundamentally hard for computers to do.
WalloWizard
The fact that it is hard for a computer to do is not relevant. These things get farmed out to wage slaves in foreign countries to perform on a mass scale. In the case of the invisi-captcha, the bot thinks it was successful, but the server throws away the response. There is no input to farm out.
NerdFury
So spammers use bots with JavaScript enabled. Cheap labor is of course a workaround for any CAPTCHA, but there will only be so much of it, and much less than the amount of spamming done right now.
WalloWizard
A: 

The most fundamental tool to keep people from spam-botting a user-input site is the rel="nofollow" attribute on links. Most comment spammers are interested in Google juice rather than actually having their stuff seen, so nofollow removes the incentive.
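For illustration, a minimal Python sketch of applying rel="nofollow" to links in user-submitted markup; a naive regex is used here for brevity, whereas a real implementation would run the markup through an HTML sanitizer, since user input shouldn't be trusted anyway:

    import re

    def add_nofollow(html):
        """Add rel="nofollow" to every anchor tag in user-submitted markup."""
        def patch(match):
            tag = match.group(0)
            if re.search(r'rel\s*=', tag, re.IGNORECASE):
                return tag  # leave existing rel attributes alone in this sketch
            return tag[:-1] + ' rel="nofollow">'
        return re.sub(r'<a\b[^>]*>', patch, html, flags=re.IGNORECASE)

    # add_nofollow('<a href="http://spam.example">buy</a>')
    #   -> '<a href="http://spam.example" rel="nofollow">buy</a>'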

Chris Upchurch
Useless. They'll still do it - I've had nofollow on for over a year and I still get plenty of comment spam attempts. Their bots are indiscriminate and do no checking for nofollow.
ceejayoz
+5  A: 

Image recognition rather than text recognition.

WalloWizard
That can still be cracked with Mechanical Turk or porn.
+4  A: 

For now, reputation systems are harder to beat. The community sites of the near future will need to rely on their higher-ranking members to remove the spam.

The trend is for spam to become continually more indistinguishable from legitimate content, and for each new generation of mechanical filters to die of ineffectiveness, like overused antibiotics.

Even reputation systems will become useless as spammers start maintaining sock-puppet farms to create their own high-ranking members, and when the community fights back, the spammers will feed the churn of sock puppets as if it were just another cost of doing business.

If you're going to build a site that takes user content, you'll either need to subscribe to the treadmill of never-ending CAPTCHA successors, or find a way to remove the incentive to spam your site in the first place.

C. Lawrence Wenham
+2  A: 

The bar will keep being raised with problems that computers are bad at and humans are good at. Recognising emotions in a human face, for example, is something humans do particularly well.

Another option could be differentiating between disgusting and nice. It's totally subjective, but humans tend to hate rotten food, open wounds, poo, etc.

Andrew Johnson
+1  A: 

A negative Turing test. I've used this for over a year on WordPress, IP.Board, and MediaWiki sites and have absolutely zero spam. The only catch: you have to think of a question/answer combination that's neither too common (otherwise, bots will adapt) nor too domain-specific (otherwise, potential users might not know the answer).
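For illustration, a minimal Python sketch of such a check wired into a comment form; the question and accepted answers below are placeholders, not the ones the poster actually used:

    # Hypothetical site-specific challenge; pick something your real users will know.
    CHALLENGE_QUESTION = "What colour is a stop sign?"
    ACCEPTED_ANSWERS = {"red"}

    def passes_negative_turing_test(form):
        """Accept the post only if the human-knowledge question was answered;
        normalization keeps honest users from failing over case or whitespace."""
        answer = form.get("challenge_answer", "").strip().lower()
        return answer in ACCEPTED_ANSWERS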

Sören Kuklau
+5  A: 
JeffV
A: 

Typically, for a site with resources of any value to protect, you need a 3-pronged approach:

  • Throttle responses, accept posts from authenticated users only, and disallow anonymous posts (a sketch of the throttling piece follows this list).
  • Minimize (not prevent) the few trash posts from authenticated users - e.g. with a reputation-based system.
  • Use server-side heuristic logic to identify spam-like behavior, or better, non-human-like behavior.
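For illustration, a minimal Python sketch of the throttling prong, assuming a per-user sliding window kept in memory; the limits and names are placeholders:

    import time
    from collections import defaultdict, deque

    MAX_POSTS = 5        # hypothetical limit
    WINDOW_SECONDS = 60  # per sliding window

    recent_posts = defaultdict(deque)  # user_id -> timestamps of recent posts

    def allow_post(user_id, now=None):
        """Return True if this authenticated user is still under the rate limit."""
        now = time.monotonic() if now is None else now
        window = recent_posts[user_id]
        while window and now - window[0] > WINDOW_SECONDS:
            window.popleft()
        if len(window) >= MAX_POSTS:
            return False
        window.append(now)
        return True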

Of course, a human moderator can also help, but then you have other problems - namely, flooding (or even drowning) the moderator, and some sites prefer the openness...

AviD
+1  A: 

Robots aren't actually that hard to defeat. On one website I was involved with, we didn't even use a CAPTCHA - just a field labelled "Leave this field blank". Robots always failed that really simple test.
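For illustration, a minimal Python sketch of that honeypot check, assuming the field is labelled (or hidden with CSS) so humans leave it alone; the field name is hypothetical:

    def looks_like_a_bot(form):
        """The form contains a field humans are told (or styled) to leave blank;
        any submission that fills it in is treated as automated."""
        return bool(form.get("leave_this_blank", "").strip())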

The bigger problem is mass human solving. There are lots of implementations whereby users solve screen-scraped CAPTCHAs in return for something, like videos or images (you know what I mean). This means that a real human is solving the CAPTCHA, so emotive, facial, and more complex patterns are meaningless.

Multi-step processes will discourage this behaviour, but at the cost of making things harder for genuine visitors, which is sad when we're all trying to design websites that are more usable.

Sohnee