When the bots attack!

views:

1720

answers:

+20 Q:

When the bots attack!

What are some popular spam prevention methods besides CAPTCHA?

+1 A:

JavaScript evaluation techniques like this Invisible Captcha system require the browser to evaluate JavaScript before the page submission will be accepted. It falls back nicely when the user doesn't have JavaScript enabled by just displaying a conventional CAPTCHA test.

Jon Galloway 2008-09-21 18:15:29

There are a bunch of bots out there now with script execution capabilities. I think the days of using script as a gatekeeper are numbered.

stephbu 2008-09-21 18:18:10

Yeah, they soon will be executing JS faster than most browsers, thanks Google :)

Ilya Ryzhenkov 2008-09-21 19:38:21

Indeed JavaScript is a sort of poor gatekeeper.

Till 2008-09-21 22:17:52

+9 A:

Give the user the possibility to calculate:

What is the sum of 3 and 8?

By the way, I just came across an interesting approach from Microsoft Research: Asirra.

http://research.microsoft.com/asirra/

It shows you several pictures and you have to identify the pictures with a given motif.

Johannes Hädrich 2008-09-21 18:15:30

I have used this one to great effect in the past as well. What is 2+2? Anything that is custom, easy for users, and easy for you is the best solution.

Jason Short 2008-09-23 02:58:30

CAPTCHA means automated Turing test, and asking humans questions falls under that definition.

porneL 2008-10-19 11:50:43

I checked out Asirra, found it to be very interesting, and a great idea since animal recognition is something bots are horrible at. I'd upvote 5 times if I could based on this alone :D

David Frenkel 2008-11-13 20:13:40

Great Idea from MS

Jim C 2008-11-14 19:04:33

If only I'd bought $20 worth of MS in '95 :/

divinci 2009-06-07 00:55:36

I've used Asirra on my site, and it has worked fantastically. I recommend this to anybody. Also, it's not frustrating like CAPTCHA is. It's fun for the user and helps pets get adopted. Win for me, win for my users, win for abandoned pets.

Eric 2009-06-18 16:17:56

+2 A:

The most common ones I've observed revolve around user input to solve simple puzzles, e.g. "Which of the following is a picture of a cat?" (displaying thumbnails of dogs surrounding a cat), or simple math problems.

While interesting, I'm sure the arms race will overwhelm those systems too.

stephbu 2008-09-21 18:16:59

Invisible form fields. Make a form field that doesn't appear on the screen to the user, using display: none as a CSS style so that it doesn't show up. For accessibility's sake, you could even put hidden text so that people using screen readers would know not to fill it in. Bots almost always fill in all fields, so you could block any post that filled in the invisible field.
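A minimal sketch of the server-side half of this idea, assuming the submitted form arrives as a plain dict of strings. The field name `website` is a hypothetical choice for the decoy, not something from the post above.

```python
# Hypothetical decoy field: rendered in the form but hidden from humans via CSS.
HONEYPOT_FIELD = "website"


def is_probable_bot(form: dict) -> bool:
    """Return True when the hidden decoy field was filled in."""
    # A human never sees the field, so it should arrive empty or be absent.
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```

A post for which `is_probable_bot` returns True can be dropped silently or queued for review. As the replies here point out, smarter bots skip hidden fields, so this is one filter among several rather than a complete defense.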

Kibbee 2008-09-21 18:18:19

Interesting! Do you have experience with this method in production use?

Johannes Hädrich 2008-09-21 18:20:48

It wasn't very successful the one time I tried it. The website is still live employing that technique and no matter what variation of it I do the bots get through. Maybe I was just targeted by the smarter ones =/

Paolo Bergantino 2008-09-21 18:44:11

+18 A:

I have tried doing 'honeypots' where you put a field and then hide it with CSS (marking it as 'leave blank' for anyone with stylesheets disabled) but I have found that a lot of bots are able to get past it very quickly. There are also techniques like setting fields to a certain value and changing them with JS, calculating times between load time and submit time, checking the referer URL, and a million other things. They all have their pitfalls and pretty much all you can hope for is to filter as much as you can with them while not alienating who you're here for: the users.
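The "time between load and submit" check mentioned above could look roughly like this. It is only a sketch: the secret, the 2-second minimum and the 1-hour maximum are assumed values, and the token is meant to travel in a hidden form field.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"  # hypothetical server-side secret
MIN_SECONDS = 2.0          # assumed: faster than this is treated as a bot
MAX_SECONDS = 3600.0       # assumed: older tokens are treated as stale


def issue_token(now=None) -> str:
    """Create a signed page-load timestamp to embed in the form."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}:{sig}"


def submitted_at_human_speed(token: str, now=None) -> bool:
    """Verify the signature, then check how long the form was open."""
    try:
        ts, sig = token.split(":", 1)
        loaded_at = int(ts)
    except ValueError:
        return False  # malformed token
    expected = hmac.new(SECRET_KEY, ts.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # timestamp was tampered with
    elapsed = (time.time() if now is None else now) - loaded_at
    return MIN_SECONDS <= elapsed <= MAX_SECONDS
```

Signing the timestamp keeps a bot from simply submitting an old-looking value. A bot that waits a few seconds before posting still gets through, which is the kind of pitfall the answer warns about.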

At the end of the day, though, if you really, really, don't want bots to be sending things through your form you're going to want to put a CAPTCHA on it - best one I've seen that takes care of mostly everything is reCAPTCHA - but thanks to India's CAPTCHA solving market and the ingenuity of spammers everywhere that's not even successful all of the time. I would beware using something that is 'ingenious' but kind of 'out there' as it would be more of a 'wtf' for users that are at least somewhat used to your usual CAPTCHAs.

Paolo Bergantino 2008-09-21 18:18:53

I like the CSS technique. It works very well across the board. I'd also vote for this answer, but I have no votes left! :D

Till 2008-09-21 22:17:11

Actually, this is trivially bypassable by the simplest of techniques; if a spammer wants to misuse your site, he's gonna. The only thing protecting you is if you're not big enough for anyone to bother looking at your code.

AviD 2008-09-23 04:45:31

very interesting

Jose Vega 2008-10-01 03:52:37

+3 A:

You can use Recaptcha to at least make a captcha useful. Then you can make questions with simple verbal math problems or similar. Microsoft's Asirra makes you find pics of cats and dogs. Requiring a valid email address to activate an account stops spammers when they wouldn't get enough benefit from the service, but might deter normal users as well.

jjrv 2008-09-21 18:19:13

Block access based on a blacklist of spammers' IP addresses.

Chris 2008-09-21 18:19:33

This technique does not work at all and blacklists will usually contain mostly IPs of legitimate users. Almost anyone you succeed in blocking this way will be a legitimate user.

MarkR 2008-09-21 18:21:19

"This technique does not work at all" seems a bit of an overzealous claim to me. Blocking by IP address can root out some of the most serious offenders. I was not suggesting deny access to large blocks of addresses.

Chris 2008-09-21 18:31:51

IP blacklists need to be incrementally temporary, meaning it starts off temporary (to prevent blocking legitimate users), and each repetition it should stay on the blacklist longer.
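One way to read "incrementally temporary" is a ban that doubles with each repeat offense. The sketch below assumes a 10-minute first ban and an in-memory store; both are illustrative choices, not part of the comment above.

```python
BASE_BAN_SECONDS = 600  # assumed length of the first ban (10 minutes)


class EscalatingBlacklist:
    """Temporary IP bans that double in length with each repeat offense."""

    def __init__(self):
        self._offenses = {}      # ip -> number of reported offenses
        self._banned_until = {}  # ip -> timestamp when the ban expires

    def report(self, ip: str, now: float) -> int:
        """Record an offense and return the ban length in seconds."""
        count = self._offenses.get(ip, 0) + 1
        self._offenses[ip] = count
        duration = BASE_BAN_SECONDS * 2 ** (count - 1)
        self._banned_until[ip] = now + duration
        return duration

    def is_banned(self, ip: str, now: float) -> bool:
        return now < self._banned_until.get(ip, 0.0)
```

A legitimate user who inherits a recycled address is locked out only briefly, while a persistent source ends up banned for longer and longer stretches.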

AviD 2008-09-21 20:46:18

Honeypot techniques put an invisible decoy form at the top of the page. Users don't see it and submit the correct form, bots submit the wrong form which does nothing or bans their IP.

Jon Galloway 2008-09-21 18:20:28

+1 A:

Honeypots are one effective method. Phil Haack gives one good honeypot method, that could be used in principle for any forum/blog/etc.

You could also write a crawler that follows spam links and analyzes their page to see if it's a genuine link or not. The most obvious would be pages with an exact copy of your content, but you could pick out other indicators.

Moderation and blacklisting, especially with plugins like these ones for WordPress (or whatever you're using, similar software is available for most platforms), will work in a low-volume environment. If your environment is a low volume one, don't underestimate the advantage this gives you. Personally deciding what is reasonable content and what isn't gives you ultimate flexibility in spam control, if you have the time.

Don't forget, as others have pointed out, that CAPTCHAs are not limited to text recognition from an image. Visual association, math problems, and other non-subjective questions relayed through an image also qualify.

Dustman 2008-09-21 18:22:06

+1 A:

Animated CAPTCHAs - scrolling text - still easy for humans to recognize, as long as you make sure that none of the frames offers something complete to recognize.

Multiple choice question - "All it takes is a __ and a smile." The idea here is that the user will have to choose/understand.

Session variable - checking that a variable you put into a session is part of the request. This will foil the dumb bots that simply generate requests, but probably not the bots that are modeled like a browser.

Math question - 2 + 5 = ? - this again is to ask a question that is easy to solve but prevents the bot's ability to generate a response.

Image grid - you create a grid of images, such as a 3x3 grid of animal pictures, and the user has to pick out all the images of a particular type (for example, all the birds).

Hope this gives you some ideas for your new solution.
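The session variable idea from the list above can be sketched as follows, assuming the web framework exposes the session as a dict-like object. The key name `form_token` is made up for the example.

```python
import secrets


def new_form_token(session: dict) -> str:
    """Store a random token in the session and return it for a hidden field."""
    token = secrets.token_hex(16)
    session["form_token"] = token
    return token


def check_form_token(session: dict, submitted) -> bool:
    """Accept the post only if it echoes the token stored for this session."""
    expected = session.pop("form_token", None)  # single use
    if expected is None or not submitted:
        return False
    return secrets.compare_digest(expected.encode(), submitted.encode())
```

A bot that posts straight to the form handler without first loading the page has no session token and is rejected. A bot that behaves like a browser loads the page first and passes, as the list item already notes.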

MikeJ 2008-09-21 18:41:32

I see math questions getting more and more common, but they don't really seem like the best choice. When they become prevalent enough to annoy botters, it will probably be very simple to add a simple function to parse the trivial arithmetic that most of these questions are.

Jeremy Banks 2008-09-21 22:18:49

A single multiple choice question with N options isn't very effective since it will still allow a random guess to succeed 1 / N of the time. The beauty of "pick out all the birds in the 3x3 grid" is that 1 / 2^9 << 1 / 4 (wlog taking 4 as the usual number of answers to a multiple choice question).
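The comparison can be checked with exact fractions. This assumes every cell of the grid is independently "bird" or "not bird" and that a bot guesses each possible selection with equal probability; a grid with a known number of birds would give a different figure.

```python
from fractions import Fraction


def multiple_choice_guess(n_options: int) -> Fraction:
    """Chance that one uniform random guess among n options is right."""
    return Fraction(1, n_options)


def grid_guess(cells: int) -> Fraction:
    """Chance of guessing the exact subset when each cell is in or out."""
    return Fraction(1, 2 ** cells)
```

Under those assumptions `grid_guess(9)` is 1/512, against 1/4 for `multiple_choice_guess(4)`, so a blind guess succeeds 128 times less often on the grid.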

Doug McClean 2009-06-18 16:08:16

+4 A:

http://chongqed.org/ maintains blacklists of active spam sources and the URLs being advertised in the spams. I have found filtering posts for the latter to be very effective in forums.
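Filtering posts by the URLs they advertise might be sketched like this. The domain set is a hypothetical stand-in for a real blacklist feed, and the example does not use any actual chongqed.org API or data format.

```python
import re
from urllib.parse import urlparse

# Hypothetical entries; in practice these would come from a blacklist feed.
BLACKLISTED_DOMAINS = {"spam.example", "pills.example"}

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def advertised_domains(text: str) -> set:
    """Collect the host names of all http(s) links found in a post."""
    domains = set()
    for url in URL_PATTERN.findall(text):
        try:
            host = urlparse(url).hostname
        except ValueError:
            continue  # skip malformed URLs
        if host:
            if host.startswith("www."):
                host = host[4:]
            domains.add(host)
    return domains


def contains_blacklisted_link(text: str) -> bool:
    return not advertised_domains(text).isdisjoint(BLACKLISTED_DOMAINS)
```

This matches exact host names only (after dropping a leading "www."); a real filter would probably also want to match subdomains and URL shorteners.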

moonshadow 2008-09-21 18:55:43

+1 A:

A friend has the simplest anti-spam method, and it works.

He has a custom text box which says "please type in the number 4".

His blog is rather popular, but still not popular enough for bots to figure it out (yet).

ripper234 2008-09-21 19:06:24

I hope that number was randomly chosen! http://xkcd.com/221/

Aidan 2008-09-21 21:11:10

Aidan, I had the same thought!

epochwolf 2008-10-01 03:10:27

+6 A:

A very simple method which puts no load on the user is just to disable the submit button for a second after the page has been loaded. I used it on a public forum which had continuous spam posts, and they have stopped since.

2008-09-21 19:59:10

Pretty interesting suggestion, I might try that out. Thanks for your answer.

Jose Vega 2008-09-21 22:34:28

I don't get it. Do the bots care about whether the submit button is enabled?

Seun Osewa 2009-02-19 00:31:00

I guess they click it while it's disabled, revealing the fact that they don't respect the timer.

Reef 2009-12-21 16:25:18

-1 - bots almost always ignore JavaScript.

Lotus Notes 2010-05-21 18:46:14

+15 A:

Shocking, but almost every response here included some form of CAPTCHA. The OP wanted something different; I guess maybe he wanted something that actually works, and maybe even solves the real problem.
CAPTCHA doesn't work, and even if it did, it's the wrong problem: humans can still flood your system, and by definition CAPTCHA won't stop that (because it's designed only to tell if you're a human or not - not that it does that well...)

So, what other solutions are there? Well, it depends... on your system and your needs. For instance, if all you're trying to do is limit how many times a user can fill out a "Contact Me" form, you can simply throttle how many requests each user can submit per hour/day/whatever. If your users are anonymous, maybe you need to throttle according to IP addresses, and occasionally blacklist an IP (though this too can be circumvented, and causes other problems).
If you're referring to a forum or blog comments (such as this one), well, the more I use it the more I like the solution. A mix of authenticated users, authorization (based on reputation, not likely to be accumulated through flooding), throttling (how many you can do a day), the occasional CAPTCHA, and finally community moderation to clean up the few that get through - all combine to provide a decent solution. (I wonder if Jeff can provide some info on how much spam and other malposts actually get through...?)
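The per-user or per-IP throttling described above could be sketched as a sliding-window counter. The class name and the in-memory storage are assumptions for the example; a multi-server site would need a shared store.

```python
from collections import defaultdict, deque


class SlidingWindowThrottle:
    """Allow at most max_requests per key within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

The key can be a user ID for authenticated users or an IP address for anonymous ones, e.g. `SlidingWindowThrottle(3, 3600).allow(ip, time.time())` for three "Contact Me" submissions per hour.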

Another control to consider (don't know if they have it here) is some form of IDS/IPS - if you can detect and recognize spam, you can block THAT pattern. Moderation fills that need manually, here...

Note that any one of these does not prevent the spam, but incrementally lowers the probability, and thus the profitability. This changes the economic equation, and leaves CAPTCHA to actually provide enough value to be worth it - since it's no longer worth it for the spammers to bother breaking it or going around it (thanks to the other controls).

AviD 2008-09-21 20:42:51

+1 A:

Please remember to make your solution accessible to those not using conventional browsers. The iPhone crowd are not to be ignored, and those with vision and cognitive problems should not be excluded either.

Aidan 2008-09-21 21:14:13

+8 A:

Try Akismet

Captchas or any form of human-only questions are horrible from a usability perspective. Sometimes they're necessary, but I prefer to kill spam using filters like Akismet.

Akismet was originally built to thwart spam comments on WordPress blogs, but the API is capable of being adapted for other uses.

Update: We've started using the ruby library Rakismet on our Rails app, Yarp.com. So far, it's been working great to thwart the spam bots.

Ryan McGeary 2008-09-21 21:33:01

I'd be interested to know just what they base it on, but I suppose it's mostly common patterns like advertising combined with URL links.

David Frenkel 2008-11-13 20:15:20

I think they use a number of inputs, including content against a bayesian-like filter, URLs, source IP addresses, etc.

Ryan McGeary 2008-11-14 18:54:42

I've seen a few neat ideas along the lines of Asirra which ask you to identify which pictures are cats. I believe the idea originated from KittenAuth a while ago.

Jon Cage 2008-09-23 02:35:26

You could get some device ID software. the41 has some fraud prevention software that can detect the hardware being used to access your site. I believe they use it to catch fraudsters, but it could be used to stop bots. Once you have identified a device being used by a bot, you can just block that device. Last time I checked, it could even trace your route through the phone network (not your Geo-IP!), so you can even block a post code if you want.

It's expensive though, so there is probably a better, cheaper solution that is a little less Big Brother.

2008-09-23 13:36:14

Use something like the google image labeler with appropriately chosen images such that a computer wouldn't be able to recognise the dominant features of it that a human could.

The user would be shown an image and would have to type words associated with it. They would keep being shown images until they have typed enough words that agreed with what previous users had typed for the same image. Some images would be new ones that they weren't being tested against, but were included to record what words are associated with them. Depending on your audience you could also possibly choose images that only they would recognise.

Sam Hasler 2008-09-24 01:57:08

+1 A:

The following is unfeasible with today's technology, but I don't think it's too far off. It's also probably overkill for dealing with forum spam, but could be useful for account sign-ups, or any situation where you wanted to be really sure you were dealing with humans and they would be prepared for it to take a few minutes to complete the process.

Have 2 users who are trying to prove themselves human connect to each other via their webcams and ask them if the person they are seeing is human and live (i.e. not a recording), by getting them to, for example, mirror each other's movements, or write something on a piece of paper. Get everyone to do this a few times with different users, and throw a few recordings into the mix which they also have to identify correctly as such.

Sam Hasler 2008-09-24 01:57:45

+1 for creativity and thinking outside the box =) ...beware of the 'think of the children' crowd though, if you implement it...

David Thomas 2009-06-18 17:03:44

+2 A:

A popular method on forums is to simply queue the threads of members with less than 10 posts in a moderation queue. Of course, this doesn't help if you don't have moderators, or it's not a forum. A more general method is the calculation of hyperlink to text ratios. Often, spam posts contain a ton of hyperlinks, and you can catch a lot this way. In the same vein is comparing the content of consecutive posts. Simply do not allow consecutive posts that are extremely similar.
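Both heuristics, the hyperlink-to-text ratio and the similarity of consecutive posts, can be sketched with the standard library. The two thresholds are assumed values that would need tuning against real traffic.

```python
import re
from difflib import SequenceMatcher

LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
MAX_LINKS_PER_100_WORDS = 5.0  # assumed threshold
MAX_SIMILARITY = 0.9           # assumed threshold, on a 0..1 scale


def links_per_100_words(text: str) -> float:
    """Number of links per 100 whitespace-separated words."""
    words = text.split()
    if not words:
        return 0.0
    return 100.0 * len(LINK_PATTERN.findall(text)) / len(words)


def too_similar(previous: str, current: str) -> bool:
    """True when two posts are nearly identical character sequences."""
    return SequenceMatcher(None, previous, current).ratio() >= MAX_SIMILARITY


def looks_like_spam(current: str, previous=None) -> bool:
    if links_per_100_words(current) > MAX_LINKS_PER_100_WORDS:
        return True
    return previous is not None and too_similar(previous, current)
```

Here `previous` is the same user's preceding post, if any. Posts flagged this way could go to the moderation queue rather than being rejected outright, since legitimate link roundups will occasionally trip the ratio.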

Of course, anyone with knowledge of the measures you take is going to be able to get around them. To be honest, there is little you can do if you are the target of a specific attack. Rather, you should focus on preventing more general, unskilled attacks.

Tim Sally 2008-09-24 02:04:35

+5 A:

Ned Batchelder wrote up a technique that combines hashes with honeypots for some wickedly effective bot-prevention. No captchas, just code.

It's up at Stopping spambots with hashes and honeypots:

Rather than stopping bots by having people identify themselves, we can stop the bots by making it difficult for them to make a successful post, or by having them inadvertently identify themselves as bots. This removes the burden from people, and leaves the comment form free of visible anti-spam measures.

This technique is how I prevent spambots on this site. It works. The method described here doesn't look at the content at all. It can be augmented with content-based prevention such as Akismet, but I find it works very well all by itself.
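A rough sketch of the general idea, not Batchelder's actual code: form field names are replaced by keyed hashes that change per page load, and some of the hashed fields are hidden honeypots. The secret, the field names and the exact hash inputs below are all assumptions made for the example.

```python
import hashlib
import hmac

SECRET = b"change-me"  # hypothetical server-side secret


def make_spinner(timestamp: int, client_ip: str, entry_id: int) -> str:
    """Per-page-load value that all obscured field names depend on."""
    msg = f"{timestamp}|{client_ip}|{entry_id}".encode()
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


def obscured_name(real_name: str, spinner: str) -> str:
    """Unpredictable field name to use in the HTML form."""
    msg = f"{real_name}|{spinner}".encode()
    return "f" + hmac.new(SECRET, msg, hashlib.sha256).hexdigest()[:16]


def decode_form(posted: dict, spinner: str, real_names, honeypots):
    """Map obscured names back to real ones; None means reject the post."""
    for name in honeypots:
        if posted.get(obscured_name(name, spinner), ""):
            return None  # a hidden decoy field was filled in
    decoded = {}
    for name in real_names:
        key = obscured_name(name, spinner)
        if key not in posted:
            return None  # expected field missing, e.g. a replayed old form
        decoded[name] = posted[key]
    return decoded
```

Because the names differ on every page load, a bot cannot replay a recorded POST or recognize a field called "comment" or "email". On submission the server would recompute the spinner from the posted timestamp, the client IP and the entry ID before calling `decode_form`.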

joemurphy 2008-10-01 02:58:34

Mollom is supposedly good at stopping spam. Both personal (free) and professional versions are available.

Pieter 2008-11-07 22:56:31

I know some people mentioned ASIRRA, but if you go to all the "adopt me" links for the images, the linked page will say whether it's a cat or a dog. So it should be relatively easy for a bot to just follow all the "adopt me" links. It's just a matter of time for that project.

tooleb 2009-01-16 16:17:15

Just verify the email address and let Google/Yahoo etc. worry about it.

MatthewFord 2009-06-18 15:55:22

+1 A:

Sblam is an interesting project.

Michal M 2009-06-18 16:05:04

+1 A:

For human moderators it surely helps to be able to easily find and delete all posts from some IP, or all posts from some user if the bot is smart enough to use a registered account. Likewise the option to easily block IP addresses or accounts for some time, without further administration, will lessen the administrative burden for human moderators.

Using cookies to make bots and human spammers believe that their post is actually visible (while only they themselves see it) prevents them (or trolls) from changing techniques. Let the spammers and trolls see the other spam and troll messages.
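The visibility rule described above could be sketched as a filter applied when rendering a thread. It assumes the cookie has already been resolved to a `viewer_id`, and that moderators set a `hidden` flag on spam posts; both names are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author_id: str
    text: str
    hidden: bool = False  # set by a moderator or spam filter


def visible_posts(posts, viewer_id: str, viewer_is_flagged: bool = False):
    """Return the posts this viewer should see.

    Hidden posts stay visible to their own author, and flagged viewers
    (known spammers or trolls) also see everyone else's hidden posts.
    """
    return [
        p for p in posts
        if not p.hidden or p.author_id == viewer_id or viewer_is_flagged
    ]
```

Regular readers never see the hidden posts, while the spammer's own view looks unchanged, which gives them no signal to change tactics.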

Arjan 2009-06-18 16:12:11
