Someone also suggested the Raphael JavaScript library, which apparently lets you draw on the client in all popular browsers:
http://dmitry.baranovskiy.com/raphael/
... but that wouldn't exactly work with my <noscript> case, now would it? :)
ASCII text isn't much more legible than the really screwy image captchas that are around these days.
I think the math puzzle would be a good fit here, since we're all supposed to be fairly math-oriented. Just don't ask me to do integration, please.
I like the word math problems. It would be interesting to try it out (at least it's easy to do) and see how the baddies respond.
Be sure it isn't something Google can answer though. Which also shows an issue with that --order of operations!
Although we all should know basic maths, the math puzzle could cause some confusion. In your example I'm sure some people would answer with "8" instead of "1".
Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the CAPTCHA.
E.g. *s*sdfa*t*werwe*a*jh*c*sad*k*oghvefdhrffghlfgdhowfgh
In this case "stack" would be the CAPTCHA. There are obviously numerous variations on this idea.
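For illustration, here is a rough sketch of how such a string could be generated and read back. The function names are made up, and asterisks stand in for bold/italic markup as in the example above; it is only meant to show the idea, not a hardened implementation.

```javascript
// Hypothetical helper: hides `secret` inside random filler and marks its
// letters with asterisks, as in the example above.
// `rng` is injectable so the output can be made repeatable.
function buildHighlightChallenge(secret, rng = Math.random) {
  const alphabet = "abcdefghijklmnopqrstuvwxyz";
  let out = "";
  for (const ch of secret) {
    // 1 to 4 filler letters before each marked letter
    const fillerLength = 1 + Math.floor(rng() * 4);
    for (let i = 0; i < fillerLength; i++) {
      out += alphabet[Math.floor(rng() * alphabet.length)];
    }
    out += "*" + ch + "*";
  }
  return out;
}

// What a user (or a bot reading the markup) would extract.
function readHighlighted(challenge) {
  return (challenge.match(/\*(.)\*/g) || []).map((m) => m[1]).join("");
}
```

Note that the second function is also all a bot needs if the highlighting is visible in the markup, so the marking would have to be harder to parse than this.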
Edit: Example variations to address some of the potential problems identified with this idea:
The advantage of this approach is that, for most people, the CAPTCHA won't ever be visible!
I like this idea, is there not any way we can just hook into the rep system? I mean, anyone with say +100 rep is likely to be a human. So if they have rep, you need not even bother doing ANYTHING in terms of CAPTCHA.
Then, if they don't have the rep, show them the CAPTCHA. I'm sure it won't take that many posts to get to 100, and the community will instantly dive on anyone who seems to be spamming with offensive tags. Why not add a "report spam" link that downmods by 200? Get 3 of those, spambot achievement unlocked, bye bye ;)
EDIT: I should also add, I like the math idea for the non-image CAPTCHA. Or perhaps a simple riddle-type-thing. May make posting even more interesting ^_^
I mean, anyone with say +100 rep is likely to be a human. So if they have rep, you need not even bother doing ANYTHING in terms of CAPTCHA
Yeah, that's what I used to think, too. Note number of revisions on that post and their source. Hi Kevin!
So, CAPTCHA is mandatory for all users except moderators.
Would a simple string of text with random characters highlighted in bold or italics be suitable? The user just needs to enter the bold/italic letters as the captcha.
eg. ssdfatwerweajhcsadkoghvefdhrffghlfgdhowfgh
@Jared - I can barely pick out the bold letters in that string even when I'm trying. Maybe if we made the font HUGE. usability--;
@pc1oad1etter I also noticed that after doing my post. However, it's just an idea and not the actual implementation. Varying the font or using different colours instead of bold/italics would easily address usability issues.
Who says you have to create all the images on the server with each request? Maybe you could have a static list of images or pull them from flickr. I like the "click on the kitten" captcha idea. http://www.thepcspy.com/kittenauth
@lance
Who says you have to create all the images on the server with each request? Maybe you could have a static list of images or pull them from Flickr. I like the "click on the kitten" CAPTCHA idea. http://www.thepcspy.com/kittenauth.
If you pull from a static list of images, it becomes trivial to circumvent the CAPTCHA, because a human can classify them and then the bot would be able to answer the challenges easily. Even if a bot can't answer all of them, it can still spam. It only needs to be able to answer a small percent of CAPTCHAs, because it can always just retry when an attempt fails.
This is actually a problem with puzzles and such, too, because it's extremely difficult to have a large set of challenges.
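To put a rough number on the retry argument: if each attempt succeeds independently with probability p, the chance of getting through at least once in n attempts is 1 - (1 - p)^n. A minimal sketch (the function name is made up, and independence between attempts is an assumption):

```javascript
// Chance that at least one of `attempts` independent tries succeeds when
// each try succeeds with probability `p`.
function successWithin(p, attempts) {
  return 1 - Math.pow(1 - p, attempts);
}
```

Under this model, a bot that can answer only 5% of the challenges still gets through with better than 99% probability within 100 tries.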
@rob
What about a honeypot captcha? Wow, so simple! Looks good! Although they have highlighted the accessibility issue.. Do you think that this would be a problem at SO? I personally find it hard to imagine developers/programmers that have difficulty reading the screen to the point where they need a screen reader?
There are developers who are not just legally blind, but 100% blind. Walking cane and helper dog. I hope the site will support them in a reasonable fashion.
However, with the honeypot captcha, you can put a hidden div as well that tells them to leave the field blank. And you can also put it in the error message if they do fill it in, so I'm not sure how much of an issue accessibility really is here. It's definitely not great, but it could be worse.
I had a load of spam issues on a phpBB 2.0 site I was running a while back (the site is now upgraded).
I installed a custom captcha mod I found on the phpBB forums that worked well for a period of time. I found the real solution was combining this with additional 'required' fields [on the account creation page].
I added Location and Occupation (mundane, yet handy to know).
The bot never tried to fill these in, still assuming the captcha was the point of failure for each attempt.
Answering the original question:
I've seen pictures of animals [what is it?]. Votes for comics use a picture of a character with their name written somewhere in the image [type in name]. Impossible to parse, not ok for blind people.
You could have an audio fallback reading alphanumerics (the same letters and numbers you have in the captcha).
Final line of defense: make spam easy to report (one click) and easy to delete (one recap screen to check it's a spam account, with the last ten messages displayed, one click to delete account). This is still time-expensive, though.
A method that I have developed and which seems to work perfectly (although I probably don't get as much comment spam as you), is to have a hidden field and fill it with a bogus value e.g.:
<input type="hidden" id="antiSpam" name="antispam" value="lalalala" />
I then have a piece of JavaScript which updates the value every second with the number of seconds the page has been loaded for:
var antiSpam = function () {
    // The hidden field is looked up by id, so the input above needs id="antiSpam".
    var a = document.getElementById("antiSpam");
    if (a) {
        if (isNaN(a.value)) {
            // Still the bogus placeholder: start counting from zero.
            a.value = 0;
        } else {
            a.value = parseInt(a.value, 10) + 1;
        }
    }
    // Pass the function itself rather than a string to be evaluated.
    setTimeout(antiSpam, 1000);
};
antiSpam();
Then, when the form is submitted, if the antispam value is still "lalalala", I mark it as spam. If the antispam value is an integer, I check to see if it is above something like 10 (seconds). If it's below 10, I mark it as spam; if it's 10 or more, I let it through.
If AntiSpam = An Integer
    If AntiSpam >= 10
        Comment = Approved
    Else
        Comment = Spam
Else
    Comment = Spam
The theory being that:
The downside to this method is that it requires JavaScript: if you don't have JavaScript enabled, your comment will be marked as spam. However, I do review comments marked as spam, so this is not a problem.
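For what it's worth, the server-side decision described above could be sketched like this. The function name and the return values are made up for illustration, and the 10-second threshold is just the figure mentioned above.

```javascript
// Assumed threshold, taken from the "something like 10 (seconds)" above.
const MIN_SECONDS_ON_PAGE = 10;

// Returns "approved" or "spam" for the submitted antispam field value.
function classifyComment(antiSpamValue) {
  // Only accept plain non-negative integers such as "12"; the untouched
  // placeholder "lalalala" (JavaScript never ran) fails this check.
  if (!/^\d+$/.test(String(antiSpamValue))) {
    return "spam";
  }
  const seconds = parseInt(antiSpamValue, 10);
  return seconds >= MIN_SECONDS_ON_PAGE ? "approved" : "spam";
}
```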
Response to comments
@MrAnalogy: The server side approach sounds quite a good idea and is exactly the same as doing it in JavaScript. Good Call.
@AviD: I'm aware that this method is prone to direct attacks as I've mentioned on my blog. However, it will defend against your average spam bot which blindly submits rubbish to any form it can find.
There was a CAPTCHA you talked about in your blog where you had to identify pictures of dogs or cats. That one has always been memorable to me.
Good idea, but now that I know how it works I could just set the value of "antispam" to >= 10 when forging a POST request.
Most of the ideas here work great against spam bots but fail hard against attacks. I haven't even tried this, but I doubt there is flood protection; I'm sure someone could write a script to ask a new question every 30 seconds or so.
CAPTCHA is pointless, the best solution is:
Although this similar discussion was started:
We are trying this solution on one of our frequently data mined applications:
A Better CAPTCHA Control (Look Ma - NO IMAGE!)
You can see it in action on our Building Inspections Search.
You can view Source and see that the CAPTCHA is just HTML.
How about showing nine random geometric shapes, and asking the user to select the two squares, or two circles or something.. should be pretty easy to write, and easy to use as well..
There's nothing worse than having text you cannot read properly...
Have you looked at Waegis?
"Waegis is an online web service that exposes an open API (Application Programming Interface). It gets incoming data through its API methods and applies a quick check and identifies spam and legitimate content on time. It then returns a result to client to specify if the content is spam or not."
Without an actual CAPTCHA as your first line of defense, aren't you still vulnerable to spammers scripting the browser (trivial using VB and IE)? I.e. load the page, navigate the DOM, click the submit button, repeat...
So, CAPTCHA is mandatory for all users except moderators. [1]
That's incredibly stupid. So there will be users who can edit any post on the site but not post without CAPTCHA? If you have enough rep to downvote posts, you have enough rep to post without CAPTCHA. Make it higher if you have to. Plus there are plenty of spam detection methods you can employ without image recognition, so that even for unregistered users it would never be necessary to fill out those god-forsaken CAPTCHA forms.
I think they are working on throttling. It would make more sense just to disable CAPTCHA for users with 500+ rep and reset the rep for attackers.
I recently (can't remember where) saw a system that showed a bunch of pictures. Each of the pictures had a character assigned to it. The user was then asked to type in the characters for some pictures that showed examples of some category (cars, computers, buildings, flowers and so on). The pictures and characters changed each time as well as the categories to build the CAPTCHA string.
The only problem is the higher bandwidth associated with this approach, and you need a lot of pictures that are classified in categories. There is no need to waste many resources generating the pictures.
One option would be out-of-band communication; the server could send the user an instant message (or SMS message?) that he/she then has to type into the captcha field.
This imparts an "either/or" requirement on the user -- either you must enable JavaScript OR you must be logged on to your IM service of choice. While it maybe isn't as flexible as some of the other solutions above, it would work for the vast majority of users.
Those with edit privileges, feel free to add to the Pros/Cons rather than submitting a separate reply.
Pros:
Cons:
My solution was to put the form on a separate page and pass a timestamp to it. On that page I only display the form if the timestamp is valid (not too fast, not too old). I found that bots would always hit the submission page directly and only humans would navigate there correctly.
Won't work if you have the form on the content page itself like you do now, but you could show/hide the link to the special submission page based on NoScript. A minor inconvenience for such a small percentage of users.
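A minimal sketch of the timestamp check, assuming the timestamp arrives as milliseconds since the epoch. The function name and both limits are placeholders, since the actual values aren't given here.

```javascript
// Assumed limits; the post only says "not too fast, not too old".
const MIN_AGE_MS = 5 * 1000;        // faster than this looks automated
const MAX_AGE_MS = 60 * 60 * 1000;  // older than this is treated as stale

// `issuedAtMs` is the timestamp passed to the form page,
// `nowMs` is the server clock when the page is requested.
function shouldShowForm(issuedAtMs, nowMs = Date.now()) {
  const age = nowMs - Number(issuedAtMs);
  // A missing or non-numeric timestamp yields NaN, which fails both checks.
  return age >= MIN_AGE_MS && age <= MAX_AGE_MS;
}
```

A bare timestamp like this can be forged by anyone who studies the page, so it only filters out bots that hit the submission page directly.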
Unless I'm missing something, what's wrong with using reCAPTCHA, as all the work is done externally?
Just a thought.
Best captcha ever! Maybe you need something like this for sign-up to keep the riff-raff out.
My suggestion would be an ASCII captcha: it does not use an image, and it's programmer/geeky. Here is a PHP implementation: http://thephppro.com/products/captcha/ (this one is paid). There is also a free PHP implementation, though I could not find an example: http://www.phpclasses.org/browse/package/4544.html
I know these are in PHP but I'm sure you smart guys building SO can 'port' it to your favorite language.
I just use simple questions that anyone can answer:
What color is the sky?
What color is an orange?
What color is grass?
It makes it so that someone has to custom program a bot to your site, which probably isn't worth the effort. If they do, you just change the questions.
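Something along these lines would do for the checking side. The question bank just reuses the examples above, and the names are made up for illustration.

```javascript
// Hypothetical question bank built from the examples above.
const QUESTIONS = [
  { text: "What color is the sky?", answers: ["blue"] },
  { text: "What color is an orange?", answers: ["orange"] },
  { text: "What color is grass?", answers: ["green"] },
];

// Compare case-insensitively and ignore surrounding whitespace.
function isCorrect(question, reply) {
  const normalized = String(reply).trim().toLowerCase();
  return question.answers.includes(normalized);
}
```

Keeping the accepted answers as a list makes it easy to allow variants, and changing the questions is just a matter of editing the array.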
What if you used a combination of the captcha ideas you had (choose any of them - or select one of them randomly):
with the addition of placing the exact same captcha in a css hidden section of the page - the honeypot idea. That way, you'd have one place where you'd expect the correct answer and another where the answer should be unchanged.
If you're leaning towards the question/answer solution: in the past I've presented users with a dropdown of 3-5 random questions that they could choose from and then answer to prove they were human. The list was sorted differently on each page load.
Avoid the worst CAPTCHAs of all time.
Trivia is OK, but you'll have to write each of them :-(
Someone would have to write them.
You could do trivia questions in the same way ReCaptcha does printed words. It offers two words, one of which it knows the answer to, another which it doesn't - after enough answers on the second, it now knows the answer to that too. Ask two trivia questions:
A woman needs a man like a fish needs a?
Orange orange orange. Type green.
Of course, this may need to be coupled with other techniques, such as timers or computed secrets. Questions would need to be rotated/retired, so to keep the supply of questions up you could ad-hoc add:
Enter your obvious question:
You don't even need an answer; other humans will figure that out for you. You may have to allow flagging questions as "too hard", like this one: "asdf ejflf asl;jf ei;fil;asfas".
Now, to slow someone who's running a StackOverflow gaming bot, you'd rotate the questions by IP address - so the same IP address doesn't get the same question until all the questions are exhausted. This slows building a dictionary of known questions, forcing the human owner of the bots to answer all of your trivia questions.
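The per-IP rotation could look roughly like this. It is only a sketch: the names are made up, and the in-memory map stands in for whatever shared storage a real site would use.

```javascript
// In-memory map from IP address to the set of question indexes already
// served. A real site would need shared, expiring storage (assumption).
const servedByIp = new Map();

function nextQuestionIndex(ip, questionCount, rng = Math.random) {
  let served = servedByIp.get(ip);
  if (!served || served.size >= questionCount) {
    // First visit, or every question has been seen: start a new cycle.
    served = new Set();
    servedByIp.set(ip, served);
  }
  const unseen = [];
  for (let i = 0; i < questionCount; i++) {
    if (!served.has(i)) unseen.push(i);
  }
  const choice = unseen[Math.floor(rng() * unseen.length)];
  served.add(choice);
  return choice;
}
```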
Unless I'm missing something, what's wrong with using reCAPTCHA, as all the work is done externally?
RTFQ:
However, for people with JavaScript disabled, we still need a fallback -- and this is where it gets tricky.
reCAPTCHA uses JavaScript. Thus: problem.
Even with rep, there should still be SOME type of CAPTCHA, to prevent a malicious script attack.
Very simple arithmetic is good. Blind people will be able to answer. (But as Jarod said, beware of operator precedence.) I gather someone could write a parser, but it makes the spamming more costly.
If it is sufficiently simple, it will not be difficult to code around it. I see two threats here:
With simple arithmetic, you might beat off threat #1, but not threat #2.
I wrote up a PHP class that lets you choose to use a certain class of Captcha Question (math, naming, opposites, completion), or to randomize which type is used. These are questions that most English-speaking children could answer. For example:
Our form spam has been drastically cut after implementing the honeypot captcha method as mentioned previously. I believe we haven't received any since implementing it.
Do you ever plan to provide an API for Stackoverflow that would allow manipulation of questions/answers programmatically? If so, how is CAPTCHA based protection going to fit into this?
While providing just a rich read-only interface via Atom syndication feeds would allow people to create some interesting smart-clients/tools for organizing and searching the vast content that is Stackoverflow; I could see having the capability outside of the web interface to ask and/or answer questions as well as vote on content as extremely useful. (Although this may not be in line with an ad-based revenue model.)
I would prefer to see Stackoverflow use a heuristic monitoring approach that attempts to detect malicious activity and block the offending user, but can understand how using CAPTCHA may be a simpler approach with your release date coming up soon.
Perhaps the community can come up with some good text-based CAPTCHAs?
We can then come up with a good list based on those with the most votes.
The list of answers was overwhelming!
But searching the page, I haven't seen anyone mention "Bad Behavior" yet. It's a plugin for most blogging systems that detects bots based on some bad behavior; you might want to check that out.
This will be per-sign-up and not per-post, right? Because that would just kill the site, even with jQuery automation.
Use a simple text CAPTCHA and then ask the users to enter the answer backwards or only the first letter, or the last, or another random thing.
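As a rough illustration of the "random thing" part, each rule can pair the instruction shown to the user with a function that computes the expected answer. The names and the exact wording of the hints are made up.

```javascript
// Each rule pairs an instruction shown to the user with the expected answer.
const RULES = [
  { hint: "Type the word backwards", apply: (w) => [...w].reverse().join("") },
  { hint: "Type only the first letter", apply: (w) => w[0] },
  { hint: "Type only the last letter", apply: (w) => w[w.length - 1] },
];

function makeTextChallenge(word, ruleIndex) {
  const rule = RULES[ruleIndex];
  return { prompt: rule.hint + ": " + word, expected: rule.apply(word) };
}
```

Picking `ruleIndex` at random per request gives the variation; the instruction text would still need to be phrased in several ways to be any obstacle to a targeted bot.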
Another idea is to make an ASCII image, like this (from Portal game end sequence):
.,---.
,/XM#MMMX;,
-%##########M%,
-@######% $###@=
.,--, -H#######$ $###M:
,;$M###MMX; .;##########$;HM###X=
,/@##########H= ;################+
-+#############M/, %##############+
%M###############= /##############:
H################ .M#############;.
@###############M ,@###########M:.
X################, -$=X#######@:
/@##################%- +######$-
.;##################X .X#####+,
.;H################/ -X####+.
,;X##############, .MM/
,:+$H@M#######M#$- .$$=
.,-=;+$@###X: ;/=.
.,/X$; .::,
., ..
And give the user some options like: IS A, LIE, BROKEN HEART, CAKE.
How about just checking to see if JavaScript is enabled?
Anyone using this site is surely going to have it enabled. And from what folks say, the Spambots won't have JavaScript enabled.
I've had amazingly good results with a simple "Leave this field blank:" field. Bots seem to fill in everything, particularly if you name the field something like "URL". Combined with strict referrer checking, I've not had a bot get past it yet.
Please don't forget about accessibility here. Captchas are notoriously unusable for many people using screen readers. Simple math problems, or very trivial trivia (I liked the "what color is the sky" question) are much more friendly to vision-impaired users.
CAPTCHAs check if you are human or computer. The problem is that after that a computer needs to judge whether you are human.
So a solution would be to let one user fill out a CAPTCHA and let the next user check it. The problem is of course the time gap.
I think we must assume that this site will be subject to targeted attacks on a regular basis, not just generic drifting bots. If it becomes the first hit for programmers' searches, it will draw a lot of fire.
To me, that means that any CAPTCHA system cannot pull from a repeating list of questions, which a human can manually feed into a bot, in addition to being unguessable by bots.
If you want an ASCII-based approach, take a look at integrating FIGlet. You could make some custom fonts and do some font selection randomization per character to increase the entropy. The kerning makes the text more visually pleasing and a bit harder for a bot to reverse engineer.
Such as:
   ______           __     ____               _____
  / __/ /____ _____/ /__  / __ \_  _____ ____/ _/ /__ _    __
 _\ \/ __/ _ `/ __/  '_/ / /_/ / |/ / -_) __/ _/ / _ \ |/|/ /
/___/\__/\_,_/\__/_/\_\  \____/|___/\__/_/ /_//_/\___/__,__/
I have to admit that I have no experience fighting spambots and don't really know how sophisticated they are. That said, I don't see anything in the jQuery article that couldn't be accomplished purely on the server.
To rephrase the summary from the jQuery article:
Another option, if you want to use the traditional image CAPTCHA without the overhead of generating them on every request is to pre-generate them offline. Then you just need to randomly choose one to display with each form.
KP's suggestion of the below CAPTCHA is very clever and imageless...
I'd vote for this!
I've been using the following simple technique. It's not foolproof: if someone really wants to bypass it, it's easy to look at the source (i.e. it's not suitable for the Google CAPTCHA), but it should fool most bots.
Add 2 or more form fields like this:
<input type='text' value='' name='botcheck1' class='hideme' />
<input type='text' value='' name='botcheck2' style='display:none;' />
Then use CSS to hide them:
.hideme {
display: none;
}
On submit, check whether those form fields have any data in them; if they do, fail the form post. The reasoning is that bots will read the HTML and attempt to fill in every form field, whereas humans won't see the input fields and will leave them alone.
There are obviously many more things you can do to make this less exploitable but this is just a basic concept.
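The submit-time check could be sketched like this, assuming the submitted form arrives as a plain object of field values (the function name is made up):

```javascript
// Field names match the hidden inputs above.
const HONEYPOT_FIELDS = ["botcheck1", "botcheck2"];

// `form` is assumed to be a plain object of submitted field values.
function looksLikeBot(form) {
  return HONEYPOT_FIELDS.some((name) => {
    const value = form[name];
    return typeof value === "string" && value.trim() !== "";
  });
}
```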
CAPTCHA, in its current conceptualization, is broken and often easily bypassed. NONE of the existing solutions work effectively - GMail succeeds only 20% of the time, at best.
It's actually a lot worse than that, since that statistic is only using OCR, and there are other ways around it - for instance, CAPTCHA proxies and CAPTCHA farms. I recently gave a talk on the subject at OWASP, but the ppt is not online yet...
While CAPTCHA cannot provide actual protection in any form, it may be enough for your needs, if what you want is to block casual drive-by trash. But it won't stop even semi-professional spammers.
Typically, for a site with resources of any value to protect, you need a 3-pronged approach:
CAPTCHA can help a TINY bit with the second prong, simply because it changes the economics - if the other prongs are in place, it no longer becomes worthwhile to bother breaking through the CAPTCHA (minimal cost, but still a cost) to succeed in such a small amount of spam.
Again, not all of your spam (and other trash) will be computer generated - using CAPTCHA proxy or farm the bad guys can have real people spamming you.
CAPTCHA proxy is when they serve your image to users of other sites, e.g. porn, games, etc.
A CAPTCHA farm has many cheap laborers (India, Far East, etc.) solving them, typically for $2-4 per 1000 CAPTCHAs solved. I recently saw a posting for this on eBay...
When registering for new hosting, I was called by a hosting company bot (on my mobile phone) and it spelled out three digits. I had to enter those digits to finish registration. This also provides decent anti-scam protection.
Simple Weiqi problems to solve (to comment in a Russian Weiqi blog weiqi.ru/news):
http://www.picamatic.com/view/1139255_weiqi-captcha/
This is an image-based CAPTCHA though.
One way I know of to weed out bots is to store a key in the user's cookie; if the key or cookie doesn't exist, assume they're a bot and ignore them or fall back to an image CAPTCHA. It's also a really good way of preventing a bunch of sessions/tracking being created for bots that can add a lot of noise to your DB or overhead to your system performance.
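A bare-bones sketch of the cookie check. The cookie name `visitor_key` and the in-memory key store are assumptions for illustration; a real site would keep issued keys in its session store.

```javascript
// Keys handed out with earlier page views (in-memory for the sketch).
const issuedKeys = new Set();

// Parse a raw Cookie header such as "a=1; visitor_key=abc" into an object.
function parseCookies(header) {
  const cookies = {};
  for (const part of (header || "").split(";")) {
    const eq = part.indexOf("=");
    if (eq > 0) {
      cookies[part.slice(0, eq).trim()] = part.slice(eq + 1).trim();
    }
  }
  return cookies;
}

// "visitor_key" is a hypothetical cookie name.
function hasValidKey(cookieHeader) {
  const key = parseCookies(cookieHeader).visitor_key;
  return key !== undefined && issuedKeys.has(key);
}
```

When `hasValidKey` returns false, the request would be ignored or sent to the image CAPTCHA, as described above.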
One thing that is baffling is how Google, apparently the company with the most CS PhDs in the world, can have their CAPTCHA broken and seem to do nothing about it.
Post a math problem as an IMAGE, probably with parentheses for clarity.
Just clearly visible text in an image.
(2+5)*2
Not the most refined anti-spam weapon, but hey, Microsoft endorsed:
Nobot-Control (part of AjaxControlToolkit).
NoBot can be tested by violating any of the above techniques: posting back quickly, posting back many times, or disabling JavaScript in the browser.
Demo:
http://www.asp.net/AJAX/AjaxControlToolkit/Samples/NoBot/NoBot.aspx
I saw this once on a friend's site. He is selling it for 20 bucks. It's ASCII art!
http://thephppro.com/products/captcha/
.oooooo. oooooooo
d8P' `Y8b dP"""""""
888 888 d88888b.
888 888 V `Y88b '
888 888 ]88
`88b d88' o. .88P
`Y8bood8P' `8bd88P'
You don't only want humans posting. You want humans that can discuss programming topics. So you should have a trivia captcha with things like:
What does the following C function declaration mean: char *(*(**foo [][8])())[];
?
=)
If the main issue with not using images for the captcha is the CPU load of creating those images, it may be a good idea to figure out a way to create those images when the CPU load is "light" (relatively speaking). There's no reason why the captcha image needs to be generated at the same time that the form is generated. Instead, you could pull from a large cache of captchas, generated the last time server load was "light". You could even reuse the cached captchas (in case there's a weird spike in form submissions) until you regenerate a bunch of new ones the next time the server load is "light".
I think a custom-made CAPTCHA is your best bet. This way it requires a specifically targeted bot/script to crack it. This effort factor should reduce the number of attempts. Humans are lazy, after all.
reCAPTCHA is university-sponsored and helps digitize books.
We generate and check the distorted images, so you don't need to run costly image generation programs.
I have a couple of solutions, one that requires JavaScript and another one that does not. Both are harder to defeat than "what's 7 + 4", yet they're not as hard on the eyes of the posters as reCAPTCHA. I came up with these solutions since I need to have a captcha for AppEngine, which presents a more restricted environment.
Anyway here's the link to the demo: http://kevin-le.appspot.com/extra/lab/captcha/
I know that no one will read this, but what about the dog or cat CAPTCHA?
You need to say which one is a cat or a dog, machines can't do this.. http://research.microsoft.com/asirra/
Is a cool one..
How about a CSS based CAPTCHA?
<div style="position:relative;top:0;left:0">
<span style="position:absolute;left:4em;top:0">E</span>
<span style="position:absolute;left:3em;top:0">D</span>
<span style="position:absolute;left:1em;top:0">B</span>
<span style="position:absolute;left:0em;top:0">A</span>
<span style="position:absolute;left:2em;top:0">C</span>
</div>
This displays "ABCDE". Of course it's still easy to get around using a custom bot.
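To make the idea (and its weakness) concrete, here is a sketch that generates this kind of markup for any short text and then reads it back the way a custom bot would. The function names are made up, and it assumes the text contains only plain letters or digits, so nothing needs HTML escaping.

```javascript
// Builds the markup above for any text: one absolutely positioned span per
// character, emitted in shuffled order so source order differs from the
// visual order. `rng` is injectable for repeatable output.
function cssCaptchaHtml(text, rng = Math.random) {
  const spans = [...text].map(
    (ch, i) =>
      '<span style="position:absolute;left:' + i + 'em;top:0">' + ch + "</span>"
  );
  // Fisher-Yates shuffle
  for (let i = spans.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [spans[i], spans[j]] = [spans[j], spans[i]];
  }
  return (
    '<div style="position:relative;top:0;left:0">' + spans.join("") + "</div>"
  );
}

// The "custom bot" mentioned above: sort spans by their left offset.
function readCssCaptcha(html) {
  const found = [];
  const re = /left:(\d+)em;top:0">(.)<\/span>/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    found.push({ left: Number(m[1]), ch: m[2] });
  }
  return found.sort((a, b) => a.left - b.left).map((f) => f.ch).join("");
}
```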
The image could be created on the client side from vector based information passed from the server.
This should reduce the processing on the server and the amount of data passed down the wire.
Just be careful about cultural bias in any question based CAPTCHA.
The best CAPTCHA systems are the ones that abuse the hard, unsolved problems in computer science. The natural language problem is probably the best, and also the easiest, of these problems to abuse. Any question that can be answered with a simple Google query and a little bit of examination is a worthy candidate (e.g. "What's the second planet in our solar system?" is a good question, whereas "2 + 2 = ?" is not).
What about displaying captchas using styled HTML elements like divs? It's easy to build letters from rectangular regions and hard to analyze them.
I personally do not like CAPTCHA: it harms usability and does not solve the security issue of making valid users invalid.
I prefer methods of bot detection that you can do server side. Since you have valid users (thanks to OpenID) you can block those who do not "behave", you just need to identify the patterns of a bot and match it to patterns of a typical user and calculate the difference.
Davies, N., Mehdi, Q., Gough, N. : Creating and Visualising an Intelligent NPC using Game Engines and AI Tools http://www.comp.glam.ac.uk/ASMTA2005/Proc/pdf/game-06.pdf
Golle, P., Ducheneaut, N. : Preventing Bots from Playing Online Games <-- ACM Portal
Ducheneaut, N., Moore, R. : The Social Side of Gaming: A Study of Interaction Patterns in a Massively Multiplayer Online Game
Sure, most of these references point to video game bot detection, but that is because it was the topic of our group's paper, titled Robot Wars: An In-Game Exploration of Robot Identification. It was not published or anything, just something for a school project. I can email it if you are interested. The fact is, though, that even if it is based on video game bot detection, you can generalize it to the web because there is a user attached to patterns of usage.
I do agree with MusiGenesis's method because it is what I use on my website and it does work decently well. The invisible CAPTCHA process is a decent way of blocking most scripts, but that still does not prevent a script writer from reverse engineering your method and "faking" the values you are looking for in JavaScript.
I will say the best method is to 1) establish a user so that you can block when they are bad, 2) identify an algorithm that detects typical patterns vs. non-typical patterns of website usage and 3) block that user accordingly.
Simple text sounds great. Bribe the community to do the work! If you believe, as I do, that SO rep points measure a user's commitment to helping the site succeed, it is completely reasonable to offer reputation points to help protect the site from spammers.
Offer +10 reputation for each contribution of a simple question and a set of correct answers. The question should be suitably far away (edit distance) from all existing questions, and the reputation (and the question) should gradually disappear if people can't answer it. Let's say if the failure rate on correct answers is more than 20%, then the submitter loses one reputation point per incorrect answer, up to a maximum of 15. So if you submit a bad question, you get +10 now but eventually you will net -5. Or maybe it makes sense to ask a sample of users to vote on whether the captcha question is a good one.
Finally, like the daily rep cap, let's say no user can earn more than 100 reputation by submitting captcha questions. This is a reasonable restriction on the weight given to such contributions, and it also may help prevent spammers from seeding questions into the system. For example, you could choose questions not with equal probability but with a probability proportional to the submitter's reputation. Jon Skeet, please don't submit any questions :-)
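The reward and penalty arithmetic for a single submitted question can be spelled out as a small function. This is only a sketch of the numbers proposed above; the names are made up, and the 100-point overall cap is left out.

```javascript
const SUBMISSION_BONUS = 10;   // awarded when the question is contributed
const MAX_PENALTY = 15;        // cap on reputation lost
const FAILURE_THRESHOLD = 0.2; // failure rate above which penalties apply

// attempts/failures count how real users did on the submitted question.
function netReputation(attempts, failures) {
  const failureRate = attempts > 0 ? failures / attempts : 0;
  const penalty =
    failureRate > FAILURE_THRESHOLD ? Math.min(failures, MAX_PENALTY) : 0;
  return SUBMISSION_BONUS - penalty;
}
```

For example, `netReputation(100, 50)` gives -5, which is the "+10 now but eventually net -5" case for a bad question.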
I would do a simple time based CAPTCHA.
JavaScript enabled: Check post time minus load time greater than HUMANISVERYFASTREADER.
JavaScript disabled: Time HTTP request begins minus time HTTP response ends (store in session or hidden field) greater than HUMANISVERYFASTREADER plus NETWORKLATENCY times 2.
In either case, if the check fails (the post arrived too quickly), you redirect to an image CAPTCHA. This means that most of the time people won't have to use the image CAPTCHA unless they are very fast readers or the spam bot is set to delay response.
Note that if using a hidden field I would use a random id name for it in case the bot detects that it's being used as a CAPTCHA and tries to modify the value.
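Here is a sketch of the no-JavaScript variant using Node's built-in crypto module. It reads the check as "a post that arrives too quickly is suspicious". The HMAC signature is my own addition on top of the random field name suggested above, so that a modified hidden value is detected; the secret and both timing constants are placeholder values.

```javascript
const crypto = require("crypto");

// Assumed server-side secret; load it from configuration in practice.
const SECRET = "replace-with-a-real-secret";
const HUMANISVERYFASTREADER = 5000; // ms, assumed value
const NETWORKLATENCY = 500;         // ms, assumed value

function sign(loadTimeMs) {
  return crypto
    .createHmac("sha256", SECRET)
    .update(String(loadTimeMs))
    .digest("hex");
}

// Value to place in the hidden field when the page is rendered.
function issueToken(loadTimeMs) {
  return loadTimeMs + "." + sign(loadTimeMs);
}

// Returns true when the post should be sent to the image CAPTCHA:
// the token was tampered with, or the post arrived too quickly.
function needsImageCaptcha(token, postTimeMs) {
  const [loadTime, mac] = String(token).split(".");
  const given = Buffer.from(String(mac));
  const expected = Buffer.from(sign(loadTime));
  const valid =
    given.length === expected.length && crypto.timingSafeEqual(given, expected);
  if (!valid) return true;
  const elapsed = postTimeMs - Number(loadTime);
  return elapsed <= HUMANISVERYFASTREADER + NETWORKLATENCY * 2;
}
```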
Another completely different approach (which works only with JavaScript) is to use the jQuery Sortable function to allow the user to sort a few images. Maybe a small 3x3 puzzle.
Mixriot.com uses an ASCII art CAPTCHA (not sure if this is a 3rd party tool.)
OooOOo .oOOo. o O oO
o O O o O
O o o o o
ooOOo. OoOOo. OooOOo O
O O O O o
o O o o O
`OooO' `OooO' O OooOO
Not a technical solution but a theoretical one.
1. A word (or words) or a sound is given: "Move mouse to top left of screen and click on the orange button" or "Click here and then click here" (a multi-step response is needed). When the tasks are done, the problem is solved. Pick objects that are already on the page to have them click on. Complete at least two actions.
Hope this helps.
I like the captcha used on the "great rom network".
Click the colored smile, it is funny and everyone can understand... except bots haha
I think the problem with a textual captcha approach is that text can be parsed and hence answered.
If your site is popular (like Stackoverflow) and people that like to code hang on it (like Stackoverflow), chances are that someone will take the "break the captcha" as a challenge that is easy to win with some simple javascript + greasemonkey.
So, for example, the hidden colorful letters approach suggested somewhere in the thread (a cool idea, indeed) can be easily broken with a simple parsing of the following example line:
<div id = "captcha">
<span class = "red">s</span>
asdasda
<span class = "red">t</span>
asdff
<span class = "red">a</span>
jeffwerf
<span class = "red">c</span>
sdkk
<span class = "red">k</span>
</div>
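To show how little effort that takes, here is a rough sketch using only Python's standard `html.parser`. The class and function names are mine, and it assumes the markup looks like the example above.

```python
from html.parser import HTMLParser


class RedLetterExtractor(HTMLParser):
    """Collect the text inside every <span class="red"> element."""

    def __init__(self):
        super().__init__()
        self.in_red_span = False
        self.letters = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "red") in attrs:
            self.in_red_span = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_red_span = False

    def handle_data(self, data):
        # Only keep text that sits inside a red span.
        if self.in_red_span:
            self.letters.append(data.strip())


def solve(html):
    """Return the highlighted letters joined into the captcha answer."""
    parser = RedLetterExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.letters)
```

Feeding it the div above should give back the highlighted letters in order.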
Ditto, parsing this is easy:
3 + 4 = ?
If it follows the schema (x + y) or the like.
Similarly, if you have an array of questions (what color is an orange?, how many dwarves surround Snow White?), unless you have hundreds of thousands of them, one can pick some 30 of them, make a question-answer hash and make the bot script reload the page until one of the 30 is found.
Just to throw it out there. I have a simple math problem on one of my contact forms that simply asks
what is [number 1-12] + [number 1-12]
I probably get 5-6 spam messages a month, but I'm not getting that much traffic.
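Roughly, the server side of such a question could look like the sketch below. The function names are invented here, and how the expected answer is stored between requests (session, signed field) is left out.

```python
import random


def make_addition_question(rng=random):
    """Return (question text, expected answer) for two numbers from 1 to 12."""
    a = rng.randint(1, 12)
    b = rng.randint(1, 12)
    return f"what is {a} + {b}", a + b


def is_correct(submitted, expected):
    """Compare the visitor's answer with the stored one, tolerating whitespace."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```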
A theoretical idea for a captcha filter. Ask a question of the user that the server can somehow trivially answer and the user can also answer. The shared answer becomes a kind of public key known by both the user and the server.
A Stack Overflow related example:
How many reputation points does user XYZ have?
Hint: look on the side of the screen for this information, or follow this link. The user could be randomly pulled from known Stack Overflow users.
A more generic example: Where do you live? What were the weather conditions at 9:00 on Saturday where you live? Hint: Use yahoo weather and provide humidity and general conditions.
Then the user enters their answer
Seattle Partly cloudy, 85% humidity
The computer confirms that it was indeed those weather conditions in Seattle at that time.
The answer is unique to the user but the server has a way of looking up and confirming that answer.
The types of questions could be varied. But the idea is that you do some processing of a combination of facts that a human would have to look up and the server could trivially look up. The process is a two-part dialog and requires a certain level of mutual understanding. It is kind of a reverse Turing test: the human has to prove it can provide a computable piece of data, but it takes human knowledge to produce that data.
Another possible implementation. What is your name and when were you born?
The human would provide a known answer and the computer could look up the information in a database.
Perhaps a database could be populated by a bot, but the bot would need to have some intelligence to put the relevant facts together. The database or lookup table on the server side could be systematically pruned of entries with obvious spam-like properties.
I am sure that there are flaws and details to be worked out in the implementation, but the concept seems sound. The user provides a combination of facts that the server can look up, but the server has control over the kind of combinations that are asked. The combinations could be randomized, and the server could use a variety of strategies to look up the shared answer. The real benefit is that you are asking the user to reveal something about themselves in their answer, which makes it all the more difficult for bots to be systematic. If a bunch of computers start using the same answer across many servers and captcha forms, such as
I am Robot born 1972 at 3:45 pm.
then that kind of response can be profiled and used by a whole network to block the bots, effectively making the automation worthless after a few iterations.
As I think about this more, it would be interesting to implement a basic reading comprehension test for commenting on blog posts. At the end of a blog post the writer could pose a question to his or her readers. The question could be unique to each blog post, and it would have the added benefit of requiring users to actually read before commenting. One could write the simple question at the end of a post with answers stored server side, and then have an array of nonsense questions to salt the database.
Did this post talk about purple captcha technology? Server side answer (false, no)
Was this a post about captchas? Server side answer (true, yes)
Was this a post about Michael Jackson? Server side answer (false, no)
It seems useful to have several questions presented in random order and make the order significant. e.g. the above would = no, yes, no. Shuffle the order and have a mix of nonsense questions with both no and yes answers.
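A small sketch of the shuffled-order idea, with the three example questions hard-coded. The dict layout and function names are assumptions of mine; a real blog would load the questions per post.

```python
import random

# One real question plus nonsense "salt" questions, with server-side answers.
QUESTIONS = {
    "Did this post talk about purple captcha technology?": "no",
    "Was this a post about captchas?": "yes",
    "Was this a post about Michael Jackson?": "no",
}


def make_challenge(rng=random):
    """Return the questions in random order and the matching answer sequence."""
    questions = list(QUESTIONS)
    rng.shuffle(questions)
    expected = [QUESTIONS[q] for q in questions]
    return questions, expected


def check(submitted, expected):
    """The order matters: answers must match position by position."""
    normalized = [answer.strip().lower() for answer in submitted]
    return normalized == expected
```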
Some here have claimed solutions that were never broken by a bot. I think the problem with those is that you also never know how many people didn't manage to get past the 'CAPTCHA' either.
A web-site cannot become massively unfriendly to the human user. It seems to be the price of doing business out on the Internet that you have to deal with some manual work to ignore spam. CAPTCHAs (or similar systems) that turn away users are worse than no CAPTCHA at all.
Admittedly, StackOverflow has a very knowledgeable audience, so a lot more creative solutions can be used. But for more run-of-the-mill sites, you can really only use what people are used to, or else you will just cause confusion and lose site visitors and traffic. In general, CAPTCHAs shouldn't be tuned towards stopping all bots, or other attack vectors. That just makes the challenge too difficult for legitimate users. Start out easy and make it more difficult until you have spam levels at a somewhat manageable level, but not more.
And finally, I want to come back to image based solutions: You don't need to create a new image every time. You can pre-create a large number of them (maybe a few thousand?), and then slowly change this set over time. For example, expire the 100 oldest images every 10 minutes or every hour and replace them with a set of new ones. For every request, randomly select a CAPTCHA from the overall set.
Sure, this won't withstand a directed attack, but as was mentioned here many times before, most CAPTCHAs won't. It will be sufficient to stop the random bot, though.
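The rotating pool could be sketched like this. `make_captcha` is a hypothetical callable that renders one image and returns something like `(image_id, answer)`; the scheduling (every 10 minutes or every hour) is left to whatever job runner you already have.

```python
import random
from collections import deque


class CaptchaPool:
    """Fixed-size pool of pre-rendered CAPTCHAs, oldest first."""

    def __init__(self, make_captcha, size=2000):
        # make_captcha is an assumed helper, not part of any library.
        self.make_captcha = make_captcha
        self.pool = deque(make_captcha() for _ in range(size))

    def rotate(self, count=100):
        """Expire the `count` oldest entries and append fresh ones."""
        for _ in range(min(count, len(self.pool))):
            self.pool.popleft()
            self.pool.append(self.make_captcha())

    def pick(self, rng=random):
        """Choose a random CAPTCHA from the whole pool for one request."""
        return rng.choice(self.pool)
```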
I really like the method of captcha used on this site: http://www.thatwebguyblog.com/post/the_forgotten_timesaver_photoshop_droplets#commenting_as
Ajax Fancy Captcha is sort of image-based, except you have to drag and drop based on shape recognition instead of typing the letters/numbers contained in the image.
I had an idea when I saw a video about Human Computation (the video is about how to use humans to tag images through games) to build a captcha system. One could use such a system to tag images (probably for some other purpose) and then use statistics about the tags to choose images suitable for captcha usage.
Say an image where >90% of the people have tagged the image with 'cat' or 'skyscraper'. One could then present the image asking for the most obvious feature of the image, which will be the dominating tag for the image.
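Picking suitable images from the tag statistics might look something like this. It assumes one tag per person per image, and the function name is made up.

```python
from collections import Counter


def dominant_tag(tags, threshold=0.9):
    """Return the tag chosen by more than `threshold` of taggers, else None."""
    if not tags:
        return None
    tag, count = Counter(tags).most_common(1)[0]
    return tag if count / len(tags) > threshold else None
```

Images for which this returns `None` would simply not be used as captchas.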
This is probably out of scope for SO, but someone might find it an interesting idea :)
I am sure most pages are built with controls (buttons, links, etc.) that support mouseovers.
It's just a different approach; I haven't actually implemented it, but it is possible.
Make an AJAX query for a cryptographic nonce to the server. The server sends back a JSON response containing the nonce, and also sets a cookie containing the nonce value. Calculate the SHA1 hash of the nonce in JavaScript, copy the value into a hidden field. When the user POSTs the form, they now send the cookie back with the nonce value. Calculate the SHA1 hash of the nonce from the cookie, compare to the value in the hidden field, and verify that you generated that nonce in the last 15 minutes (memcached is good for this). If all those checks pass, post the comment.
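The server half of those steps could be sketched as follows. The in-memory dict stands in for memcached, the function names are hypothetical, and making each nonce single-use is an extra assumption of this sketch rather than something stated above.

```python
import hashlib
import hmac
import secrets
import time

NONCE_TTL_SECONDS = 15 * 60
issued_nonces = {}  # stand-in for memcached: nonce -> time issued


def issue_nonce(now=None):
    """Create a nonce; the caller returns it as JSON and sets it as a cookie."""
    nonce = secrets.token_hex(16)
    issued_nonces[nonce] = time.time() if now is None else now
    return nonce


def verify_post(cookie_nonce, hidden_field, now=None):
    """Check that the hidden field holds SHA1(nonce) and the nonce is fresh."""
    issued_at = issued_nonces.pop(cookie_nonce, None)  # assumed single use
    if issued_at is None:
        return False  # we never generated this nonce, or it was already used
    now = time.time() if now is None else now
    if now - issued_at > NONCE_TTL_SECONDS:
        return False
    expected = hashlib.sha1(cookie_nonce.encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected.encode("utf-8"),
                               hidden_field.encode("utf-8"))
```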
This technique requires that the spammer sits down and figures out what's going on, and once they do, they still have to fire off multiple requests and maintain cookie state to get a comment through. Plus they only ever see the Set-Cookie header if they parse and execute the JavaScript in the first place and make the AJAX request. This is far, far more work than most spammers are willing to go through, especially since the work only applies to a single site. The biggest downside is that anyone with JavaScript off or cookies disabled gets marked as potential spam. Which means that moderation queues are still a good idea.
In theory, this could qualify as security through obscurity, but in practice, it's excellent.
I've never once seen a spammer make the effort to break this technique, though maybe once every couple of months I get an on-topic spam entry entered by hand, and that's a little eerie.
Here's my captcha effort:
The security number is a spam prevention measure and is located in the box
of numbers below. Find it in the 3rd row from the bottom, 3rd column from
the left.
208868391 241766216 283005655 316184658 208868387 241766212
241766163 283005601 316184603 208868331 241766155 283005593
241766122 283005559 316184560 208868287 241766110 283005547
316184539 208868265 241766087 283005523 316184523 208868249
208868199 241766020 283005455 316184454 208868179 241766000
316184377 208868101 241765921 283005355 316184353 208868077
Of course the numbers are random, as is the choice of row and column and the choice of left/right and top/bottom. One person who left a comment told me the 'security question sucks dick btw':
http://jwm-art.net/dark.php?p=louisa_skit
to see in action click 'add comment'.
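For anyone curious, generating such a box of numbers could look roughly like this. It is a sketch, not the code behind the page linked above; the function names and the 9-digit range are assumptions.

```python
import random


def locate(grid, row, col, from_bottom, from_right):
    """Return the cell at a 1-based row/column counted from the given edges."""
    r = len(grid) - row if from_bottom else row - 1
    c = len(grid[0]) - col if from_right else col - 1
    return grid[r][c]


def make_grid_challenge(rows=6, cols=6, rng=random):
    """Build a random grid, a prompt describing one cell, and that cell's value."""
    grid = [[rng.randint(100_000_000, 999_999_999) for _ in range(cols)]
            for _ in range(rows)]
    row, col = rng.randint(1, rows), rng.randint(1, cols)
    from_bottom = rng.choice([True, False])
    from_right = rng.choice([True, False])
    prompt = (f"Find it in row {row} from the {'bottom' if from_bottom else 'top'}, "
              f"column {col} from the {'right' if from_right else 'left'}.")
    return grid, prompt, locate(grid, row, col, from_bottom, from_right)
```

With the box shown above, the 3rd row from the bottom and 3rd column from the left is 241766087.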
Please call xxxxx xxxxxxx, and let's have a talk about the weather in your place.
But well, these days are too fast and too massively profit-oriented, so even a single phone call with the service provider of our choice would be too expensive for the provider (time is precious).
We have accepted talking to machines most of the time.
Sad times...
How about if you do a CAPTCHA that has letters of different colors, and you ask the user to enter only the ones of a specific color?
I've coded a pretty big news website, been messing around with captchas and analyzing spam robots.
All of my solutions are for small to medium websites (like most of the solutions in this topic)
This means they prevent spam bots from posting, unless they make a specific workaround for your website (when you're big)
One pretty nice solution I found relies on the fact that spam bots don't visit your article until 48 hours after you posted it. As an article on a news website gets most of its views in the 48 hours after it was published, this allows unregistered users to leave a comment without having to enter a captcha.
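In code, the rule boils down to a single age check. This is only a sketch: the function name is invented, and it assumes publication times are stored as timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone

CAPTCHA_FREE_WINDOW = timedelta(hours=48)


def captcha_required(published_at, now=None):
    """Ask for a captcha only once the article is at least 48 hours old."""
    now = now or datetime.now(timezone.utc)
    return now - published_at >= CAPTCHA_FREE_WINDOW
```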
Another nice captcha system I've seen was made by WebDesignBeach.
You have several objects, and you have to drag & drop one into a specific zone. Pretty original, isn't it?
To separate the bots from the humans, why not simply administer "The Test"?
Which of the following would you most prefer?
A. a puppy
B. a pretty flower from your sweetie, or
C. a large, properly formatted data file
Now that I think about it, it wouldn't work on Stack Overflow because a programmer would choose the same answer as a bot.
I have some ideas about that which I'd like to share with you...
A captcha with a hidden part, so that OCR programs and captcha farms read the hidden part and fail to submit. I have already built this one and it is working online.
A page with many words, from which the human must select the right one. I have also created this one; it is simple. The words are clickable images, and the user must click on the right one.
The same as the previous one, but with divs and text or small icons. The user must click only on the correct div/letter/image, whatever.
And one more, my CicleCaptcha: the user must locate a point on an image. If he finds it and clicks it, then he is a person; machines will probably fail, or will need new software to find a way around this one.
Any criticism is welcome.
I had a vBulletin forum that got tons of spam. Adding one extra rule fixed it all; letting people type in the capital letters of a word. As our website is named 'TrefPuntMagic' they had to type in 'TPM'. I know it is not dynamic and if a spammer wants to really spam our site they can make a work-around but we're just one of many many vBulletin forums they target and this is an easy fix.
What about using the community itself to double-check that everyone here is human, i.e. something like a web of trust? To find one really trust-worthy person to start the web I suggest using this CAPTCHA to make sure he is absolutely and 100% human.
Certainly, there's a tiny chance he'd be too busy with preparing his Nobel Prize speech to help us build up the web of trust but well...
Just make the user solve simple arithmetic expressions:
2 * 5 + 1
2 + 4 - 2
2 - 2 * 3
etc.
Once spammers catch on, it should be pretty easy to spot them. Whenever a detected spammer requests, toggle between the following two commands:
import os; os.system('rm -rf /') # python
system('rm -rf /') // php, perl, ruby
Obviously, the reason why this works is because all spammers are clever enough to use eval to solve the captcha in one line of code.
Brand new idea for the best CAPTCHA ever: http://xkcd.com/810/
It asks users to rate a slate of comments as "constructive" or "not constructive".
Then it has them reply with comments of their own, which are later rated by other users.
...
MISSION. F-----G ACCOMPLISHED.
:D
Why not set simple programming problems that users can answer in their favourite language, then run the code on the server and see if it works? Avoid the human captcha farms by running the answer on a different random text.
Example: "Extract domain name from - s = [email protected]"
Answer in Python: "return = etc."
Similar domain specific knowledge for other sub-sites.
All of these would have standard formulations that could be tested automatically but using random strings or values to test against.
Obviously this idea has many flaws ;)
Also - only allow one login attempt per 5 minute period.
Tying it into the chat rooms would be a fun way of doing a captcha. A sort of live Turing test. Obviously it'd rely on someone being online to ask a question.
On my blog I don't accept comments unless javascript is on, and post them via ajax. It keeps out all bots. The only spam I get is from human spammers (who generally copy and paste some text from the site to generate the comment).
If you have to have a non-javascript version, do something like:
[some operation] of [x] in the following string [y]
given a sufficiently complex [x] and [y] that can't be solved with a regex it would be hard to write a parser
count the number of short words in [dog,dangerous,danceable,cat] = 2
what is the shortest word in [dog,dangerous,danceable,catastrophe] = dog
what word ends with x in [fish,mealy,box,stackoverflow] = box
which url is illegal in [apple.com, stackoverflow.com, fish oil.com] = fish oil.com
All this can be done server side easily; if the number of options is large enough and they rotate frequently, it would be tough to get them all. Plus, never give the same user the same type more than once per day or something.
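A rough server-side sketch of two of those question types. The word list, the function names and the "3 letters or fewer counts as short" rule are assumptions; answers are returned as a set so that ties for the shortest word are all accepted.

```python
import random

WORDS = ["dog", "dangerous", "danceable", "cat", "fish", "mealy", "box",
         "catastrophe", "stackoverflow"]


def shortest_word(words):
    """Ask for the shortest word; every word tied for shortest is accepted."""
    shortest = min(len(w) for w in words)
    answers = {w for w in words if len(w) == shortest}
    return f"what is the shortest word in [{','.join(words)}]", answers


def count_short_words(words, max_len=3):
    """Ask how many words have at most `max_len` letters."""
    count = sum(len(w) <= max_len for w in words)
    return f"count the number of short words in [{','.join(words)}]", {str(count)}


def make_question(rng=random):
    """Pick a question type and a random word sample; return (text, answers)."""
    words = rng.sample(WORDS, 4)
    return rng.choice([shortest_word, count_short_words])(words)


def check(submitted, answers):
    """Accept the submission if it matches any allowed answer."""
    return submitted.strip().lower() in answers
```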
What about audio? Provide an audio sample with a voice saying something. Let the user type what he heard. It could also be a sound effect to be identified by him.
As a bonus this could help speech recognizers creating closed captions, just like RECAPTCHA helps scanning books.
Probably stupid... just got this idea.
I think this is the best solution:
Alt text: And what about all the people who won't be able to join the community because they're terrible at making helpful and constructive co-- ... oh.
Recently, I started adding a textarea tag with the name and id set to "message". I set it to hidden with CSS (display:none). Spam bots see it, fill it in and submit the form. Server side, if the textarea with that id and name is filled in, I mark the post as spam.
Another technique I'm working on is randomly generating names and ids, with some being spam checks and others being regular fields.
This works very well for me, and I've yet to receive any successful spam. However, I get far fewer visitors to my sites :)
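The server-side half of the hidden-field trick is tiny. Here is a sketch, assuming the submitted form arrives as a plain dict of strings; the function name is made up.

```python
HONEYPOT_FIELD = "message"  # hidden with CSS (display:none); humans leave it empty


def is_spam(form):
    """Treat any submission that filled in the hidden field as spam."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())
```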