My reading of this article suggests that a benefit of ReCAPTCHA is that it can have humans verify words not recognised in the OCR/digitization of books. It does this by using these words in "Are you human?" tests. So ReCAPTCHA kills two birds with one stone. Great!

But I don't get it. If the word can't be recognised by the digitization process, then what is the input entered by the supposed human being verified against? How does this work?

+11  A: 

It shows two words. The computer already knows one of them; the other, it doesn't. It assumes that if you get the known one right, you also read the other one correctly.
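Here is a minimal sketch of that control-word idea, assuming a hypothetical in-memory table of known answers (`KNOWN_ANSWERS`) and made-up image IDs. The names and the case-insensitive comparison are illustrative choices, not reCAPTCHA's actual implementation. The two words are shown in random order so the user can't tell which one is the control, and the answer for the unknown word is only kept if the control word was typed correctly.

```python
import random
from dataclasses import dataclass

# Hypothetical store of control words whose answers are already known.
KNOWN_ANSWERS = {"img-001": "morning"}


@dataclass
class Challenge:
    control_id: str  # image whose answer is already known
    unknown_id: str  # image the OCR could not read

    def display_order(self, rng: random.Random) -> list[str]:
        # Shuffle so the user cannot tell which word is the control.
        order = [self.control_id, self.unknown_id]
        rng.shuffle(order)
        return order


def normalize(text: str) -> str:
    # Assumed comparison rule: ignore surrounding whitespace and case.
    return text.strip().lower()


def check_response(challenge: Challenge, answers: dict[str, str]):
    """Return (passed, candidate_for_unknown_word).

    The candidate is None when the control word was answered wrongly,
    because only users who pass the control are trusted on the unknown word.
    """
    expected = KNOWN_ANSWERS[challenge.control_id]
    if normalize(answers.get(challenge.control_id, "")) != normalize(expected):
        return False, None
    return True, normalize(answers.get(challenge.unknown_id, ""))
```

For example, `check_response(Challenge("img-001", "img-042"), {"img-001": "Morning", "img-042": "Overlooks"})` passes and yields `"overlooks"` as a candidate reading, while a wrong control answer such as `"mourning"` yields `(False, None)`.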

You don't know which of the two is already known, so, theoretically, you can't trick it. Additionally, it shows each unknown word to multiple people to get independent confirmation before sending it back to the source (newspaper company, book-scanning group) as a valid answer.
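The "independent confirmation" step can be sketched as a simple vote tally per unknown image. The thresholds below (`MIN_VOTES`, `MIN_AGREEMENT`) are hypothetical placeholders; the actual numbers reCAPTCHA uses aren't given in this thread. An answer is accepted only once enough people have responded and a large enough share of them agree.

```python
from collections import Counter, defaultdict

# Hypothetical thresholds, chosen only for illustration.
MIN_VOTES = 3        # minimum number of responses before deciding
MIN_AGREEMENT = 0.75  # share of responses the top answer must have


class VoteTally:
    def __init__(self, min_votes: int = MIN_VOTES, min_agreement: float = MIN_AGREEMENT):
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        # image_id -> Counter of normalized answers
        self.votes: dict[str, Counter] = defaultdict(Counter)

    def add_vote(self, image_id: str, answer: str) -> None:
        # Only answers from users who passed the control word should reach here.
        self.votes[image_id][answer.strip().lower()] += 1

    def accepted_answer(self, image_id: str):
        """Return the agreed reading, or None if there is no consensus yet."""
        counts = self.votes.get(image_id, Counter())
        total = sum(counts.values())
        if total < self.min_votes:
            return None
        answer, top = counts.most_common(1)[0]
        return answer if top / total >= self.min_agreement else None
```

With these placeholder thresholds, the votes `"overlooks"`, `"Overlooks"`, `"overlook"` give 2 of 3 in agreement (about 67%), so nothing is accepted yet; one more `"overlooks"` makes it 3 of 4 (75%) and the word is accepted.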

But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

http://recaptcha.net/learnmore.html

Michael La Voie
I would note that ReCAPTCHA works by showing two words, one it knows and one it doesn't. You only need to get the known one correct. However, it resubmits the unknown one several times, until a high percentage of responses agree on it, before accepting it as correct.
Nissan Fan
Good point, the system isn't foolproof, though it is clever. It reminds me of Google's Image Labeler game, which works on the same principles to get unknown people to label images: http://images.google.com/imagelabeler/
Michael La Voie
Image Labeler and reCAPTCHA are by the same guy: Luis von Ahn. He really led the way on using humans to solve hard problems.
Michael Donohue
OK, great, and thanks for the link. So the image kind of picks up a "reputation" for being a given word, as "voted" on by numerous people in the human verification process. Just out of interest, any idea how many "votes" an image must get before it is accepted or retired?
intermension
Unfortunately no, but the word "reputation" is a good one. It fits well with the "wisdom of crowds" concept.
Michael La Voie
+1  A: 

Quoted from "Learn How reCAPTCHA Works":

But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.

Pascal Thivent
A: 

Everything you need to know is right here:

How reCAPTCHA works and how to mess with it!

Anonymous