I need to write a captcha service for the integration software I am working on. After some thought, I realise I don't fully understand how captcha works technically (I do understand how it works functionally), so I haven't been able to make some design decisions. A few things that bother me are:

  1. Should I keep a session for each user? (i.e. remember their IP, domain, etc.)
  2. Should I regenerate the passphrase on failure? (I know that sites like Google and Digg do it.)
  3. Every call will hit the database. I am not sure whether this will hurt performance on the server, but I will consider using something like memcached. I can't think of any way to avoid hitting the db or cache, because you need to first read, then validate, then update.
  4. Do I need an expiry time for the captcha? Say 15 minutes?

If the answer to 1 is yes, then I think the logic becomes complex, because I need to check things like: has this passphrase been validated before? Has it expired? Is it from the same IP? And so on.

And if I need to remember the IP and validate against it, what do I do after too many invalid requests? Do I block them?

So I am thinking captcha should work this way, the simple way:

Sort of stateless, which means each captcha generated will only survive two requests: the initial request and the subsequent request. The result will be either failed or passed. If it failed, create a new one.

I would appreciate it if someone could make some suggestions or explain how a proper captcha works. Thanks.

Update:

I need to explain the functional requirement a bit:

Terms:

  • customer is someone else out in the www
  • my service includes: captcha service and other service which customer can access via http request.

Workflow:

  1. customer makes a request to the captcha service
  2. captcha service generates a token and a passphrase and saves them to the db
  3. customer makes an http request to the captcha web to retrieve the image
  4. customer makes a request to our other service and passes in the passphrase
  5. our other service uses the passphrase to validate against our captcha service etc...
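The workflow above could be sketched roughly as follows. This is only an illustration, not a definitive design: `CaptchaStore`, its method names, the in-memory dict standing in for the db or cache, and the 15-minute TTL are all assumptions, and it assumes the customer sends the token back together with the passphrase in step 4. Each captcha is single-use, so it survives only the create request and one validation request.

```python
import secrets
import string
import time

CAPTCHA_TTL_SECONDS = 15 * 60  # assumption: the 15-minute expiry floated in the question
ALPHABET = string.ascii_uppercase + string.digits


class CaptchaStore:
    """Hypothetical in-memory stand-in for the database or cache."""

    def __init__(self, ttl=CAPTCHA_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._entries = {}  # token -> (passphrase, expires_at)

    def create(self, length=6):
        """Step 2: generate a token and a passphrase and save them.

        A real service would render the passphrase into an image and
        hand only the token (and the image) to the customer.
        """
        token = secrets.token_urlsafe(16)
        passphrase = "".join(secrets.choice(ALPHABET) for _ in range(length))
        self._entries[token] = (passphrase, self._clock() + self._ttl)
        return token, passphrase

    def validate(self, token, answer):
        """Step 5: check the answer once; the entry is consumed pass or fail."""
        entry = self._entries.pop(token, None)
        if entry is None:
            return False  # unknown token, or already used
        passphrase, expires_at = entry
        if self._clock() > expires_at:
            return False  # expired
        return secrets.compare_digest(passphrase.encode(), answer.upper().encode())
```

Because `validate` removes the entry before checking it, a failed attempt forces the customer to request a new captcha, and there is no separate "update" step after the read.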

I am also wondering whether step 3 is necessary, or whether I should just return the image stream in step 2.

+1  A: 

1: Should I keep a session for each user? (i.e. remember their IP, domain, etc)

It depends on the server-side web programming language you're using. Most of them offer built-in ways to manage the session: in PHP, for example, you call session_start() and access $_SESSION, and in JSP/Servlet you can get it with HttpServletRequest#getSession(). As you didn't mention which one you're using, I can't give a more specific or detailed answer. All I can suggest is to consult the docs, tutorials, or books for the language in question.

You don't need to remember the IP. Just setting a key/token in the session is enough. The session is usually already backed by a cookie, so you could in theory also just use a cookie for this if you intend to build it all yourself (note: do NOT put the answer in the cookie, but just some unique key to identify the client!).
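A minimal sketch of that idea in Python, using only the standard library. The cookie name `captcha_key`, the `answers_by_client` dict, and both function names are made up for illustration; a real application would normally rely on its framework's session support instead.

```python
import secrets
from http.cookies import SimpleCookie

# Hypothetical server-side store: client key -> expected captcha answer.
answers_by_client = {}


def issue_captcha_cookie(expected_answer):
    """Return a Set-Cookie header value that carries only a random key."""
    client_key = secrets.token_urlsafe(16)
    answers_by_client[client_key] = expected_answer  # the answer stays on the server
    cookie = SimpleCookie()
    cookie["captcha_key"] = client_key
    cookie["captcha_key"]["httponly"] = True
    return cookie["captcha_key"].OutputString()


def expected_answer_for(cookie_header):
    """Look up the answer for the key the client sent back in its Cookie header."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("captcha_key")
    return answers_by_client.get(morsel.value) if morsel else None
```

The client only ever sees an opaque random key, so reading the cookie reveals nothing about the expected answer.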

2: Should I regenerate a passphrase on fail? (I know that sites like google and digg do it)

Certainly you should. Otherwise it's easy for bots to do a brute force on the captcha.
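To put a number on it, assume a 5-character passphrase drawn from 36 symbols (A-Z and 0-9): that gives 36^5 = 60,466,176 possibilities. A fixed passphrase can be enumerated in at most that many attempts, whereas a regenerated one makes each attempt an independent guess. A rough sketch of regenerate-on-fail, where the `Challenge` class and `new_passphrase` helper are hypothetical names:

```python
import secrets
import string

ALPHABET = string.ascii_uppercase + string.digits  # 36 symbols (assumption)


def new_passphrase(length=5):
    """Draw a fresh random passphrase from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


class Challenge:
    """Hypothetical challenge that replaces its passphrase after a wrong guess."""

    def __init__(self):
        self.passphrase = new_passphrase()

    def attempt(self, guess):
        if secrets.compare_digest(self.passphrase.encode(), guess.encode()):
            return True
        # Wrong guess: discard the old passphrase so it cannot be enumerated.
        self.passphrase = new_passphrase()
        return False
```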

That said, is there any reason that you don't use an existing captcha API which you could just plug in, such as reCAPTCHA?

BalusC
@BalusC: I was going to suggest reCAPTCHA too, since the OP seems confused about how to implement CAPTCHAs... But upon re-reading the (edited) question, apparently he wants to provide a service like reCAPTCHA to his customers himself :(
Webinator
@WizardOfOdds, in a sense you are correct, but it's more for B2B and SOA, so it doesn't need to be as complicated as reCAPTCHA. BTW, I've been using reCAPTCHA, but I am not sure if I can bend it to my will (needs).
Jeffrey C
A: 

Captchas are hard. There is a lot of research going into designing them as well as breaking them. A much better and more useful solution is to use a tool like reCAPTCHA: http://recaptcha.net/whyrecaptcha.html which provides pretty good security, makes integration easy, and puts the time spent typing to good use.

Thomas Ahle
I've updated my question, which sort of explains why I can't use reCAPTCHA (see the bottom part of the question). Maybe I can, but I don't know how to modify reCAPTCHA to suit my needs.
Jeffrey C
A: 

I should start with: why use a captcha? Since the obvious answer is to prevent spam bots, I think saving sessions per user is a bad idea. It can lead to blocking legitimate users by mistake. Plus, it will not prevent a smart bot from continuing to try to break your captcha.

Furthermore, if you do not regenerate the captchas, a good bot will eventually break them.

So, IMHO, you are better off never blocking users (maybe blocking registered users, but not anonymous ones) and regenerating the captcha for every request.

Am