views:

339

answers:

6

Hi All,

I have a web form that users fill in; the data is sent to the server and stored in a database. I am worried that robots might fill in the form and leave me with a database full of useless records. How can I prevent robots from filling in my forms? I am thinking of something like Stack Overflow's robot detection, which asks you to verify that you are human when it suspects you are a robot. Is there a server-side API for this in Perl, Java or PHP?

+5  A: 

You can use reCAPTCHA (the same one Stack Overflow uses); it has libraries for a number of programming languages.

bdonlan
+2  A: 

CAPTCHA is great. The other thing you can do, which will stop 99% of your robot traffic without annoying your users, is to validate fields.

On my site, I check for stray text in fields like zip code and phone number. That has removed all of the non-targeted robot misinformation.
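
A minimal sketch of that kind of field validation, in Python for brevity (the same checks port directly to Perl, Java or PHP). The field names `zip` and `phone`, the US-style ZIP pattern, and the seven-digit minimum are assumptions for illustration, not details from this answer.

```python
import re

# Assumed formats: US-style ZIP codes and loosely formatted phone numbers.
ZIP_RE = re.compile(r"\d{5}(-\d{4})?")
PHONE_RE = re.compile(r"\+?[\d\s().-]{7,20}")


def looks_valid(form):
    """Return True when the zip and phone fields hold plausible values."""
    zip_code = form.get("zip", "").strip()
    phone = form.get("phone", "").strip()
    if not ZIP_RE.fullmatch(zip_code):
        return False
    if not PHONE_RE.fullmatch(phone):
        return False
    # Require at least 7 digits so a run of punctuation does not pass.
    return sum(ch.isdigit() for ch in phone) >= 7
```

A generic spam bot that pastes a URL or sales text into every field fails both checks, while a real user's `90210` and `(555) 123-4567` pass.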

IPX Ares
+4  A: 

I've always preferred the honeypot CAPTCHA (article by Phil Haack), as it's less invasive to the user.

Matthew Vines
+8  A: 

There are several solutions.

  1. Use a CAPTCHA. SO uses reCAPTCHA as far as I know.

  2. Add an extra field to your form and hide it with CSS (display:none). A normal user will not see this field and therefore will not fill it in. On submission, you check whether this field is empty. If it is not, you are dealing with a robot that has dutifully filled out every form field. This technique is usually referred to as a "honeypot".

  3. Add a JavaScript timer function. On page load it starts a value at zero and increases it as time passes. A normal user takes some time to read and fill out your form before submitting it. A robot fills out and submits the form immediately upon receiving it. On submission, you check how far the value has moved from zero. If it has moved well past zero, the sender is likely a real user. If you see just a couple of seconds (or no value at all, because the robot did not execute JavaScript), it is likely a robot. This only works, however, if you decide to require your users to have JavaScript on in order to perform "write" operations.

There are other techniques for sure. But these are quite simple and effective.
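
The server-side half of techniques 2 and 3 could be sketched roughly as follows, in Python for brevity. The field names `website` (the CSS-hidden honeypot) and `elapsed_seconds` (the value written by the JavaScript timer), and the 5-second threshold, are assumptions made for this example.

```python
MIN_SECONDS = 5  # assumed threshold; tune it to the length of your form


def is_probable_bot(form):
    """Apply the honeypot and timer checks to a dict of submitted fields."""
    # Honeypot: the CSS-hidden field must come back empty.
    if form.get("website", "").strip():
        return True
    # Timer: a missing or non-numeric value means the script never ran.
    try:
        elapsed = float(form.get("elapsed_seconds", ""))
    except (TypeError, ValueError):
        return True
    # Written as "not >=" so that a NaN value is also treated as a bot.
    return not (elapsed >= MIN_SECONDS)
```

Note that the timer value comes from the client, so a bot written specifically for your site can forge it; like the honeypot, this mainly filters generic bots.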

Developer Art
A potential problem with #3 is that most robots will not execute any JavaScript on the page, so there would be no value from the timer. I have had great success with 1 and 2, though.
friedo
@friedo: You are correct, and that is exactly the situation I meant. No value from the timer means it is either a robot or a user with JavaScript off. If the author decides his site will require users to have JavaScript on in order to perform "write" operations, this approach could very well work.
Developer Art
+2  A: 

Captchas bring accessibility problems and will ultimately be defeated by recognition software.

I recommend reading this short article about bot traps, which include hidden fields, as Matthew Vines and New in town already suggested.

Anyway, you are still free to use both captcha and bot traps.

Altherac
A: 

You could create a two-step system in which a user fills in the form but must then reply to an e-mail to "activate" the record within a set period of time, say 24 hours.

In the back end, instead of populating your current table with all the form submissions, you could put them into a temporary table that automatically deletes any row older than your time allotment. Unless you have a serious bot problem, I would think the table wouldn't get that big, especially if the first form is just a few fields.

A benefit of this approach is that you don't have to use a captcha or some similar technology that might create accessibility problems.
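
As a rough sketch of the temporary-table idea, here is one way it might look with Python's built-in `sqlite3` module. The table names `pending` and `records`, the token-based activation link, and all function names are assumptions for illustration; actually sending the e-mail is left out.

```python
import secrets
import sqlite3
import time

TTL_SECONDS = 24 * 60 * 60  # the 24-hour activation window


def create_tables(db):
    db.execute("CREATE TABLE pending (token TEXT PRIMARY KEY, email TEXT, created REAL)")
    db.execute("CREATE TABLE records (email TEXT)")


def submit(db, email, now=None):
    """Store a submission as pending and return the token to e-mail out."""
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(16)
    db.execute("INSERT INTO pending VALUES (?, ?, ?)", (token, email, now))
    return token


def purge_expired(db, now=None):
    """Delete pending rows older than the activation window."""
    now = time.time() if now is None else now
    db.execute("DELETE FROM pending WHERE created < ?", (now - TTL_SECONDS,))


def activate(db, token, now=None):
    """Move a pending row into records; return False if unknown or expired."""
    purge_expired(db, now)
    row = db.execute("SELECT email FROM pending WHERE token = ?", (token,)).fetchone()
    if row is None:
        return False
    db.execute("INSERT INTO records VALUES (?)", row)
    db.execute("DELETE FROM pending WHERE token = ?", (token,))
    return True
```

Here the purge runs on every activation attempt; a scheduled job calling `purge_expired` would work equally well and keeps the pending table small even if nobody activates.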

Robert DeBoer
E-mail activation was one of the first solutions to bot registration, but it's not very effective anymore. The bad news is that bots can read e-mail. If there is a strong incentive to write a bot for your web site, it won't be long before the bot adapts and follows the right link in your activation e-mail.
Altherac
Wow, I did not know that. So even though the e-mail gets sent to a mailbox, the bot can read mail from that box and find and follow the link in it (even in a plain-text e-mail)?
Robert DeBoer