views:

722

answers:

11

I am logging every visit to my website, and determining whether the visitor is human is important. I have searched the web and found many interesting ideas on how to detect whether a visitor is human.

  1. If the visitor is logged in and has passed a captcha
  2. Detecting mouse events
  3. Detecting if the user has a browser [user agent]
  4. Detecting mouse clicks [how would I go about this?]

Are there any other surefire ways to detect if the visitor is human?

A: 

Make the user answer a question like "What is 3 + 5?"
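
A minimal sketch of that idea in C# / ASP.NET, assuming session state is available; the session key, method names, and question format are illustrative, not from this answer:

    // Sketch: generate a simple arithmetic question and validate the reply.
    using System;
    using System.Web;

    public static class HumanCheck
    {
        private static readonly Random Rng = new Random();

        // Call when rendering the form: build a question and remember the answer.
        public static string CreateQuestion()
        {
            int a = Rng.Next(1, 10), b = Rng.Next(1, 10);
            HttpContext.Current.Session["expectedAnswer"] = (a + b).ToString();
            return string.Format("What is {0} + {1}?", a, b);
        }

        // Call on postback: compare what the visitor typed with what we stored.
        public static bool IsHumanAnswer(string submitted)
        {
            var expected = HttpContext.Current.Session["expectedAnswer"] as string;
            return expected != null && expected == (submitted ?? "").Trim();
        }
    }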

Tim Santeford
Better ask: who are you? who created you? ... lol
eglasius
I disagree with this, as it would obviously disrupt the user experience. It's painful enough just filling out a captcha. However, if that doesn't matter to you, then this would definitely be a solution.
pixelbobby
He asked if there were any other ways, didn't he? lol
Tim Santeford
Most modern spam bots can easily beat the addition question. Try it yourself. I even had a spam bot get past a double math question like 2 + 4 / 3, and it just breezed through with no problem at all. Got me thinking that the parser must read the sentence and apply the math as written; dunno if I could take the spammer down by trying something like 2+2^12398123819238123.
Frankie
I would not use that particular question exactly, hence the word "like". I am fully aware that bots can do math; I was keeping it simple for example's sake. You could be more challenging by rotating in questions like: "Which is bigger, an apple or a flea?" or "What letter comes before B?"
Tim Santeford
how about asking ASL !!
Rakesh Juyal
+4  A: 

The most reliable way to detect well-known spiders is by IP address: the common ones crawl from publicly documented addresses. http://www.iplists.com/nw/
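
A rough sketch in C# of checking the remote address against such a list; the prefixes below are placeholders you would fill in from a published list like the one linked above:

    using System;
    using System.Linq;

    public static class SpiderIpCheck
    {
        // Placeholder prefixes only; populate from a maintained crawler IP list.
        private static readonly string[] KnownSpiderPrefixes =
        {
            "192.0.2.",
            "198.51.100."
        };

        // Pass in e.g. HttpContext.Current.Request.UserHostAddress.
        public static bool LooksLikeKnownSpider(string remoteIp)
        {
            if (string.IsNullOrEmpty(remoteIp)) return false;
            return KnownSpiderPrefixes.Any(
                p => remoteIp.StartsWith(p, StringComparison.Ordinal));
        }
    }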

A: 

Remember that whatever you do, you are only making it harder for an automated process, not completely preventing it.

Regarding mouse events: those happen on the client side, so you would only be adding info to the request, which a bot could fake just as easily.

eglasius
+2  A: 

You should check the user-agent property. You can likely accomplish this in C#.

For example, HttpContext.Current.Request... and then ask for the user agent. This might give you something like crawler.google or what have you, so you may have to build your own list to check against and return the result.
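
A minimal sketch of that check in C#, using ASP.NET's HttpRequest.UserAgent property; the substring list is illustrative, and as noted in the comments below, spoofed user agents will slip through:

    using System;
    using System.Linq;
    using System.Web;

    public static class UserAgentCheck
    {
        // Illustrative markers; real deployments keep a longer, maintained list.
        private static readonly string[] BotMarkers =
        {
            "googlebot", "msnbot", "slurp", "crawler", "spider", "bot"
        };

        // Pass in HttpContext.Current.Request.
        public static bool LooksLikeBot(HttpRequest request)
        {
            string ua = request.UserAgent;
            if (string.IsNullOrEmpty(ua)) return true; // no user agent at all is suspicious
            string lower = ua.ToLowerInvariant();
            return BotMarkers.Any(m => lower.Contains(m));
        }
    }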

pixelbobby
Malicious or misbehaving bots will spoof Internet Explorer or Firefox, making this method unreliable.
Tim Santeford
well coal in the stocking for them!
pixelbobby
@Tim then you'd be after this question: http://stackoverflow.com/questions/233192/detecting-stealth-web-crawlers
Stephen Denne
+2  A: 

count the legs?

sorry, couldn't resist :-)

Charles Bretana
+6  A: 

You need to distinguish between well-behaved, law-abiding robots and nasty, data-thieving, piratical robots.

Nice robots will read the 'robots' meta tag and comply with your policy, 'noindex' being a polite way to refuse their services.

Malicious robots, on the other hand, are going to fake the "User-Agent" and similar headers.

Captchas are probably the best method, but they can p*ss off non-robots if overused.

One sneaky method I have seen is to have a recursive link as the first link on the page, which will send a crawler into a loop. Another is to have a link to a site you dislike as the first link on the page to distract the robot's attention. Both of these links can easily be rendered "invisible" to meat-based agents.
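
A sketch of the trap-link idea in C#: map a handler to a URL that is only linked invisibly (hidden with CSS), and treat any request to it as a robot. The handler name, trap path, and what you do with the flag are assumptions for illustration:

    using System.Web;

    // Map this handler to e.g. /trap in web.config; no human should ever see the link.
    public class BotTrapHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            // Anything requesting this URL followed a link hidden from human visitors.
            string ip = context.Request.UserHostAddress;
            // Record the IP however you log visits; here we just mark the response.
            context.Response.StatusCode = 403;
            context.Response.Write("Flagged as robot: " + ip);
        }
    }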

James Anderson
+1 for meat based agents.
Matt Grande
A: 

With forms, you can use JavaScript to alter the form action to point to the real URL. That will filter out any bot that does not render pages with JavaScript. You can also have multiple submit buttons where only one of them really works and hide all the rest with CSS; the bots will not know which one to click. If you ever receive a click from one of the bogus buttons, you know you have a bot.
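
A sketch of the server-side half of the decoy-button trick in C#, with illustrative button names; browsers only post the name of the button that was actually clicked, so any post containing a decoy name came from a bot:

    using System.Web;

    public static class DecoyButtonCheck
    {
        // These buttons exist in the markup but are hidden with CSS.
        private static readonly string[] DecoyButtons = { "submitA", "submitC" };
        private const string RealButton = "submitB";

        // Pass in HttpContext.Current.Request.
        public static bool IsBotPost(HttpRequest request)
        {
            foreach (string decoy in DecoyButtons)
            {
                if (request.Form[decoy] != null) return true; // a hidden decoy was "clicked"
            }
            return request.Form[RealButton] == null; // missing real button is also suspect
        }
    }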

Tim Santeford
A: 

Either use a captcha or use JavaScript to validate. A huge percentage of bots do not evaluate JavaScript.

Unknown
+2  A: 

If you're mainly concerned with form spam... I would suggest Akismet, the free spam-catching service from WordPress. It works very well.
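
A rough sketch of calling Akismet's comment-check endpoint from C#; treat the URL and field names as assumptions to verify against Akismet's current REST documentation, and note that the API key and blog URL are placeholders:

    using System.Collections.Specialized;
    using System.Net;
    using System.Text;

    public static class AkismetCheck
    {
        public static bool IsSpam(string userIp, string userAgent, string content)
        {
            using (var client = new WebClient())
            {
                var data = new NameValueCollection
                {
                    { "blog", "http://www.example.com" },   // placeholder
                    { "user_ip", userIp },
                    { "user_agent", userAgent },
                    { "comment_content", content }
                };
                byte[] response = client.UploadValues(
                    "https://YOUR_API_KEY.rest.akismet.com/1.1/comment-check", data);
                // Akismet replies with the literal string "true" (spam) or "false".
                return Encoding.UTF8.GetString(response) == "true";
            }
        }
    }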

If you're trying to save the server some bandwidth... the question is completely different, and I would probably go another way, like preventing hot-linking.

That said, no solution is perfect, but you should try to stick with the one that provides you with a minimum level of comfort and your users with a maximum. It's all about the users.

Frankie
+1  A: 

If you are going down the captcha route, you could always use an invisible captcha.

Basically, create an input control with a label asking something like "What is 5+2?", use JavaScript to solve it and enter the value in the text box, then hide the field. Almost no spiders can run JavaScript, a normal user won't even know it is happening, and any user without JavaScript just sees the field to fill in.
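
A sketch of the server-side half in C#: the page's JavaScript fills a hidden field (here called humanCheck, an illustrative name) and the server simply compares it with the expected answer:

    using System.Web;

    public static class InvisibleCaptcha
    {
        private const string ExpectedAnswer = "7"; // answer to the rendered "What is 5+2?" label

        // Pass in HttpContext.Current.Request.
        public static bool PassedCheck(HttpRequest request)
        {
            // Real browsers ran the JavaScript that filled the hidden field;
            // most spiders did not, so the value will be missing or wrong.
            return request.Form["humanCheck"] == ExpectedAnswer;
        }
    }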

Google Analytics also relies on JS, so you could just use that?

TheAlbear
+3  A: 
knittl