views:

722

answers:

11

I am logging every visit to my website, and determining whether the visitor is human is important. I have searched the web and found many interesting ideas on how to detect whether a visitor is human.

  1. If the visitor is logged in and has passed a captcha
  2. Detecting mouse events
  3. Detecting if the user has a browser [user agent]
  4. Detecting mouse clicks [how would I go about this?]

Are there any other surefire ways to detect if the visitor is human?

A: 

Make the user answer a question like "What is 3 + 5?"
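
A minimal sketch of that idea in C# / ASP.NET, assuming session state is available; the session key, method names, and question format are illustrative, not from this answer:

    // Sketch: generate a simple arithmetic question and validate the reply.
    using System;
    using System.Web;

    public static class HumanCheck
    {
        private static readonly Random Rng = new Random();

        // Call when rendering the form: build a question and remember the answer.
        public static string CreateQuestion()
        {
            int a = Rng.Next(1, 10), b = Rng.Next(1, 10);
            HttpContext.Current.Session["expectedAnswer"] = (a + b).ToString();
            return string.Format("What is {0} + {1}?", a, b);
        }

        // Call on postback: compare what the visitor typed with what we stored.
        public static bool IsHumanAnswer(string submitted)
        {
            var expected = HttpContext.Current.Session["expectedAnswer"] as string;
            return expected != null && expected == (submitted ?? "").Trim();
        }
    }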

Tim Santeford
Better ask: who are you? who created you? ... lol
eglasius
I disagree with this, as it would obviously disrupt the user experience. It's painful enough just filling out a captcha. However, if that doesn't matter to you, then this would definitely be a solution.
pixelbobby
He asked if there were any other ways, didn't he? lol
Tim Santeford
Most modern spam bots can easily beat the addition question. Try it yourself. I even had a spam bot get past a double math question like 2 + 4 / 3, and it just breezed through with no problem at all. Got me thinking that the parser must read the sentence and apply the math as written; dunno if I could take the spammer down by trying something like 2+2^12398123819238123.
Frankie
I would not use that particular question exactly, hence the word "like". I am fully aware that bots can do math; I was keeping it simple for example's sake. You could be more challenging by rotating in questions like: "Which is bigger, an apple or a flea?" or "What letter comes before B?"
Tim Santeford
how about asking ASL !!
Rakesh Juyal
+4  A: 

The most reliable way to detect well-known spiders is by IP address: the common ones crawl from publicly documented addresses. http://www.iplists.com/nw/
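
A rough sketch in C# of checking the remote address against such a list; the prefixes below are placeholders you would fill in from a published list like the one linked above:

    using System;
    using System.Linq;

    public static class SpiderIpCheck
    {
        // Placeholder prefixes only; populate from a maintained crawler IP list.
        private static readonly string[] KnownSpiderPrefixes =
        {
            "192.0.2.",
            "198.51.100."
        };

        // Pass in e.g. HttpContext.Current.Request.UserHostAddress.
        public static bool LooksLikeKnownSpider(string remoteIp)
        {
            if (string.IsNullOrEmpty(remoteIp)) return false;
            return KnownSpiderPrefixes.Any(
                p => remoteIp.StartsWith(p, StringComparison.Ordinal));
        }
    }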

A: 

Remember that whatever you do, you are only making it harder for an automated process, not completely preventing it.

Regarding mouse events: those happen on the client side, so you would only be adding info to the request, which a bot could fake just as easily.

eglasius
+2  A: 

You should check the user-agent property. You can likely accomplish this in C#.

For example, HttpContext.Current.Request... and then ask for the user agent. This might give you something like crawler.google or what have you, so you may have to build your own list to check against and return the result.
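
A minimal sketch of that check in C#, using ASP.NET's HttpRequest.UserAgent property; the substring list is illustrative, and as noted in the comments below, spoofed user agents will slip through:

    using System;
    using System.Linq;
    using System.Web;

    public static class UserAgentCheck
    {
        // Illustrative markers; real deployments keep a longer, maintained list.
        private static readonly string[] BotMarkers =
        {
            "googlebot", "msnbot", "slurp", "crawler", "spider", "bot"
        };

        // Pass in HttpContext.Current.Request.
        public static bool LooksLikeBot(HttpRequest request)
        {
            string ua = request.UserAgent;
            if (string.IsNullOrEmpty(ua)) return true; // no user agent at all is suspicious
            string lower = ua.ToLowerInvariant();
            return BotMarkers.Any(m => lower.Contains(m));
        }
    }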

pixelbobby
Malicious or misbehaving bots will spoof Internet Explorer or Firefox, making this method unreliable.
Tim Santeford
well coal in the stocking for them!
pixelbobby
@Tim then you'd be after this question: http://stackoverflow.com/questions/233192/detecting-stealth-web-crawlers
Stephen Denne
+2  A: 

count the legs?

sorry, couldn't resist :-)

Charles Bretana
+6  A: 

You need to distinguish between well-behaved, law-abiding robots and nasty, data-thieving, piratical robots.

Nice robots will read the 'robots' meta tag and comply with your policy, 'noindex' being a polite way to refuse their services.

Malicious robots, on the other hand, are going to fake the "User-Agent" and similar headers.

Captchas are probably the best method, but they can p*ss off non-robots if overused.

One sneaky method I have seen is to have a recursive link as the first link on the page, which will send a crawler into a loop. Another is to have a link to a site you dislike as the first link on the page to distract the robot's attention. Both of these links can easily be rendered "invisible" to meat-based agents.
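
A sketch of the trap-link idea in C#: map a handler to a URL that is only linked invisibly (hidden with CSS), and treat any request to it as a robot. The handler name, trap path, and what you do with the flag are assumptions for illustration:

    using System.Web;

    // Map this handler to e.g. /trap in web.config; no human should ever see the link.
    public class BotTrapHandler : IHttpHandler
    {
        public bool IsReusable { get { return true; } }

        public void ProcessRequest(HttpContext context)
        {
            // Anything requesting this URL followed a link hidden from human visitors.
            string ip = context.Request.UserHostAddress;
            // Record the IP however you log visits; here we just mark the response.
            context.Response.StatusCode = 403;
            context.Response.Write("Flagged as robot: " + ip);
        }
    }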

James Anderson
+1 for meat based agents.
Matt Grande
A: 

With forms, you can use JavaScript to alter the form action to point to the real URL. That will filter out any bot that does not render pages with JavaScript. You can also have multiple submit buttons where only one of them really works and hide all the rest with CSS; the bots will not know which one to click. If you ever receive a click from one of the bogus buttons, you know you have a bot.
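
A sketch of the server-side half of the decoy-button trick in C#, with illustrative button names; browsers only post the name of the button that was actually clicked, so any post containing a decoy name came from a bot:

    using System.Web;

    public static class DecoyButtonCheck
    {
        // These buttons exist in the markup but are hidden with CSS.
        private static readonly string[] DecoyButtons = { "submitA", "submitC" };
        private const string RealButton = "submitB";

        // Pass in HttpContext.Current.Request.
        public static bool IsBotPost(HttpRequest request)
        {
            foreach (string decoy in DecoyButtons)
            {
                if (request.Form[decoy] != null) return true; // a hidden decoy was "clicked"
            }
            return request.Form[RealButton] == null; // missing real button is also suspect
        }
    }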

Tim Santeford
A: 

Either use a captcha or use JavaScript to validate. A huge percentage of bots do not evaluate JavaScript.

Unknown
+2  A: 

If you're mainly concerned with form spam... I would suggest Akismet, the free spam-catching service from WordPress. It works very well.
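
A rough sketch of calling Akismet's comment-check endpoint from C#; treat the URL and field names as assumptions to verify against Akismet's current REST documentation, and note that the API key and blog URL are placeholders:

    using System.Collections.Specialized;
    using System.Net;
    using System.Text;

    public static class AkismetCheck
    {
        public static bool IsSpam(string userIp, string userAgent, string content)
        {
            using (var client = new WebClient())
            {
                var data = new NameValueCollection
                {
                    { "blog", "http://www.example.com" },   // placeholder
                    { "user_ip", userIp },
                    { "user_agent", userAgent },
                    { "comment_content", content }
                };
                byte[] response = client.UploadValues(
                    "https://YOUR_API_KEY.rest.akismet.com/1.1/comment-check", data);
                // Akismet replies with the literal string "true" (spam) or "false".
                return Encoding.UTF8.GetString(response) == "true";
            }
        }
    }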

If you're trying to save the server some bandwidth... the question is completely different, and I would probably go another way, like preventing hot-linking.

That said, no solution is perfect, but you should try to stick with the one that provides you with a minimum level of comfort and your users with a maximum. It's all about the users.

Frankie
+1  A: 

If you are going down the captcha route, you could always use an invisible captcha.

Basically, create an input control with a label asking something like "What is 5+2?", use JavaScript to solve it and enter the value in the text box, then hide the field. Almost no spiders can run JavaScript, a normal user won't even know it is happening, and any user without JavaScript just sees the field to fill in.
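
A sketch of the server-side half in C#: the page's JavaScript fills a hidden field (here called humanCheck, an illustrative name) and the server simply compares it with the expected answer:

    using System.Web;

    public static class InvisibleCaptcha
    {
        private const string ExpectedAnswer = "7"; // answer to the rendered "What is 5+2?" label

        // Pass in HttpContext.Current.Request.
        public static bool PassedCheck(HttpRequest request)
        {
            // Real browsers ran the JavaScript that filled the hidden field;
            // most spiders did not, so the value will be missing or wrong.
            return request.Form["humanCheck"] == ExpectedAnswer;
        }
    }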

Google Analytics also relies on JS, so you could just use that?

TheAlbear
+3  A: 
knittl