views: 790
answers: 12

Hi, I need to write some code to analyze whether or not a given user on our site is a bot. If it's a bot, we'll take some specific action. Looking at the user agent only works for friendly bots, since a bot can put whatever it wants in its user agent string. I'm after the behaviors of unfriendly bots. Various ideas I've had so far are:

  • If you don't have a browser ID
  • If you don't have a session ID
  • Unable to write a cookie

Obviously, there are some cases where a legitimate user will look like a bot, but that's OK. Are there other programmatic ways to detect a bot, or at least detect something that looks like one? Thanks!

+1  A: 

You say that it is okay if some legitimate users appear as bots, so:

Most bots don't run JavaScript. Use JavaScript to make an Ajax-style call to the server that identifies this IP address as NonBot. Store that for a set period of time to identify future connections from this IP as good clients and to prevent further wasteful JavaScript calls.
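A minimal client-side sketch of that idea; the /not-a-bot URL is just an assumed endpoint name that would record the caller's IP or session as human:

<script type="text/javascript">
// Only runs in clients that actually execute JavaScript; most crude bots never get here.
var xhr = new XMLHttpRequest();
xhr.open('GET', '/not-a-bot', true);  // assumed endpoint that marks this IP/session as NonBot
xhr.send();
</script>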

Rob Prouse
+1  A: 

A simple test is javascript:

<script type="text/javascript">
document.write('<img src="/not-a-bot.' + 'php" style="display: none;">');
</script>

The not-a-bot.php script can set something in the session to flag that the user is not a bot, then return a single-pixel GIF.

The URL is broken up to disguise it from the bot.
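The answer describes a PHP script; just to make the endpoint concrete, here is the same idea sketched in Node/Express instead (my assumption, as are the session setup and flag name):

// Hypothetical Node/Express equivalent of not-a-bot.php (sketch only).
const express = require('express');
const session = require('express-session');

const app = express();
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));

// A 1x1 transparent GIF, base64-encoded.
const PIXEL = Buffer.from('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7', 'base64');

app.get('/not-a-bot.php', function (req, res) {
  req.session.notABot = true;          // flag the session as having executed JavaScript
  res.type('image/gif').send(PIXEL);   // return the single-pixel GIF the hidden <img> asked for
});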

Greg
The only difficulty is that lots of users now turn JavaScript off, given security concerns. It's almost humorous, since it would otherwise be one of the easiest ways to test for authenticity.
The Wicked Flea
Really? With JavaScript off, there are a ton of sites that just don't work nowadays. I thought more users were running with JavaScript ON as time progressed.
Zachary Yates
When using Firefox I have NoScript active most of the time, so going to a site with a setup like this would flag me as a bot from the get-go.
Dalin Seivewright
@Zachary, the 'problem' is that more and more good web developers are now using progressive enhancement to at least give a half-decent experience, so NoScript is apparently (although I've never tried it) a workable solution. I wish people weren't so paranoid; it makes so many otherwise easy things just frustratingly hard.
Simon_Weaver
People are paranoid for good reason. There are a lot of security vulnerabilities these days that start or propagate through javascript (XSRF being a huge one right now). If more web developers were progressive in their client-server interactions, the paranoia would be less likely (but still justified).
patridge
+5  A: 

Clarify why you want to exclude bots, and how tolerant you are of mis-classification.

That is, do you have to exclude every single bot at the expense of treating real users like bots? Or is it okay if bots crawl your site as long as they don't have a performance impact?

The only way to exclude all bots is to shut down your web site. A malicious user can distribute their bot to enough machines that you would not be able to distinguish their traffic from real users. Tricks like JavaScript and CSS will not stop a determined attacker.

If a "happy medium" is satisfactory, one trick that might be helpful is to hide links with CSS so that they are not visible to users in a browser, but are still in the HTML. Any agent that follows one of these "poison" links is a bot.
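A minimal sketch of the poison-link idea, assuming an Express-style server; the /trap path and the CSS class are made up for illustration:

// Markup served somewhere in the page, hidden from humans with CSS:
//   <style>.poison { display: none; }</style>
//   <a href="/trap" class="poison" rel="nofollow">archive</a>
const express = require('express');
const app = express();

const flaggedIps = new Set();          // IPs that followed the hidden link

app.get('/trap', function (req, res) {
  flaggedIps.add(req.ip);              // only an agent parsing raw HTML ends up here
  res.status(404).end();               // give it nothing useful back
});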

erickson
If the user had some sort of Web Accelerator installed, then it still might visit the invisible links, if the web accelerator wasn't extremely smart.
Kibbee
+2  A: 

User agents can be faked. Captchas have been cracked. Valid cookies can be sent back to your server with page requests. Legitimate programs, such as Adobe Acrobat Pro, can go in and download your web site in one session. Users can disable JavaScript. Since there is no standard measure of "normal" user behaviour, it cannot be differentiated from a bot's.

In other words: it can't be done, short of pulling the user into some form of interactive chat and hoping they pass the Turing test; then again, they could be a really good bot too.

Diodeus
A: 

Well, this is really for a particular page of the site. We don't want a bot submitting the form because it messes up our tracking. Honestly, the friendly bots (Google, Yahoo, etc.) aren't a problem, as they don't typically fill out the form to begin with. If we suspected someone of being a bot, we might show them a captcha image or something like that... If they passed, they're not a bot and the form submits...

I've heard of things like putting the form in Flash, or making the submit happen via JavaScript, but I'd prefer not to prevent real users from using the site unless I suspected they were a bot...

A: 

I think your idea of checking the session ID will already be quite useful.

Another idea: You could check whether embedded resources are downloaded as well.

A bot which does not load images (e.g. to save time and bandwidth) should be distinguishable from a browser which typically will load images embedded into a page.

Such a check, however, might not be suited to real-time use, because you would have to analyze some sort of server log, which might be time consuming.
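A rough offline sketch of that kind of log analysis, assuming a common/combined-format access log; the filename, field layout, and extension list are assumptions:

// Sketch: flag IPs that requested several pages but never any embedded resources.
const fs = require('fs');

const pagesByIp = new Map();    // IP -> number of page requests
const assetsByIp = new Map();   // IP -> number of image/CSS/JS requests

for (const line of fs.readFileSync('access.log', 'utf8').split('\n')) {
  const m = line.match(/^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+)/);
  if (!m) continue;
  const ip = m[1], path = m[2];
  if (/\.(gif|png|jpe?g|css|js)(\?|$)/i.test(path)) {
    assetsByIp.set(ip, (assetsByIp.get(ip) || 0) + 1);
  } else {
    pagesByIp.set(ip, (pagesByIp.get(ip) || 0) + 1);
  }
}

for (const [ip, pages] of pagesByIp) {
  if (pages >= 5 && !assetsByIp.has(ip)) {
    console.log('possible bot:', ip);  // many pages, zero embedded resources
  }
}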

0xA3
IE and Firefox at least have the ability to not download images.
Dalin Seivewright
Safari also has the option to disable images.
epochwolf
Lynx. Don't forget Lynx. Which nobody uses. But which *can* submit forms. Yeah...
Brian
Yes, there is no perfect way. But I guess with a combination of several methods, such as checking for scripting, image downloads, CSS tricks, etc., you could make it much harder for an evil bot...
0xA3
A: 

For each session on the server you can determine if the user was at any point clicking or typing too fast. After a given number of repeats, set the "isRobot" flag to true and conserve resources within that session. Normally you don't tell the user that he's been robot-detected, since he'd just start a new session in that case.
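A minimal server-side sketch of that approach, assuming Express with sessions; the 500 ms gap and the repeat count are arbitrary assumptions:

// Sketch: flag a session as a robot after repeated inhumanly fast requests.
const express = require('express');
const session = require('express-session');

const app = express();
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));

app.use(function (req, res, next) {
  const now = Date.now();
  const s = req.session;
  if (s.lastHit && now - s.lastHit < 500) {   // less than 500 ms between hits
    s.fastHits = (s.fastHits || 0) + 1;
    if (s.fastHits > 5) s.isRobot = true;     // after a few repeats, remember it quietly
  }
  s.lastHit = now;
  next();                                     // don't tell the client anything changed
});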

krosenvold
This wouldn't be foolproof, since many legitimate software solutions exist to automatically fill out web forms on a user's behalf.
sep332
Well, nothing's foolproof, but then again you just give a slightly lower QoS to that session. We'd only do this after a few pages of inhumanly fast behaviour.
krosenvold
A: 

Hey, thanks for all the responses. I think that a combination of a few suggestions will work well: mainly the hidden form element that times how fast the form was filled out, and possibly the "poison link" idea. I think that will cover most bases. When you're talking about bots, you're not going to find them all, so there's no point thinking that you will... Silly bots.
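A minimal sketch of the timed hidden form field; the field name and the server-side threshold are assumptions, and a bot that never runs JavaScript simply leaves the field empty, which is itself a signal:

<!-- hidden field inside the form; the name is made up for this sketch -->
<input type="hidden" id="form-started-at" name="form_started_at" value="">

<script type="text/javascript">
// Record when the form was rendered; on submit, the server flags submissions
// where (submit time - form_started_at) is under some minimum, e.g. 3 seconds.
document.getElementById('form-started-at').value = new Date().getTime();
</script>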

Well, "silly bots" except for Google - without which many sites wouldn't get any traffic at all :)
CraigD
A: 

This seems to be a really complicated problem.

Geshan
A: 

There is an API available at www.atlbl.com that can identify web crawlers (both good and bad) as well as other forms of automated web bots.

+1  A: 

Here's an idea:

Most bots don't download CSS, JavaScript, or images; they just parse the HTML.

If you keep track in a user's session of whether or not they download any of the above, e.g. by routing those download requests through a script that logs the attempts, then you can quickly identify users that only download the raw HTML (very few normal users will do this).
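A rough sketch of that routing idea, assuming Express with sessions and static assets served from an /assets path (both assumptions):

// Sketch: any session that fetches a static asset gets marked as having loaded resources.
const express = require('express');
const session = require('express-session');
const path = require('path');

const app = express();
app.use(session({ secret: 'change-me', resave: false, saveUninitialized: true }));

app.use('/assets', function (req, res, next) {
  req.session.loadedAssets = true;   // real browsers hit this; HTML-only scrapers usually don't
  next();
}, express.static(path.join(__dirname, 'public')));

// Elsewhere, sessions with several page views and loadedAssets still unset
// can be treated as probable bots.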

Finbarr
A: 

Track mouse events? Bots don't have a mouse.
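A tiny client-side sketch of that idea; the /not-a-bot URL is borrowed from the earlier answers and is an assumption:

<script type="text/javascript">
// The first real mouse movement pings the server once; most bots never generate one.
var pinged = false;
document.onmousemove = function () {
  if (pinged) return;
  pinged = true;
  new Image().src = '/not-a-bot?via=mouse';  // assumed endpoint that flags the session
};
</script>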

r4ge