I am building stats for my users and don't want visits from bots to be counted.

Right now I have a basic PHP script that increments a MySQL counter by 1 each time the page is requested.

But bot visits are also added to the count.

Can anyone think of a way to exclude them?

It is mainly the major ones that skew the numbers: Google, Yahoo, MSN, etc.

A: 

Have you tried identifying them by their user-agent strings? A simple Google search should give you the user-agents used by Google and the other major crawlers.

This, of course, is not foolproof, but most crawlers by major companies supply a distinct user-agent.

EDIT: This assumes you do not want to restrict the bots' access, but just want to leave their visits out of your statistics.

Simon Jensen
+6  A: 

You should filter by user-agent strings. You can find a list of about 300 common user-agents given by bots at http://www.robotstxt.org/db.html. Checking the visitor's user-agent against that list and skipping the SQL statement for bots should solve your problem for all practical purposes.
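As a rough illustration of that check, here is a minimal sketch. The function name is_listed_bot and the three-entry $botSignatures array are hypothetical; a real list would be built from a published bot list such as the one linked above, and the echo calls stand in for your existing counter query.

```php
<?php
// Hypothetical, deliberately short signature list (lowercase substrings).
// A real list would be built from a published bot database.
$botSignatures = ['googlebot', 'slurp', 'msnbot'];

/**
 * Returns true when the user-agent contains any of the given signatures.
 */
function is_listed_bot($userAgent, array $signatures)
{
    foreach ($signatures as $signature) {
        // stripos() is case-insensitive; compare with !== false because
        // a match at offset 0 is still a match.
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// Sample user-agent strings, used here only for demonstration.
$examples = [
    'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)',
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
];

foreach ($examples as $ua) {
    if (is_listed_bot($ua, $botSignatures)) {
        echo "skip:  $ua\n";
    } else {
        // This is where the existing counter UPDATE would run.
        echo "count: $ua\n";
    }
}
```

In the real page you would pass the visitor's own user-agent string instead of the sample array.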

If you don't want the search engines to even reach the page, use a basic robots.txt file to block them.

amdfan
+1  A: 

Check the user agent before incrementing the page view count, but remember that it can be spoofed. PHP exposes the user agent in $_SERVER['HTTP_USER_AGENT'], provided the client sent a User-Agent header; if it did not, the key may be missing. More information about $_SERVER can be found at http://www.php.net/manual/en/reserved.variables.server.php.

You can find a list of user agents at http://www.user-agents.org; Googling will also provide the names of those belonging to the major providers. A third possible source would be your web server's access logs, if you can aggregate them.
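For the access-log route, a small script can tally which user agents actually hit your site. This is only a sketch: it assumes the logs use the Apache "combined" format, where the user agent is the last double-quoted field on each line. The function name count_user_agents and the sample lines are made up for illustration.

```php
<?php
/**
 * Tallies user-agent strings from log lines in Apache "combined" format.
 * Assumption: the user agent is the last double-quoted field on the line
 * and contains no escaped quotes.
 */
function count_user_agents(array $lines)
{
    $counts = [];
    foreach ($lines as $line) {
        // Capture the final "..." field on the line.
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $ua = $m[1];
            $counts[$ua] = isset($counts[$ua]) ? $counts[$ua] + 1 : 1;
        }
    }
    arsort($counts); // most frequent user agents first
    return $counts;
}

// Made-up sample lines; in practice you might read them with file().
$sample = [
    '192.0.2.1 - - [10/Oct/2008:13:55:36 +0000] "GET / HTTP/1.1" 200 2326 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.2 - - [10/Oct/2008:13:56:01 +0000] "GET /a HTTP/1.1" 200 512 "-" "msnbot/1.1 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.3 - - [10/Oct/2008:13:57:12 +0000] "GET / HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 10.0; rv:109.0) Gecko/20100101 Firefox/115.0"',
];

print_r(count_user_agents($sample));
```

Agents that show up very often without ever loading images or stylesheets are good candidates for your ignore list.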

Rob
A: 

I want them to go to the page...

I want them to go every hour if possible :) they are indexing stuff...

I will look into the user-agents.

Hugo Gameiro
You should edit your original question instead of posting another answer.
amdfan
A: 

You can check the User-Agent string: an empty string, or one containing 'robot', 'spider', 'crawler' or 'curl', is likely to come from a robot.

$isBot = (bool) preg_match('/robot|spider|crawler|curl|^$/i', isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
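Here is one way the check above could be wrapped up before touching the counter; treat it as a sketch rather than a complete solution. The function name looks_like_bot is hypothetical. The pattern is also my own variation: it uses 'bot' instead of 'robot' and adds 'slurp', because Googlebot and msnbot identify themselves with "bot" and Yahoo's crawler with "Slurp", none of which contain "robot". The broader 'bot' may occasionally flag a real browser whose user agent happens to contain those letters.

```php
<?php
/**
 * Guesses whether a request came from a bot, based on its user agent.
 * Takes the $_SERVER array as a parameter so it can be tested easily.
 * A missing User-Agent header is treated like an empty string.
 */
function looks_like_bot(array $server)
{
    $ua = isset($server['HTTP_USER_AGENT']) ? $server['HTTP_USER_AGENT'] : '';

    // 'bot' and 'slurp' are assumptions added for this sketch; ^$ matches
    // an empty user agent. preg_match() returns 1 on a match.
    return preg_match('/bot|spider|crawler|curl|slurp|^$/i', $ua) === 1;
}

if (!looks_like_bot($_SERVER)) {
    // This is where the existing counter UPDATE would run.
    echo "counted\n";
} else {
    echo "ignored\n";
}
```

Since the user agent can be spoofed, this only filters out well-behaved crawlers, which matches the goal of keeping the major search engines out of the stats.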

Rob
A: 

That preg_match looks good... thanks.

Hugo Gameiro