




I am building stats for my users and don't want visits from bots to be counted.

Right now I have a basic PHP script with MySQL that increments a counter by 1 each time the page is called.

But visits from bots are also added to the count.

Can anyone think of a way to exclude them?

It is mainly just the major ones that mess things up: Google, Yahoo, MSN, etc.


Have you tried identifying them by their user-agent information? A simple Google search should give you the user-agents used by Google and the other major crawlers.

This, of course, is not foolproof, but most crawlers by major companies supply a distinct user-agent.

EDIT: This assumes you do not want to restrict the bots' access, but just want to leave their visits out of your statistics.

Simon Jensen
+6  A: 

You should filter by user-agent strings. You can find a list of about 300 common user-agents given by bots here: http://www.robotstxt.org/db.html. Running through that list and ignoring bot user-agents before you run your SQL statement should solve your problem for all practical purposes.
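
A minimal sketch of that idea, assuming a short hand-picked list: the `$botSignatures` array and the `isListedBot()` helper are hypothetical names, not taken from the robotstxt.org database, and the four signatures cover only the crawlers named in the question (plus `bingbot`, the newer name of Microsoft's crawler).

```php
<?php
// Hypothetical, deliberately short list; extend it from a published bot list.
$botSignatures = ['Googlebot', 'Slurp', 'msnbot', 'bingbot'];

// Returns true when the user agent contains any listed signature (case-insensitive).
function isListedBot(string $userAgent, array $signatures): bool
{
    foreach ($signatures as $signature) {
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }
    return false;
}

// A missing header is treated as an empty string here.
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (!isListedBot($userAgent, $botSignatures)) {
    // Run your existing counter UPDATE here.
}
```

Substring matching keeps the list easy to maintain, at the cost of missing any bot whose signature is not listed.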

If you don't want the search engines to even reach the page, use a basic robots.txt file to block them.

+1  A: 

Check the user agent before incrementing the page view count, but remember that it can be spoofed. PHP exposes the user agent in $_SERVER['HTTP_USER_AGENT'], provided the client sent the header and the web server passes it on. More information about $_SERVER can be found at http://www.php.net/manual/en/reserved.variables.server.php.
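
One way to arrange that check, as a rough sketch: `getUserAgent()` and `countVisit()` are hypothetical helpers, and the bot test and the counter update are passed in as callables so that your existing MySQL code can stay as it is.

```php
<?php
// Hypothetical helper: returns the client's user agent, or '' when the header is absent.
function getUserAgent(array $server): string
{
    return isset($server['HTTP_USER_AGENT']) ? trim((string) $server['HTTP_USER_AGENT']) : '';
}

// Hypothetical wrapper: runs $increment only when $isBot reports a non-bot visitor.
// Returns true when the visit was counted, false when it was skipped.
function countVisit(array $server, callable $isBot, callable $increment): bool
{
    if ($isBot(getUserAgent($server))) {
        return false; // skipped, nothing is written
    }
    $increment(); // e.g. your existing "hits = hits + 1" query
    return true;
}
```

In a page script you would call `countVisit($_SERVER, $yourBotTest, $yourIncrement)`, where both callables are your own code.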

You can find a list of user agents at http://www.user-agents.org; Googling will also provide the names of those belonging to the major providers. A third possible source would be your web server's access logs, if you can aggregate them.
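
For the access-log route, a small tally script can show which agents actually hit your site. This sketch assumes the logs are in the combined log format, where the user agent is the last double-quoted field on each line; `tallyUserAgents()` is a hypothetical name and the log path in the comment is only an example. User agents that themselves contain escaped quotes are not handled.

```php
<?php
// Tallies user agents from log lines in combined log format,
// taking the last double-quoted field on each line as the user agent.
function tallyUserAgents(iterable $lines): array
{
    $counts = [];
    foreach ($lines as $line) {
        if (preg_match('/"([^"]*)"\s*$/', $line, $m)) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }
    arsort($counts); // most frequent agents first
    return $counts;
}

// Example call with a hypothetical path; adjust it to your server's log location:
// $counts = tallyUserAgents(new SplFileObject('/var/log/apache2/access.log'));
```

Agents that show up with very high counts and no matching human traffic pattern are good candidates for your bot list.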


I want them to go to the page...

I want them to go every hour if possible :) they are indexing stuff...

I will look into the user-agents.

Hugo Gameiro
You should edit your original question instead of posting another answer.

You can check the User Agent string: empty strings, or strings containing 'robot', 'spider', 'crawler' or 'curl', are likely to be robots.

$isBot = (bool) preg_match('/robot|spider|crawler|curl|^$/i', $_SERVER['HTTP_USER_AGENT'] ?? '');
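
Note that the pattern above does not match "Googlebot", "msnbot" or Yahoo's "Slurp", because none of those strings contains the word "robot". A slightly broader variant is sketched below, under the assumption that matching the plain substring "bot" is acceptable for your traffic; `looksLikeBot()` is a hypothetical helper name, and the broader pattern can flag a legitimate browser whose user agent happens to contain "bot".

```php
<?php
// Returns true for empty or missing user agents and for ones matching common bot markers.
// 'bot' covers 'robot', 'Googlebot', 'msnbot' and 'bingbot'; 'slurp' covers Yahoo's crawler;
// 'crawl' covers 'crawler'.
function looksLikeBot(?string $userAgent): bool
{
    return preg_match('/bot|spider|crawl|slurp|curl|^$/i', trim((string) $userAgent)) === 1;
}
```

You could then wrap your counter update in `if (!looksLikeBot($_SERVER['HTTP_USER_AGENT'] ?? null)) { ... }`.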


that preg_match looks good... thanks

Hugo Gameiro