Using PHP, I want to differentiate between an actual person and a bot. I currently track page views, and they are massively inflated by bots crawling my pages, so I want to record only real people. It doesn't matter if it's not 100% accurate; I just want a nice, simple way to do it in PHP.

To be clear, this is not for analytics per se; it is so that I can track which images are being served each day and produce a "top images of the day" sort of script.

+3  A: 

You should be checking the user agent string. Most well-behaved search bots will report themselves as such.

Google's spider for example.
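
A minimal sketch of that check, assuming PHP 7.1 or later. The function name `looks_like_bot` and the list of substrings are illustrative assumptions, not a complete or authoritative list of crawler signatures, and a bot that fakes a browser user agent will still get through.

```php
<?php
// Hypothetical helper: returns true when the user agent looks like a crawler.
// The signature list is an illustrative assumption, not an exhaustive one.
function looks_like_bot(?string $userAgent): bool
{
    // Many scripts send no user agent at all; treat those requests as bots.
    if ($userAgent === null || trim($userAgent) === '') {
        return true;
    }

    $signatures = ['bot', 'crawl', 'spider', 'slurp', 'curl', 'wget'];
    foreach ($signatures as $signature) {
        // stripos() is case-insensitive, so "Googlebot" matches "bot".
        if (stripos($userAgent, $signature) !== false) {
            return true;
        }
    }

    return false;
}
```

In a page or image handler you would call it as `looks_like_bot($_SERVER['HTTP_USER_AGENT'] ?? null)` and only record the view when it returns false.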

meagar
Is there a way to check that someone is human (i.e. web browser) as opposed to discounting bots?
Chris
As an aside, if you're looking for the "most popular image" and your site is being indexed, you can probably assume the spider accessed all images an equal number of times, so it shouldn't influence the popularity.
meagar
@Chris Not really, no. Short of forcing all your users to answer a captcha for every single page, there is no way to verify they're human - and even that wouldn't be 100% effective.
meagar
A: 

I'm not sure that PHP is the best solution for this kind of problem.
You can read "How to block bad bots" and "How to block spambots, ban spybots, and tell unwanted robots to go to hell" for more ways of blocking bots, this time with Apache.

Apache will act faster and require less CPU for this sort of task than a PHP program.

Colin Hebert
Blocking is very different from differentiating, especially if you actually want your site listed on Google.
meagar
A: 

This is one of the reasons I use Google Analytics to keep track of my page views.

joebert
A: 

First, the obvious: check the user agent.

I use another trick that works pretty well. I map robots.txt to a PHP file and log the requesting IP in the database. Then, when logging user activity, I make sure the visitor isn't from one of those logged IPs. If the user authenticates via the login system, then I track them regardless.

Of course neither solution guarantees any accuracy, but for general logging, it has been sufficient for my purposes.
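
The robots.txt trick could be sketched roughly like this. Everything named here is an assumption for illustration: the `bot_ips` table, the three function names, and the use of PDO are not taken from my actual setup, and the web server still has to be configured separately to route requests for robots.txt to the PHP script.

```php
<?php
// Hypothetical schema: one row per IP address that has requested robots.txt.
// VARCHAR(45) is long enough for a textual IPv6 address.
function create_bot_table(PDO $db): void
{
    $db->exec('CREATE TABLE IF NOT EXISTS bot_ips (ip VARCHAR(45) NOT NULL)');
}

// Call this before recording a page or image view.
function is_known_bot_ip(PDO $db, string $ip): bool
{
    $stmt = $db->prepare('SELECT COUNT(*) FROM bot_ips WHERE ip = ?');
    $stmt->execute([$ip]);

    return (int) $stmt->fetchColumn() > 0;
}

// Call this from the script that serves robots.txt.
function remember_bot_ip(PDO $db, string $ip): void
{
    // Skip the insert when the address is already logged.
    if (is_known_bot_ip($db, $ip)) {
        return;
    }

    $stmt = $db->prepare('INSERT INTO bot_ips (ip) VALUES (?)');
    $stmt->execute([$ip]);
}
```

The script behind robots.txt would call `remember_bot_ip($db, $_SERVER['REMOTE_ADDR'])` and then output the real robots.txt contents as `text/plain`. The view logger would skip any request where `is_known_bot_ip()` returns true, unless the visitor is logged in.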

konforce