Hello, I am trying to monitor genuine page hits. Here is what my site does. I have an article directory where people can post articles. When an article is posted, the author is paid based on the number of unique users who visit their pages, so page hits are important. Here is the problem I am facing.
What I need:
- I don't want to track page hits by minor search engines or robots.
- I would like the four major search engines to crawl my site, because I can identify them by IP address and exclude their visits from the hit count. This cannot be done for spam bots, because they do a good job of passing as a real human or a major search engine.
Problems:
- There are spam bots on the internet that do not honor the robots.txt file.
- There are bots that try to fake being a real human user by manipulating the user agent and other header fields.
- Performance may suffer if the database is checked for known-good IP addresses on every request.
- A human could solve the captcha once just to let their robot view my pages.
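On the spoofed user-agent problem: the usual defense is to not trust the header at all and instead verify a claimed crawler by DNS. Reverse-resolve the visitor's IP, check the hostname falls under the engine's domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal Python sketch of that check (the domain suffix list and the `reverse`/`forward` hooks are illustrative assumptions; the hooks default to real `socket` lookups but can be swapped out for testing):

```python
import socket

def verify_crawler(ip,
                   allowed_suffixes=(".googlebot.com", ".google.com",
                                     ".search.msn.com", ".crawl.yahoo.net"),
                   reverse=None, forward=None):
    """Return True only if `ip` reverse-resolves to a hostname under one of
    the search-engine domains AND that hostname forward-resolves back to the
    same IP (so a bot cannot simply fake its hostname)."""
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or socket.gethostbyname
    try:
        host = reverse(ip)
    except OSError:
        return False                      # no reverse DNS record
    if not any(host.endswith(suf) for suf in allowed_suffixes):
        return False                      # hostname not under an engine domain
    try:
        return forward(host) == ip        # forward lookup must match
    except OSError:
        return False
```

A bot can put "Googlebot" in its user agent, but it cannot make Google's DNS zone resolve to its own IP, which is why this check holds up where header inspection fails.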
Possible solutions:
- Require a captcha on every page. If it passes, log the IP address as good, or set a cookie on the user's machine indicating they passed.
- Whitelist the major search engines' IP addresses so they are never presented with a captcha.
- Purchase bot-detection software.
- Require the viewer to pass a captcha every 7 days.
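The first and last ideas above also address the database-performance worry: keep recently verified visitors in a small in-memory cache with an expiry, and only fall back to the database (or a fresh captcha) on a miss. A sketch, assuming a single server process; the class and method names are hypothetical, and the clock is injectable purely so the expiry can be tested:

```python
import time

class CaptchaPassCache:
    """In-memory record of visitors who recently solved a captcha, so the
    database is not queried on every page view. Entries expire after `ttl`
    seconds (7 days by default, matching the re-challenge interval)."""

    def __init__(self, ttl=7 * 24 * 3600, clock=time.time):
        self.ttl = ttl
        self.clock = clock      # injectable for testing
        self._passed = {}       # key (IP or cookie token) -> expiry timestamp

    def record_pass(self, key):
        self._passed[key] = self.clock() + self.ttl

    def is_verified(self, key):
        expiry = self._passed.get(key)
        if expiry is None:
            return False        # never passed: show the captcha
        if self.clock() >= expiry:
            del self._passed[key]
            return False        # pass expired: re-challenge
        return True
```

Keying on a cookie token rather than the raw IP avoids punishing users behind shared NAT addresses; if the site runs on multiple servers, the same idea would need a shared store such as memcached or Redis instead of a per-process dict.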
Getting accurate human page views is critical for this site to work properly. Do you guys have any other ideas?