Hello,

I'm making a hit counter. I have a database in which I store the IP address and $_SERVER['HTTP_USER_AGENT'] of each visitor. Now I need to add a filter so I can exclude the hits made by bots. I found out that many bots include some common words in $_SERVER['HTTP_USER_AGENT'], so I'd like to make an array of words that would keep a bot from showing up in the results.

Here is what I have now:

while ($row = mysql_fetch_array($yesterday, MYSQL_ASSOC)) {

    // Here I need code that runs through an array of keywords and checks whether the user agent contains any of them; if it doesn't, just $count++;

}

Also, if you know any other way of detecting and removing bots from the results, I'd be very thankful. Cheers

A: 

Certain systems, such as CubeCart and, before it, osCommerce, try to maintain a semi-current list of known bot strings. They use it to provide a boolean function that tells a user from a bot in real time by comparing the user agent string against the entries in a file called spiders.txt. Once a bot is detected, they disable the shopping basket, login functionality, etc.

Here are the latest spiders.txt contents:

abacho abcdatos abcsearch acoon adsarobot aesop ah-ha alkalinebot almaden altavista antibot anzwerscrawl aol search appie arachnoidea araneo architext ariadne arianna ask jeeves aspseek asterias astraspider atomz augurfind backrub baiduspider bannana_bot bbot bdcindexer blindekuh boitho boito borg-bot bsdseek christcrawler computer_and_automation_research_institute_crawler coolbot cosmos crawler crawler@fast crawlerboy cruiser cusco cyveillance deepindex denmex dittospyder docomo dogpile dtsearch elfinbot entire web esismartspider exalead excite ezresult fast fast-webcrawler fdse felix fido findwhat finnish firefly firstgov fluffy freecrawl frooglebot galaxy gaisbot geckobot gencrawler geobot gigabot girafa goclick goliat googlebot griffon gromit grub-client gulliver gulper henrythemiragorobot hometown hotbot htdig hubater ia_archiver ibm_planetwide iitrovatore-setaccio incywincy incrawler indy infonavirobot infoseek ingrid inspectorwww intelliseek internetseer ip3000.com-crawler iron33 jcrawler jeeves jubii kanoodle kapito kit_fireball kit-fireball ko_yappo_robot kototoi lachesis larbin legs linkwalker lnspiderguy look.com lycos mantraagent markwatch maxbot mercator merzscope meshexplorer metacrawler mirago mnogosearch moget motor muscatferret nameprotect nationaldirectory naverrobot nazilla ncsa beta netnose netresearchserver ng/1.0 northerlights npbot nttdirectory_robot nutchorg nzexplorer odp openbot openfind osis-project overture perlcrawler phpdig pjspide polybot pompos poppi portalb psbot quepasacreep rabot raven rhcs robi robocrawl robozilla roverbot scooter scrubby search.ch search.com.ua searchfeed searchspider searchuk seventwentyfour sidewinder sightquestbot skymob sleek slider_search slurp solbot speedfind speedy spida spider_monkey spiderku stackrambler steeler suchbot suchknecht.at-robot suntek szukacz surferf3 surfnomore surveybot suzuran synobot tarantula teomaagent teradex t-h-u-n-d-e-r-s-t-o-n-e tigersuche topiclink toutatis tracerlock 
turnitinbot tutorgig uaportal uasearch.kiev.ua uksearcher ultraseek unitek vagabondo verygoodsearch vivisimo voilabot voyager vscooter w3index w3c_validator wapspider wdg_validator webcrawler webmasterresourcesdirectory webmoose websearchbench webspinne whatuseek whizbanglab winona wire wotbox wscbot www.webwombat.com.au xenu link sleuth xyro yahoobot yahoo! slurp yandex yellopet-spider zao/0 zealbot zippy zyborg

As long as you don't do cloaking like this, you're OK.
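A minimal sketch of the kind of boolean check described above. It is not the actual CubeCart or osCommerce code: the function names load_spiders() and is_bot() are made up for illustration, the file is assumed to hold one signature per line, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper: read one bot signature per line from a spiders.txt-style file.
function load_spiders(string $path): array
{
    $lines = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return []; // file missing or unreadable: no signatures
    }
    // Trim and lower-case every entry, then drop entries that ended up empty.
    $clean = array_map(function ($line) {
        return strtolower(trim($line));
    }, $lines);
    return array_values(array_filter($clean, function ($entry) {
        return $entry !== '';
    }));
}

// Hypothetical check: true if the user agent contains any known signature.
function is_bot(string $userAgent, array $spiders): bool
{
    $ua = strtolower($userAgent);
    foreach ($spiders as $spider) {
        if (strpos($ua, $spider) !== false) {
            return true;
        }
    }
    return false;
}
```

It would be used as is_bot($_SERVER['HTTP_USER_AGENT'], load_spiders('spiders.txt')). Because this is plain substring matching, short entries such as fast or wire may also match ordinary browsers, so the list is probably worth reviewing before relying on it.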

Dimitar Christoff
A: 

Loop through the array of words with foreach and check if the current word exists in the UA string using strpos():

foreach ($words as $word) {
    if (strpos($row['user_agent'], $word) !== FALSE) {
        // word exists in string
    }
}
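
Put together with the counting from the question, it could look roughly like the sketch below. The keyword list, the sample rows, and the 'user_agent' column name are assumptions for illustration; the $rows array stands in for the database result so the example runs on its own. Since strpos() is case-sensitive, the sketch uses stripos() so that "Googlebot" and "googlebot" both match.

```php
<?php
// Hypothetical keyword list; a longer one could be loaded from a file instead.
$words = ['bot', 'crawl', 'spider', 'slurp'];

// Stand-in for the rows fetched from the database (column name is assumed).
$rows = [
    ['user_agent' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0'],
    ['user_agent' => 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'],
    ['user_agent' => 'Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)'],
];

$count = 0;
foreach ($rows as $row) {
    $isBot = false;
    foreach ($words as $word) {
        // stripos() is the case-insensitive variant of strpos()
        if (stripos($row['user_agent'], $word) !== false) {
            $isBot = true;
            break; // one matching keyword is enough
        }
    }
    if (!$isBot) {
        $count++; // only hits without any bot keyword are counted
    }
}

echo $count, "\n"; // prints 1 for the sample rows above
```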
yjerem