Hi,

We have a big community website built in Drupal, and the site has a mandatory age check before you can access its content.

It checks for a cookie; if the cookie is not present, you get redirected to the age check page.

We believe crawlers get stuck at this point: they get redirected to the age check and never get to crawl the full website.

Has anyone had this before? What would be the best way to deal with something like this?

Sander

EDIT

I am sorry to mention this only now: another issue with crawlers is that when someone in the community posts something to their Facebook wall, Facebook crawls the page to fetch the images and description (which are specified in meta tags), but Facebook also gets redirected to the age check page. Would a user-agent check work if I add the Facebook crawler? If so, would anyone know the Facebook crawler's exact name?

The solution below is one that we also came across on the net. If adding the Facebook crawler to that list works, it would solve all the problems we are having with this age check page.

+1  A: 

You could check the user-agent, and if it belongs to a crawler, skip the check for the required cookie.

Here is a sample:

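function crawlerDetect($USER_AGENT)
{
    // Pipe-separated crawler names, used below as regex alternatives
    $crawlers_agents = 'Google|msnbot|Rambler|Yahoo|AbachoBOT|accoona|AcioRobot|ASPSeek|CocoCrawler|Dumbot|FAST-WebCrawler|GeonaBot|Gigabot|Lycos|MSRBOT|Scooter|AltaVista|IDBot|eStyle|Scrubby';

    // Case-insensitive search of the user agent for any listed name
    if (preg_match('/' . $crawlers_agents . '/i', $USER_AGENT, $matches))
        return $matches[0];

    return false;
}

// example

// The header can be missing, so fall back to an empty string
$crawler = crawlerDetect(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');

if ($crawler)
{
   // it is a crawler; its name is in the $crawler variable
}
else
{
   // usual visitor
}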
Espo
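To make the idea of skipping the cookie check for crawlers concrete, here is a minimal sketch of the decision the age gate would make. The cookie name `age_verified`, the `/agecheck` path, and the function names are assumptions for illustration, not part of the site in question. The token `facebookexternalhit` is the string Facebook's link crawler has commonly sent in its user agent, but it is worth confirming against your own request logs before relying on it.

```php
<?php
// Hypothetical cookie name; use whatever the real age check sets.
const AGE_COOKIE = 'age_verified';

/**
 * Returns true when the user agent contains one of the crawler tokens.
 * The list is a short illustrative subset, not an exhaustive one.
 */
function isKnownCrawler(string $userAgent): bool
{
    $tokens = ['Google', 'msnbot', 'Yahoo', 'facebookexternalhit'];
    foreach ($tokens as $token) {
        // stripos() is case-insensitive and returns false when not found
        if (stripos($userAgent, $token) !== false) {
            return true;
        }
    }
    return false;
}

/**
 * Decides whether a request should be redirected to the age check.
 * Visitors with the cookie and known crawlers are let through.
 */
function needsAgeCheck(array $cookies, string $userAgent): bool
{
    if (isset($cookies[AGE_COOKIE])) {
        return false;
    }
    return !isKnownCrawler($userAgent);
}

// Usage early in the page bootstrap (path is an assumption):
// if (needsAgeCheck($_COOKIE, $_SERVER['HTTP_USER_AGENT'] ?? '')) {
//     header('Location: /agecheck');
//     exit;
// }
```

Keep in mind that a user agent is trivially spoofed, so this lets through anyone who pretends to be a crawler; whether that is acceptable depends on how strict the age check has to be.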
Would adding the Facebook crawler help with the Facebook issue? (See my edit.)
Sander
Yes, this will also fix your problem with the Facebook link. To find out which user-agent Facebook uses, you can log all requests that fail the cookie test to a database or file, and then try to post a link on Facebook. You will then find the user-agent string in your log.
Espo
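The logging suggestion above could look roughly like this. It is only a sketch: the function name and the log file location are made up for illustration, and a database table would work just as well.

```php
<?php
/**
 * Appends the user agent of a request that failed the cookie test
 * to a log file, one tab-separated line per request.
 */
function logBlockedUserAgent(string $userAgent, string $logFile): void
{
    // Strip control characters so a crafted header cannot inject extra log lines.
    $clean = preg_replace('/[\x00-\x1F\x7F]/', '', $userAgent);

    // ISO 8601 timestamp, then the cleaned user agent
    $line = date('c') . "\t" . $clean . PHP_EOL;

    // FILE_APPEND keeps earlier entries; LOCK_EX avoids interleaved writes
    file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX);
}

// Usage, just before redirecting to the age check (path is an assumption):
// logBlockedUserAgent($_SERVER['HTTP_USER_AGENT'] ?? '', '/tmp/agecheck-ua.log');
```

After posting a link on Facebook, the newest lines in the file should show whatever user agent its crawler sent.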
A: 

Gary Keith has a PHP class that you can use to check all the attributes of a visitor (e.g., whether it is a browser or a crawler), and the class also automatically updates an exhaustive ini file of browsers and crawlers on a regular basis. There's also a Drupal module, although I haven't tried it.

Heather Gaye
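As a related alternative to the class mentioned above (and not the same thing), PHP's built-in `get_browser()` can read a browscap-style ini file when the `browscap` directive is set in php.ini. The sketch below assumes the ini file in use exposes a `crawler` entry, which reduced versions of the file may not; the helper name `crawlerFlag` is hypothetical.

```php
<?php
/**
 * Interprets the "crawler" entry of a get_browser()-style array.
 * Returns null when the entry is missing (e.g. a reduced ini file),
 * so the caller can fall back to another check.
 */
function crawlerFlag(array $browserInfo): ?bool
{
    if (!array_key_exists('crawler', $browserInfo)) {
        return null;
    }
    // Accepts "1", "true", true as crawler; "", "false", false as not
    return filter_var($browserInfo['crawler'], FILTER_VALIDATE_BOOLEAN);
}

// Usage, only meaningful when php.ini sets the "browscap" directive:
// $info = @get_browser($_SERVER['HTTP_USER_AGENT'] ?? '', true);
// $isCrawler = is_array($info) ? crawlerFlag($info) : null;
```

A `null` result means the lookup could not answer, in which case a plain user-agent list is a reasonable fallback.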
Thanks, definitely going to look into that.
Sander