I am working on a website that loads its data via AJAX. I also want the whole site to be crawlable by search engines such as Google and Yahoo. I want to make two versions of the site: [1] when a user visits, the hyperlinks should work as in Gmail (#-style hyperlinks); [2] when a crawler visits, the hyperlinks should work normally (AJAX mode off).

How can I identify a crawler?

A: 

The HTTP headers of the crawler's request should contain a User-Agent field. You can check this field on your server.

Here is a list of TONS of User-Agents. Some examples:

Google robot 66.249.64.XXX ->
Googlebot/2.1 ( http://www.googlebot.com/bot.html)       

Harvest-NG web crawler used by search.yahoo.com 
Harvest-NG/1.0.2     
Paul Rubel
Thanks, I will look into this further.
Abhishek Dilliwal
What about search engines that appear in the future?
Abhishek Dilliwal
That's the tricky part, isn't it?
Paul Rubel
A: 

Crawlers can usually be identified with the User-Agent HTTP Header. Look at this page for a list of user agents for crawlers specifically. Some examples are:

Google:

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  • Googlebot/2.1 (+http://www.google.com/bot.html)

Also, here are some examples for getting the user agent string in various languages:

PHP:
$_SERVER['HTTP_USER_AGENT']

Python Django:
request.META["HTTP_USER_AGENT"]

Ruby on Rails:
request.env["HTTP_USER_AGENT"]

...
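
Once the header value is available, a check might look like the following minimal Python sketch. The function name `is_crawler` and the token list are assumptions made for illustration; the list contains only the two crawlers named in this thread and is nowhere near complete. Note that the User-Agent header is supplied by the client, so it can be spoofed, and a crawler that is not in the list will not be detected.

```python
import re

# Illustrative token list only (an assumption): it covers just the two
# crawlers mentioned in this thread. A real list would be much longer.
CRAWLER_PATTERN = re.compile(r"googlebot|harvest-ng", re.IGNORECASE)


def is_crawler(user_agent):
    """Return True if the User-Agent string contains a known crawler token."""
    if not user_agent:
        # Missing or empty header: treat the client as a regular visitor.
        return False
    return CRAWLER_PATTERN.search(user_agent) is not None
```

In Django, for example, this could be called as `is_crawler(request.META.get("HTTP_USER_AGENT", ""))`.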
Mike Axiak
Thanks for the answers, I will look into it further... but as Brian said, it may hurt the search engine ranking :(
Abhishek Dilliwal
A: 

You should not present a different form of your website to your users and a crawler. If Google discovers you doing that, they may reduce your search ranking because of it. Also, if you have a version that's only for a crawler, it may break without you noticing, thus giving search engines bad data.

What I'd recommend is building a version of your site that doesn't require AJAX, and having prominent links on each page to the non-AJAX version. This will also help users who may not like the AJAX version, or who have browsers that aren't capable of handling it properly.

Brian Campbell
What if I give registered users an option to use the AJAX version?
Abhishek Dilliwal
A: 

This approach just makes life difficult for you. It requires you to maintain two completely separate versions of the site and try to guess what version to serve to any given user. Search engines are not the only user agents that don't have JavaScript available and enabled.

Follow the principles of unobtrusive JavaScript and build on things that work. This avoids the need to determine which version to give to a user since the JS can gracefully fail while leaving a working HTML version.
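
On the server side, one way to support this is to let every URL return a complete HTML page, and to return only the content fragment when the request comes from the page's own script. The sketch below is an assumption-laden illustration, not a definitive implementation: `render_article` is a hypothetical helper, and the `X-Requested-With: XMLHttpRequest` header is a convention set by many JavaScript libraries rather than something every client sends.

```python
import html


def render_article(title, body, headers):
    """Return a full HTML page, or only the content fragment for AJAX requests.

    `headers` is a plain dict of request headers (hypothetical interface).
    """
    fragment = "<h1>{}</h1><p>{}</p>".format(html.escape(title), html.escape(body))
    # Assumption: the site's own JavaScript sets this conventional header.
    if headers.get("X-Requested-With") == "XMLHttpRequest":
        return fragment
    # Crawlers and browsers without JavaScript get the complete page.
    return '<html><body><div id="content">{}</div></body></html>'.format(fragment)
```

Because both responses are built from the same fragment, the crawler and the AJAX user see identical content, and there is only one version to maintain.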

David Dorward
I understand this. My plan is to build it the simple HTML way first. Then, as an enhancement, once the page has loaded in the user's browser and the user has JS capability, the URL will be changed to the AJAX form, for example from (abc.com?var=xyz) to (abc.com#var=xyz). So I would make the traditional version as well as the AJAX-based one. Now that I have realized the cons, I will rethink it. Thanks.
Abhishek Dilliwal
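
The URL mapping described in the last comment can be sketched as follows. In practice the rewrite would run in the browser, since the part after `#` is never sent to the server; the mapping is shown in Python only to match the other examples. The function names `to_hash_url` and `to_plain_url` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def to_hash_url(url):
    """Move the query string into the fragment: ?var=xyz becomes #var=xyz."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", parts.query))


def to_plain_url(url):
    """Move the fragment back into the query string: #var=xyz becomes ?var=xyz."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, parts.fragment, ""))
```

Keeping the two forms mechanically convertible means every AJAX-style link has a plain counterpart that a crawler or a browser without JavaScript can follow.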