I'm in the process of localizing a website. My plan was to set a cookie holding the preferred language and then display that language. If no cookie was set, the site would fall back to the Accept-Language header sent by the user's browser, and if that header was missing too, it would default to English.
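A minimal sketch of that fallback chain, written here with Flask purely for illustration; the cookie name "lang" and the two-language set are assumptions, not something given in the question:

    from flask import Flask, request

    app = Flask(__name__)
    SUPPORTED = ["en", "fr"]  # assumed language set, for illustration only

    def pick_language():
        # 1. Cookie holding the visitor's preferred language, if set
        lang = request.cookies.get("lang")
        if lang in SUPPORTED:
            return lang
        # 2. Otherwise the Accept-Language header sent by the browser
        best = request.accept_languages.best_match(SUPPORTED)
        if best:
            return best
        # 3. Otherwise default to English
        return "en"

    @app.route("/page.html")
    def page():
        return f"(content rendered in {pick_language()})"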

So, how does Google's bot work? Will it crawl each site several times with a different language set in that header so it can fetch every version of the site, or does it not send such headers at all? If not, do I have to restructure all of this to use a URL-based language selector (www.domain.com/en/page.html, www.domain.com/fr/page.html)?
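For comparison, the URL-based scheme needs no cookie or header at all, since each language lives at its own crawlable URL. A hedged sketch under the same assumptions as above:

    from flask import Flask, abort

    app = Flask(__name__)
    SUPPORTED = {"en", "fr"}  # assumed set of localized versions

    @app.route("/<lang>/page.html")
    def localized_page(lang):
        # e.g. /en/page.html and /fr/page.html, as in the question
        if lang not in SUPPORTED:
            abort(404)
        return f"(content rendered in {lang})"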

+1  A: 

As far as I know, Google does not consume cookies. Until recently it didn't execute JavaScript either; they've started to do that now, although I cannot say how well it works (probably not well). About the only things it does consume are text and hyperlinks, plus Flash (from which it also only gets text and links).

My feeling is that the following are used:

1) TLD/subdomain (a regex to infer the language from the subdomain)

2) The "Content-Language" HTTP header (see the sketch after this list)

3) Language detection on the text itself (they have a translator, so they must be able to do this)

4) Inbound links from other TLDs

5) Webmaster Tools - it's possible to set a location in there
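To illustrate signal 2, a minimal sketch of setting the Content-Language response header (Flask again; the route and body are made up for the example):

    from flask import Flask, make_response

    app = Flask(__name__)

    @app.route("/fr/page.html")
    def page_fr():
        resp = make_response("contenu en français")
        # Declare the document's language to clients and crawlers
        resp.headers["Content-Language"] = "fr"
        return resp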

Probably the best way is to use an amalgamation of all these signals with some kind of scoring system to decide which language a document (page or domain) is in, but it didn't really work well until Google got people using Webmaster Tools.

One thing to bear in mind is that most of the traffic on the net goes to a handful of websites, so if you can cover those off manually it might make life easier.

Cheers, Ke

Ke
+1  A: 

Short answer: no. Search engines do not like cookies. Feed them HTML, and make sure every language version is reachable through plain HTML hyperlinks.
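One way to read that advice, sketched under the same assumptions as the snippets above: serve plain anchors to every language version, so a crawler that ignores cookies and scripts can still reach each one.

    from flask import Flask

    app = Flask(__name__)
    SUPPORTED = ["en", "fr"]  # assumed language set

    @app.route("/")
    def index():
        # Plain hyperlinks need no cookies or JavaScript to be followed
        links = " ".join(
            f'<a href="/{lang}/page.html">{lang}</a>' for lang in SUPPORTED
        )
        return f"<html><body>{links}</body></html>"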

Eduardo Molteni