I have a Django site that uses the locale middleware (LocaleMiddleware) together with gettext and the trans/blocktrans template tags to show visitors different pages depending on the preferred language set in their browser, which the browser sends in the Accept-Language header (this seems to be the standard way of doing things in Django).

This works well for the supported languages (currently only Spanish, English, and German, with more coming). If I set the preferred language in my browser to one of them, I get the pages for that translation. However, I have no idea how the site appears to search engines.

When a search engine crawls a site, does its crawler typically send a preferred language (an Accept-Language header)? Will German spiders get the German site and Spanish ones the Spanish site, or will they all get the default English site that is shown when a user has no language set? Does this vary between search engines, and is there a "standard way" of doing things that individual crawlers may or may not stick to?

A:

Bots typically do not send an Accept-Language header, which means that Django will serve your default language. Regional search engines may have bots with Accept-Language set to whatever they prefer, but you cannot rely on that. It is best to have a separate URL for each language, such as http://your.website.com/english/, and then set up your middleware to redirect to the right language page when a specific Accept-Language header is present.
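A minimal sketch of the redirect decision described above, in plain Python so it runs without Django. It is not Django's own code (LocaleMiddleware does its own Accept-Language parsing). The names SUPPORTED_LANGUAGES, DEFAULT_LANGUAGE, parse_accept_language, preferred_language and redirect_target, and the /en/, /es/, /de/ URL layout, are hypothetical choices for illustration. Following the answer, it only redirects when a header is present, so a typical crawler without Accept-Language gets the default page.

```python
# Hypothetical configuration: language codes used as URL prefixes.
SUPPORTED_LANGUAGES = ("en", "es", "de")
DEFAULT_LANGUAGE = "en"


def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first.

    Tags with q=0 ("not acceptable") are dropped; ties keep header order.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        entries.append((-quality, position, tag.strip().lower()))
    entries.sort()
    return [tag for neg_q, _, tag in entries if neg_q < 0]


def preferred_language(header):
    """Pick the first supported language, falling back to the default."""
    for tag in parse_accept_language(header or ""):
        primary = tag.split("-")[0]  # "de-AT" -> "de"
        if primary in SUPPORTED_LANGUAGES:
            return primary
    return DEFAULT_LANGUAGE


def redirect_target(path, accept_language):
    """Return a language-prefixed URL to redirect to, or None to serve as-is."""
    first_segment = path.lstrip("/").split("/", 1)[0]
    if first_segment in SUPPORTED_LANGUAGES:
        return None  # already on a language-specific URL: never redirect
    if not accept_language:
        return None  # typical crawler: no header, serve the default page
    return "/" + preferred_language(accept_language) + path


print(redirect_target("/about/", "de-AT,de;q=0.9,en;q=0.5"))  # /de/about/
print(redirect_target("/about/", None))  # None
```

In a real Django project this check would sit inside a middleware class that returns an HttpResponseRedirect when redirect_target gives a URL. Because language-prefixed URLs are never redirected, crawlers can still reach every version.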

guruslan
A:

Don't rely on what the search engine may do in this regard. You want all versions to be crawled. To achieve that:

  • Have different URLs for the different language versions.
  • Make sure the search engines can find the different versions.

Overall, I believe the way I did it on my homepage is close to ideal with regard to both search engines and regular users:

  • When a user arrives at, e.g., brazzy.de/index.php, the site tries to determine the language from a cookie (if present) or the browser settings (Accept-Language header), defaults to English, and does not redirect.
  • Every page has links to the different language versions of that page (IMO the most important factor for user convenience, and it also makes sure search engines can easily find the different versions).
  • These links lead to e.g. brazzy.de/en/index.php, which in my case is rewritten to brazzy.de/index.php?lang=en. This ensures that search engines see distinct URLs for the different language versions.
  • Visiting such a subdirectory sets the language cookie to that language.
  • The pages without a language-specific URL (i.e. where the language depends on client data) use e.g. <link rel="canonical" href="/en/"> to tell the search engine at which language-specific URL that page can be found.
  • Use XML sitemaps to further make sure search engines can find all pages and all different language versions.
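A rough Python sketch of the scheme in the list above; it is not the author's actual code (his site uses URL rewriting in front of PHP). The names LANGUAGES, resolve_language, canonical_link and build_sitemap, the example.com base URL and the English/German language list are hypothetical. For brevity, the header fallback takes the first supported language listed and ignores q-values.

```python
from xml.etree import ElementTree as ET

# Hypothetical language list; English is the default, as in the answer.
LANGUAGES = ("en", "de")
DEFAULT = "en"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def resolve_language(url_lang=None, cookie_lang=None, accept_language=""):
    """Return (language, set_cookie): URL prefix > cookie > header > default."""
    if url_lang in LANGUAGES:
        # Language-specific URL such as /en/index.php: remember the choice.
        return url_lang, True
    if cookie_lang in LANGUAGES:
        return cookie_lang, False
    # Simplification: first supported tag in header order, q-values ignored.
    for part in accept_language.split(","):
        tag = part.split(";")[0].strip().lower().split("-")[0]
        if tag in LANGUAGES:
            return tag, False
    return DEFAULT, False  # no redirect in any case


def canonical_link(language, page):
    """Canonical tag for a page served without a language prefix."""
    return '<link rel="canonical" href="/%s/%s">' % (language, page)


def build_sitemap(base_url, pages):
    """List every language version of every page in a sitemaps.org urlset."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        for lang in LANGUAGES:
            url = ET.SubElement(urlset, "url")
            ET.SubElement(url, "loc").text = "%s/%s/%s" % (base_url, lang, page)
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


lang, _ = resolve_language(accept_language="fr, de-CH;q=0.8")
print(lang)                              # de
print(canonical_link("en", ""))          # <link rel="canonical" href="/en/">
print(build_sitemap("https://example.com", ["index.php"]))
```

The point of the design is that client data (cookie, header) only chooses what the unprefixed URL shows, while the canonical link and the sitemap always point crawlers at the stable, language-prefixed URLs.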
Michael Borgwardt