I'd like to internationalize my site such that it's accessible in many languages. The language setting will be detected in the request data automatically, and can be overridden in the user's settings / stored in the session.
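
A minimal sketch of the detection order described above, in Python for illustration only. A language stored in the user's settings or session wins, then the browser's Accept-Language header, then a default. The names SUPPORTED, DEFAULT, parse_accept_language and resolve_language are hypothetical, and the header parser is simplified: it keeps only the primary subtag, so fr-CA counts as fr.

```python
from typing import List, Optional

# Hypothetical languages the site has translations for.
SUPPORTED = ("en", "fr", "de")
DEFAULT = "en"


def parse_accept_language(header: str) -> List[str]:
    """Return primary language subtags from an Accept-Language header, best first."""
    weighted = []
    for index, part in enumerate(header.split(",")):
        tag, _, params = part.strip().partition(";")
        if not tag.strip():
            continue
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Keep only the primary subtag, so "fr-CA" is treated as "fr".
        primary = tag.strip().split("-")[0].lower()
        weighted.append((-quality, index, primary))
    weighted.sort()  # highest q first; ties keep header order
    return [lang for neg_q, _, lang in weighted if neg_q < 0]  # drop q=0 entries


def resolve_language(session_lang: Optional[str], accept_language: Optional[str]) -> str:
    # 1. An explicit choice from the user's settings / session wins.
    if session_lang in SUPPORTED:
        return session_lang
    # 2. Otherwise take the first supported language the browser asks for.
    for lang in parse_accept_language(accept_language or ""):
        if lang in SUPPORTED:
            return lang
    # 3. Fall back to the site default.
    return DEFAULT
```

For example, resolve_language(None, "fr-FR,fr;q=0.9,en;q=0.8") returns "fr", while resolve_language("de", "fr") returns "de" because the stored setting takes precedence.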

My question is how the different language versions of the same page should be reflected in the pages' URLs. Let's say we're just looking at the index page of http://www.example.com/, which defaults to English. If a French speaker loads the index page, should I simply keep the URL as http://www.example.com/, or should I redirect to http://www.example.com/fr/?

I'm trying to figure out what benefits or consequences this has in terms of SEO. I don't want the French version of the site showing up in google.com if it prevents the English version of the same pages from showing up there, but I would like it to show up in google.fr.

+1  A: 

Since the pages will have different content, I would provide a different URL for the different languages.

The major search engines are smart enough to figure out the language of a page given its contents. Having a language code in the URL (like fr) should also provide the search engines with a hint.

Ben S
Plus, you should set the `lang` attribute on the `<html>` tag. But you knew that :). http://tlt.its.psu.edu/suggestions/international/web/tips/langtag.html
JSBangs
+7  A: 

Hey Matt,

There are a lot of things to consider from a search standpoint when you start localizing your website into multiple languages. Generally, you want to be careful not to be too clever about the user's intentions. Things like auto-detecting the language and storing it in a cookie can be good in some scenarios, but if they become a requirement for your localizations to work correctly, then you can run into issues with search engines (and with real people too).

For search engines, you'll want to make sure that they can find and access all of your content in all the different languages without POST requests (no drop-down forms), JavaScript, Flash or cookies, because search engines generally don't use these technologies.

It turns out that this is often good for real customers as well. If you rely on browser settings or IP detection, then some of your real customers who are borrowing a friend's computer or traveling in a foreign country might get stuck in the wrong language (Microsoft Bing actually had this problem for a while).
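
To make the crawlability point concrete, here is a minimal sketch of a language switcher rendered as ordinary GET links, assuming a sub-folder layout (/fr/...). The LANGUAGES table and the language_links function are hypothetical; the only idea being illustrated is that each language version is reachable through a plain <a href> rather than a form post, a script, or a cookie.

```python
from html import escape

# Hypothetical language names shown in the switcher, keyed by URL prefix.
LANGUAGES = {"en": "English", "fr": "Français", "de": "Deutsch"}


def language_links(path: str, current: str) -> str:
    """Render one plain <a href> per other language version of `path`.

    Crawlers follow ordinary GET links, so every version is discoverable
    without forms, JavaScript, Flash, or cookies.
    """
    items = []
    for code, name in LANGUAGES.items():
        if code == current:
            # The page being viewed is shown but not linked to itself.
            items.append(f"<strong>{escape(name)}</strong>")
        else:
            href = f"/{code}{path}"
            items.append(f'<a href="{escape(href)}" lang="{code}">{escape(name)}</a>')
    return " | ".join(items)
```

Calling language_links("/products/", "en") yields links to /fr/products/ and /de/products/ with English shown as the current language.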

Here are some best practices to keep in mind:

  • Each language should be contained under its own root in your information architecture. The best option is to acquire the country-code TLD (mysite.fr) for each specific region. When that isn't feasible, the second option is a sub-domain (fr.mysite.com), and the third is a sub-folder (mysite.com/fr). That makes it easiest for us to look at a set of pages in aggregate and best determine a language/region. Don't make it a parameter (mysite.com/products/iphone?lang=en&region=us); that is the most difficult case for us to detect.

  • We have language classifiers (artificial intelligence nets) that try to determine what language/region a page is targeting, so make sure your page gives enough clues about its language. E.g. if the page is in French, make sure the meta description, the <h1> tags and the title are also in French, and that the page has a solid couple of sentences of French text. Many sites mix languages and have very little actual French on the page.

  • Telephone numbers, mailing addresses and the name of the geographic location are also great clues for search engines in identifying the region/language of a page. Use these well (and make sure they are actual text on the page, not images).

  • Use Google Webmaster Tools to specify the language and region of your pages. Create an account, verify your site, and then you can specify which region and language different parts of your website are targeted for.
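
The three layouts from the first bullet, plus the discouraged parameter form, can be sketched as a small URL builder. The host names (mysite.com, mysite.fr, mysite.de) and the function name localized_url are placeholders for illustration, not anything a search engine requires.

```python
from urllib.parse import urlencode

# Hypothetical per-region domains; the default language keeps the generic .com.
CCTLD_HOSTS = {"en": "mysite.com", "fr": "mysite.fr", "de": "mysite.de"}
BASE_HOST = "mysite.com"


def localized_url(lang: str, path: str, strategy: str) -> str:
    """Build the URL of `path` in `lang` under one of the layouts above."""
    if strategy == "cctld":  # best: one country-code domain per region
        return f"http://{CCTLD_HOSTS[lang]}{path}"
    if strategy == "subdomain":  # second choice: fr.mysite.com
        return f"http://{lang}.{BASE_HOST}{path}"
    if strategy == "subfolder":  # third choice: mysite.com/fr/...
        return f"http://{BASE_HOST}/{lang}{path}"
    if strategy == "parameter":  # discouraged: hardest for crawlers to group
        return f"http://{BASE_HOST}{path}?{urlencode({'lang': lang})}"
    raise ValueError(f"unknown strategy: {strategy!r}")
```

With path "/products/iphone" and lang "fr", the four strategies give mysite.fr/products/iphone, fr.mysite.com/products/iphone, mysite.com/fr/products/iphone and mysite.com/products/iphone?lang=fr respectively; only the first three put each language under its own root.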

Misinformation: the lang attribute, or any other language tags you may have heard about, are currently not used by any search engine. When we (Microsoft Bing) analyzed these last year, the most common 'standard' lang tag people were using showed up on only 0.000125% of pages on the web, not enough to be useful!

Vanessa Fox (she built Google's Webmaster Central and created the sitemap protocol) recently wrote a particularly good article about how Google thinks about localization and what that means for site architecture. I recommend checking it out here: http://www.ninebyblue.com/blog/making-geotargeted-content-findable-for-the-right-searchers/

Good luck, Nate

Nathan Buggia
Very elaborate response, I can't tell you how much I appreciate all this! I've been using Google Webmaster Tools, but I never noticed the language settings in there, as I've never needed them. I'll search for them now. I'm also using the CakePHP framework, which has allowed me to prepare all my strings for i18n; they just need the .po files created, along with the code required to handle the language-code sub-domain/sub-folder/whatever I decide to use.
Matt Huggins
A: 

Two answers: how I do it, and how I gather SEOisers think you're supposed to do it.

My site is mostly in English with a couple of German pages, and I plan to add more German pages and possibly some Spanish ones. The root page is language-agnostic, the navigation links to German pages (where they exist) sit beneath their English equivalents, and the URLs differ between English and German (e.g., /services.html vs. /leistungen.html).
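
A minimal sketch of the bookkeeping this setup implies: a hypothetical table of equivalent pages, used to find the German version of an English page (or vice versa) when rendering the navigation. EQUIVALENTS, /about.html and translations_of are illustrative names only.

```python
from typing import Dict

# Hypothetical table: each row lists the same page in every language it exists in.
EQUIVALENTS = [
    {"en": "/services.html", "de": "/leistungen.html"},
    {"en": "/about.html"},  # no German version yet
]


def translations_of(path: str) -> Dict[str, str]:
    """Return the other-language versions of `path` (empty if none exist)."""
    for row in EQUIVALENTS:
        if path in row.values():
            return {lang: p for lang, p in row.items() if p != path}
    return {}
```

Because the slugs differ per language, nothing in the URL itself groups the versions; the mapping has to be maintained by hand, which is part of why this layout is harder for search engines to untangle.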

This is good UI and supposedly lousy SEO, since the different languages are all tangled up without a way for search engines to disentangle them, which may have bad consequences when they calculate search-quality metrics.

The SEO-right thing is to maintain a distinct hierarchy, possibly of the form www.site.tld/lang/, but better lang.site.tld/, each with a separate sitemap.xml file.
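
A per-hierarchy sitemap can be generated with the standard library. This sketch assumes the lang.site.tld layout and a hard-coded list of paths, both hypothetical; the namespace URI is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: List[str]) -> str:
    """Return a sitemap.xml document listing `paths` under one language root."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# One sitemap per language hierarchy, each served from its own root,
# e.g. http://fr.site.tld/sitemap.xml (hypothetical hosts and paths).
sitemaps = {
    lang: build_sitemap(f"http://{lang}.site.tld", ["/", "/services.html"])
    for lang in ("en", "fr")
}
```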

I care more about visitors than search engines, so I will continue to do the Wrong Thing.

Charles Stewart
+1  A: 

This is how I solved the problem on my personal website as an exercise in i18n:

  • When a user arrives at, e.g., brazzy.de/index.php, the site tries to determine the language from a cookie (if present) or the browser settings (Accept-Language header), defaults to English, and does not redirect.
  • Every page has links to the different language versions of that page (IMO the most important factor for user convenience, and also makes sure search engines can easily find the different versions).
  • These links lead to e.g. brazzy.de/en/index.php, which is in my case rewritten to brazzy.de/index.php?lang=en - this ensures that search engines see distinct URLs for the different language versions.
  • Visiting such a subdirectory sets the language cookie to that language.
  • The pages without a language-specific URL (i.e. where the language depends on client data) use e.g. <link rel="canonical" href="/en/"> to tell the search engine at which language-specific URL that page can be found.
  • Use XML sitemaps to further make sure search engines can find all pages and all different language versions.
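
The URL handling in these steps can be sketched as follows, in Python rather than the PHP and URL-rewriting setup the answer describes. The regular expression, the SUPPORTED tuple, the cookie name lang, and the function names are assumptions for illustration: route splits a language prefix off the path, language_cookie builds the Set-Cookie value for a visited subdirectory, and canonical_link produces the tag for pages whose language depends on client data.

```python
import re
from http.cookies import SimpleCookie
from typing import Optional, Tuple

# Hypothetical set of languages served under /<lang>/ prefixes.
SUPPORTED = ("en", "de")
# Matches "/en/index.php" -> lang="en", rest="/index.php".
LANG_PREFIX = re.compile(r"^/(?P<lang>[a-z]{2})(?P<rest>/.*)$")


def route(path: str) -> Tuple[Optional[str], str]:
    """Split a language prefix off the path; unprefixed paths return (None, path)."""
    match = LANG_PREFIX.match(path)
    if match and match.group("lang") in SUPPORTED:
        return match.group("lang"), match.group("rest")
    return None, path


def language_cookie(lang: str) -> str:
    """Build a Set-Cookie value that remembers the language of the visited subdirectory."""
    cookie = SimpleCookie()
    cookie["lang"] = lang
    cookie["lang"]["path"] = "/"
    return cookie["lang"].OutputString()


def canonical_link(lang: str, rest: str) -> str:
    """Point a language-neutral page at its language-specific URL."""
    return f'<link rel="canonical" href="/{lang}{rest}">'
```

For example, route("/en/index.php") gives ("en", "/index.php"), and canonical_link("en", "/") gives <link rel="canonical" href="/en/">, the same tag as in the example above.
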
Michael Borgwardt