Let's say I'm designing a website that has to be available in English, French, Spanish, German, and Korean (I'm not, but let's pretend I am).

I cannot rely on services such as Google Translate, because the website is for business rather than entertainment. Let's say I have access to professional translators who can translate text in context into another language and give me the result.

What are some known and simple ways to serve content in multiple languages on a website?

There are lots of options, such as separate pages, using a database, and so forth, but I can't decide which is best, how each approach would scale, what needs to be considered, and how to deal with missing translations.

Are there any well-established practices for this?

+4  A: 

The broad topic you're asking about is called "Internationalization and Localization" (or I18N and L10N for short). The important thing to remember is that it's not just about translating messages. There are many other things that go into internationalizing a website.

The more obvious things you will need are:

  • A character encoding that works for characters in all languages, not just English (this means everything, down to the database, should use a Unicode encoding such as UTF-8)
  • Some way of representing the user's locale (e.g. Java's Locale class)
  • A common pattern for producing a message in that user's locale (e.g. Spring's MessageSource)

Other things you need to consider:

  • Properly sorting strings based on Locale
  • Formatting dates based on locale
  • Always showing times in the user's time zone
  • Showing distance measurements in the units of the user's locale (e.g. miles versus kilometers)
  • Laying out the website right-to-left for languages like Hebrew
  • Think about how you pluralize your messages. String message = "Please fix the following error" + (errors.size() > 1 ? "s" : ""); just doesn't work in an internationalized program, because plural rules differ from language to language.
  • Think about how to lay out your web pages when the length of text may vary wildly, and never assume that a character is more-or-less a certain width (a single kanji character might be 8 times wider than a lowercase 'i')
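
To make the pluralization point concrete, here is a minimal sketch using only the JDK's java.text.MessageFormat with a choice pattern. The class name PluralMessages, the message texts, and the in-memory pattern map are assumptions made for illustration; in a real application the patterns would come from your translators and your message source.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

public class PluralMessages {
    // Hypothetical per-language patterns. Each translator supplies whole
    // sentences, so no code ever glues an "s" onto a word.
    private static final Map<String, String> PATTERNS = Map.of(
        "en", "{0,choice,0#No errors to fix|1#Please fix the following error"
            + "|1<Please fix the following {0,number,integer} errors}",
        "de", "{0,choice,0#Keine Fehler zu beheben|1#Bitte beheben Sie den folgenden Fehler"
            + "|1<Bitte beheben Sie die folgenden {0,number,integer} Fehler}"
    );

    // Picks the pattern for the locale's language, falling back to English.
    static String errorMessage(Locale locale, int count) {
        String pattern = PATTERNS.getOrDefault(locale.getLanguage(), PATTERNS.get("en"));
        return new MessageFormat(pattern, locale).format(new Object[] {count});
    }

    public static void main(String[] args) {
        System.out.println(errorMessage(Locale.ENGLISH, 1));
        System.out.println(errorMessage(Locale.ENGLISH, 3));
        System.out.println(errorMessage(Locale.GERMAN, 3));
    }
}
```

A choice pattern only maps numeric ranges to texts, so it is not enough for languages with more involved plural rules. For those, ICU's plural support (PluralRules and the plural argument type in ICU's MessageFormat) is probably the better fit.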

The best resource I can find for this is the ICU library's User guide. If you use Java, this is the library to use.

Hopefully this answer is a helpful start!

Michael D
A: 

We have a set of files on disk that contain all the strings in a given widget/module/whatever, with a separate file per language, e.g.:

foo.strings == generic (happens to be US English)
foo.fr.strings == French
foo.fr-CA.strings == Canadian French
foo.en-CA.strings == Canadian English

Based on the client's Accept-Language header, we determine which language the client wants.
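
As a rough illustration of that negotiation step (not our actual code), the JDK can parse the header and pick the best supported language by itself. Locale.LanguageRange.parse and Locale.lookup are standard Java 8+ APIs; the class name LanguageNegotiator, the SUPPORTED list, and the fallback choice are assumptions for this sketch.

```java
import java.util.List;
import java.util.Locale;

public class LanguageNegotiator {
    // Hypothetical list of languages the site has translations for.
    static final List<Locale> SUPPORTED = List.of(
        Locale.forLanguageTag("en"),
        Locale.forLanguageTag("fr"),
        Locale.forLanguageTag("es"),
        Locale.forLanguageTag("de"),
        Locale.forLanguageTag("ko")
    );

    // Returns the best supported locale for an Accept-Language header value,
    // or the fallback when the header is missing, malformed, or unmatched.
    static Locale choose(String acceptLanguage, List<Locale> supported, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        try {
            // parse() orders the ranges by their q-values, highest first.
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            // lookup() tries each range, shortening "fr-CA" to "fr" as needed.
            Locale match = Locale.lookup(ranges, supported);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException e) {
            return fallback; // ill-formed header
        }
    }

    public static void main(String[] args) {
        Locale chosen = choose("fr-CA,fr;q=0.9,en;q=0.5", SUPPORTED, Locale.ENGLISH);
        System.out.println(chosen.toLanguageTag());
    }
}
```

Many sites also let the user override the header with an explicit language setting, since browser defaults do not always reflect what the reader wants.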

When a given language is first requested, we hit the file system to build up the big string mapping for that language, then cache it in memory. If a given string isn't defined in fr-CA, we'll hop up the stack to fr, then eventually to the generic file.

Pages are generated dynamically, and the generated version of each URL is cached depending on the user's language headers (among other things).
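
The fallback walk could look roughly like the sketch below. It is a simplified, in-memory stand-in rather than our real implementation: the class name StringTable, the use of the empty tag "" for the generic strings, and returning the key itself when nothing matches are all assumptions made for the example, and loading from the .strings files is left out.

```java
import java.util.HashMap;
import java.util.Map;

public class StringTable {
    // Locale tag -> (string key -> text). The empty tag "" holds the generic strings.
    private final Map<String, Map<String, String>> tables = new HashMap<>();

    public void put(String localeTag, String key, String text) {
        tables.computeIfAbsent(localeTag, t -> new HashMap<>()).put(key, text);
    }

    // Walks fr-CA -> fr -> generic until some table defines the key.
    public String lookup(String localeTag, String key) {
        String tag = localeTag;
        while (true) {
            Map<String, String> table = tables.get(tag);
            if (table != null && table.containsKey(key)) {
                return table.get(key);
            }
            if (tag.isEmpty()) {
                // Assumption: return the key so a missing translation is visible.
                return key;
            }
            int dash = tag.lastIndexOf('-');
            tag = (dash < 0) ? "" : tag.substring(0, dash);
        }
    }

    public static void main(String[] args) {
        StringTable strings = new StringTable();
        strings.put("", "title", "Home");
        strings.put("fr", "bye", "Au revoir");
        strings.put("fr-CA", "greeting", "Allo");
        System.out.println(strings.lookup("fr-CA", "greeting")); // defined in fr-CA
        System.out.println(strings.lookup("fr-CA", "bye"));      // found in fr
        System.out.println(strings.lookup("fr-CA", "title"));    // found in generic
    }
}
```

If you are on Java anyway, java.util.ResourceBundle performs a very similar locale fallback for you, so a hand-rolled table like this is mostly useful when the files are not in a format it understands.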

Hope that helps

Mike Ruhlin