Should I use ISO 639-1 (two-letter codes) or ISO 639-2 (three-letter codes) to store a user's language code? Both are official standards, but which is the de facto standard in the development community? I think ISO 639-1 would be easier to remember, and is probably more popular for that reason, but that's just a guess.

I'm building separate sites for the US, Brazil, Russia, China, and the UK.

http://en.wikipedia.org/wiki/ISO_639

+1  A: 

I'm no expert, but every site I've ever seen uses ISO 639-1, including the current site I'm working on.

It works for us!

Django Reinhardt
+1 I've never seen 639-2 used in any application. Indeed with the presence of collection codes like "cpe" you could wind up encoding documents that are - in fact - readable by no one. And how many documents in Cree do you really expect?
msw
+2  A: 

I would go with a derivative of ISO 639. Specifically I like to use this: http://en.wikipedia.org/wiki/IETF_language_tag

Ben
A: 

I've only ever seen 2-character language codes in use - so I'd recommend going with them unless your work involves delving into linguistics in some way. If all you're doing is customizing the browsing experience for the world at large, you won't need the extra repertoire offered by 3-character codes.

Jonathan Leffler
+1  A: 

ISO 639-1 alpha-2 codes are used pretty much universally.

They are used for example in HTTP content negotiation. If you ever wondered how an international website can automatically show you their homepage in your native language, that's how it works. (Although it's sometimes kinda annoying. I, for example, often get shown the default Apache homepage in German, because the webmaster turned on content negotiation, but only put content for English in.)
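To make the content negotiation step concrete, here is a minimal sketch of how a server might read the `Accept-Language` request header and pick one of the languages it actually has content for. The function names `parse_accept_language` and `pick_language` and the `supported` list are made up for illustration, and the matching is deliberately simplified (exact tag first, then the bare two-letter primary code); real frameworks usually ship their own, more complete, implementation.

```python
from __future__ import annotations


def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Split an Accept-Language value into (tag, q) pairs, best first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        entries.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def pick_language(header: str, supported: list[str], default: str = "en") -> str:
    """Return the first supported language the client accepts (hypothetical helper)."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in supported:
            return tag
        primary = tag.split("-")[0]  # "de-de" -> "de"
        if primary in supported:
            return primary
    return default
```

With a browser sending `de-DE,de;q=0.9,en;q=0.8` and a site that only offers `["en", "pt", "ru", "zh"]`, this falls through to `en`. A site that wrongly listed `de` as supported without providing German content would instead end up in the situation described above.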

Most web browsers use them directly in their settings dialog box.

Most operating systems use them in their settings dialog boxes or configuration files.

Wikipedia uses them in their server names for the different language versions.

In other words: if your users aren't native English speakers, they will probably already have encountered them when configuring their software, because otherwise they wouldn't be able to use their computers.

The other members of the ISO 639 family are mostly of interest to linguists. Unless you expect Jesus Christ himself (ISO 639-2 alpha-3 code arc) to visit your website, or maybe Klingons (tlh), ISO 639-1 has more languages than you can ever hope to support.

Jörg W Mittag
+5  A: 

In short, you should use IETF language tags, because they are already used by HTTP, HTML, XML, and many other technologies. They are based on several standards, including the ISO 639 family (yes, language, region, and culture selection is not so simple to define).
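As a rough illustration of how such a tag is assembled from those standards, the sketch below splits a tag into its language subtag (ISO 639), optional script subtag (ISO 15924, four letters) and optional region subtag (ISO 3166-1 two letters, or a three-digit UN M.49 area code). `LanguageTag` and `split_tag` are names invented for this example, and the parser is intentionally incomplete: it ignores variants, extensions, and private-use subtags, and does not validate subtags against any registry.

```python
from typing import NamedTuple, Optional


class LanguageTag(NamedTuple):
    language: str
    script: Optional[str]
    region: Optional[str]


def split_tag(tag: str) -> LanguageTag:
    """Split a language tag into language, script and region (simplified)."""
    subtags = tag.replace("_", "-").split("-")
    language = subtags[0].lower()
    script = None
    region = None
    for sub in subtags[1:]:
        # Script: four letters, and it must come before any region.
        if script is None and region is None and len(sub) == 4 and sub.isalpha():
            script = sub.title()
        # Region: two letters or three digits.
        elif region is None and (
            (len(sub) == 2 and sub.isalpha()) or (len(sub) == 3 and sub.isdigit())
        ):
            region = sub.upper()
    return LanguageTag(language, script, region)
```

For example, `split_tag("zh-hant-tw")` yields language `zh`, script `Hant`, region `TW`, while a bare `en` yields only the language.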

I wrote a more detailed article on choosing and using language codes properly. The idea is to use the simplest, shortest ISO 639-1 code and to add more detail only for special cases. The article lists codes for the ~30 most used languages and explains why I consider one alternative better than another.

In case you want to skip reading the entire article here is a short list of language codes (not to be confused with country codes): ar, cs, da, de, el, en, en-gb, es, fr, fi, he, hu, it, ja, ko, nb, nl, pl, pt, pt-pt, ro, ru, sv, tr, uk, zh, zh-hant

As you may observe, some of these choices are not so obvious:

  • en is used for American English (en-us), and en-gb for British English
  • pt is used for Brazilian Portuguese (pt-br) rather than pt-pt, which has far fewer speakers
  • zh is used instead of zh-hans, zh-CN, ...
  • zh-hant (Traditional Chinese) is used instead of more specific codes like zh-hant-TW or zh-TW

You can find more explanations inside the article.
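One way to apply the list above in code is to normalize whatever tag arrives (from a browser, a user profile, a URL) down to one of those short codes before storing it. The sketch below is only an illustration: `SUPPORTED`, `ALIASES`, and `normalize` are hypothetical names, the alias table covers just the cases mentioned in the list, and the fallback default of `en` is an assumption.

```python
# The short list of codes from the answer above.
SUPPORTED = {
    "ar", "cs", "da", "de", "el", "en", "en-gb", "es", "fr", "fi", "he",
    "hu", "it", "ja", "ko", "nb", "nl", "pl", "pt", "pt-pt", "ro", "ru",
    "sv", "tr", "uk", "zh", "zh-hant",
}

# More specific tags that collapse onto a shorter stored code.
ALIASES = {
    "en-us": "en",
    "pt-br": "pt",
    "zh-hans": "zh",
    "zh-cn": "zh",
    "zh-tw": "zh-hant",
    "zh-hant-tw": "zh-hant",
}


def normalize(tag: str, default: str = "en") -> str:
    """Map an incoming tag to one of the stored short codes (simplified)."""
    parts = tag.strip().replace("_", "-").lower().split("-")
    # Try the full tag first, then drop trailing subtags one at a time.
    while parts:
        candidate = "-".join(parts)
        if candidate in ALIASES:
            return ALIASES[candidate]
        if candidate in SUPPORTED:
            return candidate
        parts.pop()
    return default
```

So `normalize("pt_BR")` gives `pt`, `normalize("en-GB")` stays `en-gb`, and `normalize("zh-Hant-HK")` truncates to `zh-hant`. Tags without a script subtag that are not in the alias table (for instance other Chinese regions) simply fall back to the bare language code, which may or may not be what you want.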

Sorin Sbarnea
John Himmelman