Internationalisation - character set to support all languages?

+2 A:

UTF-8 should be your first choice.

arno 2009-10-07 18:43:38

UTF-8 is an *encoding*, not a character set. Unicode is the character set.
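To make the distinction concrete, here is a minimal Python sketch. The sample character is my own choice, not something from the thread. Unicode assigns the character a single code point, and each encoding turns that code point into a different byte sequence.

```python
# One character, one Unicode code point, several possible byte encodings.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE (sample chosen for illustration)

code_point = ord(ch)                   # the number Unicode assigns: 0xE9
utf8_bytes = ch.encode("utf-8")        # two bytes in UTF-8
utf16_bytes = ch.encode("utf-16-le")   # two bytes in UTF-16 (little-endian, no BOM)
utf32_bytes = ch.encode("utf-32-le")   # four bytes in UTF-32 (little-endian, no BOM)

print(f"U+{code_point:04X}", utf8_bytes, utf16_bytes, utf32_bytes)
```

The code point stays the same throughout; only the bytes on disk or on the wire differ.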

Greg Hewgill 2009-10-07 18:49:54

Sure enough, but then again, “character set” is often used mistakenly instead of “encoding”. See what it's called in HTTP: the charset parameter of the Content-Type header!

Arthur Reutenauer 2009-10-07 19:13:32

Yes, you're right. I didn't know there was a difference, because the terms are usually used synonymously, but it seems that's because most people don't know the difference. ;-) Sorry.

arno 2009-10-08 06:19:47

+8 A:

Unicode. It has several encodings: UTF-8, UTF-16 and UTF-32.

From http://en.wikipedia.org/wiki/UTF-8

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet is backwards compatible with ASCII.
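As a rough illustration of "variable-length" and "backwards compatible with ASCII", the Python sketch below encodes a few sample characters (picked here for illustration, not taken from the quote) and compares an ASCII-only string under both encodings.

```python
# UTF-8 uses between one and four bytes per character.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, an emoji
lengths = [len(s.encode("utf-8")) for s in samples]
print(lengths)

# Text that is pure ASCII produces identical bytes under ASCII and UTF-8.
ascii_text = "Hello, world"
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")
print(same_bytes)
```

This is why existing ASCII files are already valid UTF-8, while text in other scripts takes more bytes per character.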

S.Lott 2009-10-07 18:44:06

Can UTF-8 encode all of Unicode characters?

Roman Plášil 2009-10-07 19:03:20

Yes. It's optimized for ASCII compatibility, so it is most compact for English and Western European text, but it can represent any valid Unicode character.
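One way to check this yourself is the Python sketch below. It builds a string containing every Unicode scalar value (all code points except the surrogate range, which is reserved for UTF-16 and is not a character) and round-trips it through each encoding. The helper name roundtrips is made up for this example.

```python
# Every Unicode scalar value: U+0000..U+10FFFF minus the surrogates U+D800..U+DFFF.
all_chars = "".join(
    chr(cp) for cp in range(0x110000) if not 0xD800 <= cp <= 0xDFFF
)

def roundtrips(encoding: str) -> bool:
    """Return True if encoding then decoding reproduces the text exactly."""
    return all_chars.encode(encoding).decode(encoding) == all_chars

for name in ("utf-8", "utf-16", "utf-32"):
    print(name, roundtrips(name))
```

All three encodings cover the same set of characters; they differ in size and byte layout, not in what they can represent.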

DaveE 2009-10-07 19:11:55

+1 A:

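As others have said, UTF-8. Go read Joel Spolsky's blog post about Unicode, "The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)", and you'll understand why.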

Esko 2009-10-07 18:48:51

+1 for the link which is helpful

Roman Plášil 2009-10-07 19:03:56
