Hi,

I think the question is pretty simple: do I need everything in Unicode beyond the Basic Multilingual Plane? What kind of characters does it include, and are they really needed (and for what purposes)?

Thanks.

+2  A: 

See the complete list of character charts.

The supplementary characters currently contain ancient scripts. Unless your application has to handle ancient scripts such as Kharoshthi, Old Persian, and Cuneiform, you probably don't need them.

I guess you will only have to deal with this issue if you encounter a UTF-8 or UTF-16 implementation that is not complete. Some implementations of UTF-8 do not support 4-byte sequences, which are what encode the supplementary planes: the characters at U+10000 and above. MySQL comes to mind.
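To make the 4-byte point concrete, here is a minimal Python sketch. The helper names `has_supplementary` and `utf8_len` are made up for illustration and are not part of any library. It shows that a BMP character takes at most 3 bytes in UTF-8, while a supplementary character takes 4, which is the case an incomplete implementation would reject or mangle.

```python
def has_supplementary(text: str) -> bool:
    """Return True if any character lies outside the BMP (above U+FFFF)."""
    return any(ord(ch) > 0xFFFF for ch in text)


def utf8_len(ch: str) -> int:
    """Number of bytes the given text occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


bmp_char = "\u4e2d"        # U+4E2D, a common Han character inside the BMP
supp_char = "\U00020000"   # U+20000, first character of the Supplementary Ideographic Plane

print(utf8_len(bmp_char))              # 3
print(utf8_len(supp_char))             # 4
print(has_supplementary("plain text")) # False
print(has_supplementary(supp_char))    # True
```

A check like `has_supplementary` could be run on incoming text before storing it, if you want to know in advance whether a 3-byte-only storage layer would be a problem.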

molf
Actually I am using MySQL and that is the reason I am asking. I was wondering whether to use Binary or UTF-8 tables. Thanks.
rFactor
+1  A: 

It depends on whether you control your data or not. If you are using Unicode data from anyone other than yourself, you generally must assume that it may include supplementary characters, which in turn means you need to deal with 4-byte UTF-8 sequences, UTF-16 surrogate pairs, and so on.
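For what "dealing with surrogate pairs" means in practice, here is a small sketch of the standard UTF-16 arithmetic in Python. The function name `to_surrogate_pair` is hypothetical; the constants come from the UTF-16 definition (subtract 0x10000, then split the remaining 20 bits into a high and a low half).

```python
from typing import Tuple


def to_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


high, low = to_surrogate_pair(0x20000)
print(hex(high), hex(low))  # 0xd840 0xdc00
```

Code that counts UTF-16 code units as "characters" will see two units here, so length checks, truncation, and indexing all need to treat the pair as one character.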

Avi
+2  A: 

If you intend to sell anything in China, then the GB 18030 standard is mandatory, and it requires characters beyond the BMP (Basic Multilingual Plane). The standard is enforced, and in order to sell there you need to pass a GB 18030 certification.

There are also national standards in Japan and Hong Kong that require characters beyond the BMP. Even if these standards are not enforced like the Chinese one, supporting them might give you an edge.

So the simple answer would be: you need some of the stuff there.

Mihai Nita
Beyond the certification problem, the Unihan IICore set of characters, which contains all the Han (Chinese, Japanese, Korean) characters in current modern usage, includes 62 characters in the Supplementary Ideographic Plane (SIP).
Frédéric Grosshans