I am somewhat new to Unicode and Unicode strings. I'm trying to understand the difference between a "fullwidth" symbol and a normal one.

Take these two for example:

Normal: http://www.fileformat.info/info/unicode/char/20a9/index.htm

Fullwidth: http://www.fileformat.info/info/unicode/char/ffe6/index.htm

I notice that the fullwidth one is defined in terms of U+20A9, and 20A9 happens to be the code of the normal one. So what is the value of U?

When using libraries like ICU, is there a way to specify that the normal symbol should always be returned rather than the fullwidth one?

Thanks,

A: 

The 'U' in "U+20A9" just denotes that "20A9" is a Unicode code point, here the value of the Won sign in the Unicode codespace. It's a notation used in the Unicode Standard. The "U+" is followed by a hexadecimal number of at least 4 digits, such as "U+1234" or "U+10FFFD".
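To make the notation concrete, here is a minimal Python sketch that converts between a character and its "U+" form. The helper names `to_u_notation` and `from_u_notation` are made up for this illustration; they are not part of ICU or of any standard library.

```python
def to_u_notation(ch: str) -> str:
    """Format a single character as U+XXXX (hex, padded to at least 4 digits)."""
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Turn a string such as 'U+20A9' back into the character it names."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"not in U+ notation: {text!r}")
    # Everything after the "U+" prefix is the code point in hexadecimal.
    return chr(int(text[2:], 16))


won = from_u_notation("U+20A9")            # the normal Won sign
fullwidth_won = from_u_notation("U+FFE6")  # the fullwidth Won sign
print(to_u_notation(won), to_u_notation(fullwidth_won))
```

The "U+" prefix carries no value of its own; only the hexadecimal digits after it identify the character.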

Johann Gerell
Gotcha. Thanks! But when ICU formats a currency symbol for the ko_KR (South Korean) locale, it generates FFE6 (fullwidth). I want it to come back with 20A9.
Travis
+2  A: 

U+number is a notational convention for a Unicode code point. There is no 'value' of U.

U+0020, for example, is a space. Its code point value is 32 decimal, 20 hex.

Full width characters are a whole other story.

Back in the days of the 3270, Hanzi took up two positions in memory in the display. So they also took up two columns on the screen. To make things line up neatly, IBM defined a set of 'full-width' (better would have been 'double-width') letters and numbers.
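The relationship between the two Won signs can be inspected from the Unicode Character Database. The sketch below uses Python's standard `unicodedata` module rather than ICU, and the `describe` helper is a name invented for this example.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Collect a few Unicode properties of a single character."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # East Asian Width class, e.g. 'F' means fullwidth.
        "east_asian_width": unicodedata.east_asian_width(ch),
        # Compatibility decomposition; empty if the character has none.
        "decomposition": unicodedata.decomposition(ch),
    }


normal = describe("\u20A9")
fullwidth = describe("\uFFE6")
print(normal)
print(fullwidth)
```

The fullwidth character's decomposition field points back at 20A9, which is why the page for U+FFE6 mentions U+20A9: the fullwidth sign is a compatibility variant of the normal one, not a different currency symbol.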

If some ICU API is delivering full-width, you can use the Normalizer to get rid of it: a compatibility normalization form such as NFKC maps U+FFE6 to U+20A9. You might also file a ticket in their ticket system, since this seems odd.
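As a rough sketch of the normalization approach, the following uses Python's standard `unicodedata` module as a stand-in for ICU's Normalizer. The sample string is a hypothetical formatter output, not actual ICU output, and the function names are illustrative only.

```python
import unicodedata

FULLWIDTH_WON = "\uFFE6"
WON = "\u20A9"


def normalize_compat(text: str) -> str:
    """Apply NFKC, which folds fullwidth forms to their normal counterparts."""
    return unicodedata.normalize("NFKC", text)


def replace_fullwidth_won(text: str) -> str:
    """Narrower alternative: swap only the fullwidth Won sign."""
    return text.replace(FULLWIDTH_WON, WON)


# Hypothetical formatted amount that starts with the fullwidth Won sign.
formatted = FULLWIDTH_WON + "1,000"
print(normalize_compat(formatted))
print(replace_fullwidth_won(formatted))
```

NFKC also rewrites other compatibility characters (ligatures, superscripts, other fullwidth letters and digits), so the targeted replacement may be the safer choice when the rest of the string must stay untouched.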

bmargulies
I am using the NumberFormat class, and for the ko_KR (South Korean) locale it generates FFE6 (fullwidth). I want it to come back with 20A9.
Travis
I recommend normalizing the output. I don't see an ICU option to control this.
bmargulies
Search the ICU bug database and also CLDR; this may already have been fixed.
Steven R. Loomis
A: 
McDowell