ansaurus

Question

In Unicode, why are there two representations for the Arabic digits?

Answer 1

+1 A:

Well, they look like this: ١ and ۱, so I assume that it doesn't matter much. My guess would be that they have different Unicode codes for the same numeral depending on its location. In Arabic, they do the same with letters: they look different when they are the last letter of a word or if they stand alone.

Edit: I just noticed that the 4 looks different in the two sets: ٤ and ۴. I'm quite sure that in the Middle East (Jordan and Egypt), they use the first form (U+0664).

Teun D 2009-11-04 20:48:06

Either that, or the glyphs are slightly off in e.g. Arabic and African Arabic.

Martin Hohenberg 2009-11-04 20:50:18

Yeah, they are the same, but numbers don't change how they look depending on their location in the word like letters do. This is why I am asking, since I need to know which one to use.

Karim 2009-11-04 20:52:13

Yeah, I noticed that the 4, 5 and 6 are different, so the first set is the correct one. Thanks for noticing that :)

Karim 2009-11-04 20:54:44

I am an Arab, and I live in Jordan. Never seen the second representation of 4 anywhere. If I were you, I'd stick with the set containing the first representation.

Sinan Taifour 2009-11-04 20:56:55

Answer 2

A:

Which one you use is largely irrelevant - but you should make sure to stick with one once you've chosen.

Martin Hohenberg 2009-11-04 20:48:46

I don't think that they created two sets of the same thing. I think when they created Unicode they thought about everything and didn't do things just for fun.

Karim 2009-11-04 20:59:25

Of course the difference matters; what are you implying?

Arthur Reutenauer 2009-11-04 21:04:10

Answer 3

+1 A:

Which code do you prefer for representing the number 4, U+0664 or U+06F4?

(٤ or ۴ )?

To be consistent, let this choice guide which codes you use for 1, 2, and the other duplicate codes.
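If you do settle on one set, a rough sketch of forcing a string into it might look like the following. The helper name `to_arabic_indic` is made up for illustration, and the sketch assumes you picked the U+066n series; swap the two ranges if you picked the other one.

```python
# Hypothetical helper: map EXTENDED ARABIC-INDIC digits (U+06F0..U+06F9)
# onto ARABIC-INDIC digits (U+0660..U+0669), leaving everything else alone.
_EXTENDED_TO_ARABIC_INDIC = {
    0x06F0 + value: chr(0x0660 + value) for value in range(10)
}


def to_arabic_indic(text):
    """Return text with every U+06Fn digit replaced by its U+066n twin."""
    return text.translate(_EXTENDED_TO_ARABIC_INDIC)


if __name__ == "__main__":
    sample = "".join(chr(0x06F0 + value) for value in (4, 5, 6))
    print(to_arabic_indic(sample))
```

Keep in mind that this only changes which code points are stored; it says nothing about which set your readers actually expect.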

mobrule 2009-11-04 20:51:21

Answer 4

+10 A:

According to the code charts, U+0660 .. U+0669 are ARABIC-INDIC DIGIT values 0 through 9, while U+06F0 .. U+06F9 are EXTENDED ARABIC-INDIC DIGIT values 0 through 9.
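You can check these names yourself with whatever Unicode database your language ships. As a small sketch in Python (the function name `describe_digits` is just something I made up for this example), using the standard `unicodedata` module:

```python
import unicodedata


def describe_digits(base):
    """List (code point, character name, digit value) for the ten digits
    starting at code point `base`."""
    rows = []
    for offset in range(10):
        ch = chr(base + offset)
        rows.append(
            (f"U+{base + offset:04X}", unicodedata.name(ch), unicodedata.digit(ch))
        )
    return rows


if __name__ == "__main__":
    # 0x0660 starts the ARABIC-INDIC digits, 0x06F0 the EXTENDED ones.
    for base in (0x0660, 0x06F0):
        for code, name, value in describe_digits(base):
            print(code, name, value)
```

Both series carry the same digit values 0 through 9; only the character names (and the glyphs a font draws for them) differ.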

In the Unicode 3.0 book (5.2 is the current version, but these things don't change much once set), the U+066n series of glyphs are marked 'Arabic-Indic digits' and the U+06Fn series of glyphs are marked 'Eastern Arabic-Indic digits (Persian and Urdu)'. It also notes:

U+06F4 - 'different glyphs in Persian and Urdu'
U+06F5 - 'Persian and Urdu share glyph different from Arabic'
U+06F6 - 'Persian glyph different from Arabic'
U+06F7 - 'Urdu glyph different from Arabic'

For comparison:

U+066n: ٠١٢٣٤٥٦٧٨٩
U+06Fn: ۰۱۲۳۴۵۶۷۸۹

Or:

     U+066n    U+06Fn
0      ٠         ۰
1      ١         ۱
2      ٢         ۲
3      ٣         ۳
4      ٤         ۴
5      ٥         ۵
6      ٦         ۶
7      ٧         ۷
8      ٨         ۸
9      ٩         ۹

(Whether you can see any of those, and how clearly they are differentiated may depend on your browser and the fonts installed on your machine as much as anything else. I can see the difference on 4 and 6 clearly; 5 looks much the same in both.)

Based on this information, if you are working with Arabic from the Middle East, use the U+066n series of digits; if you are working with Persian or Urdu, use the U+06Fn series of digits. As a Unicode application, you should accept either set of codes as valid digits (but you might look askance at a sequence that mixed the two sets of digits - or you might just leave well alone).
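As a rough sketch of the "accept either set, but maybe look askance at a mix" idea, here is one way it could go in Python. The names `digit_sets_used` and `parse_number` and the `strict` flag are my own inventions for illustration, not part of any library; the only library behaviour relied on is that Python's built-in `int()` accepts Unicode decimal digits.

```python
ARABIC_INDIC = range(0x0660, 0x066A)            # U+0660..U+0669
EXTENDED_ARABIC_INDIC = range(0x06F0, 0x06FA)   # U+06F0..U+06F9


def digit_sets_used(text):
    """Return the names of the Arabic digit series that occur in text."""
    used = set()
    for ch in text:
        code_point = ord(ch)
        if code_point in ARABIC_INDIC:
            used.add("arabic-indic")
        elif code_point in EXTENDED_ARABIC_INDIC:
            used.add("extended arabic-indic")
    return used


def parse_number(text, strict=True):
    """Parse an integer written with either series of digits.

    With strict=True a string that mixes the two series is rejected;
    with strict=False it is parsed anyway ("leave well alone").
    """
    if strict and len(digit_sets_used(text)) > 1:
        raise ValueError("number mixes U+066n and U+06Fn digits")
    return int(text)  # int() understands any Unicode decimal digit


if __name__ == "__main__":
    print(parse_number("\u0664\u0662"))  # Arabic-Indic four, two
    print(parse_number("\u06f4\u06f2"))  # Extended Arabic-Indic four, two
```

Whether mixed input should be an error, a warning, or silently accepted is a judgment call for your application; the flag is there only to show both options.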

Jonathan Leffler 2009-11-04 21:01:04

+1: would have made that answer if you hadn't beaten me by 1 minute ;-) It's a pity everybody seems to think that the difference doesn't matter and rush to make ill-advised answers...

Arthur Reutenauer 2009-11-04 21:06:18

+1: learnt something new.

BalusC 2009-11-04 21:34:14

Answer 5

+3 A:

In general you should not hard-code such info in your application.

On Windows you can use GetLocaleInfo with LOCALE_SNATIVEDIGITS. On Mac, use CFNumberFormatterCopyProperty with kCFNumberFormatterZeroSymbol. Or use something like ICU.

There are Arabic countries that don't use the Arabic-Indic digits by default. So there is no direct mapping saying Arabic -> Arabic-Indic digits.

And the user might have changed the defaults in the Control Panel anyway.

Mihai Nita 2009-11-11 07:46:38
