I was reading the Unicode specification on Wikipedia, and I see that each of the Arabic digits has two Unicode code points. For example, 1 is defined both as U+0661 and as U+06F1.
Which one should I use?
Well, they look like this: ١ and ۱, so I assume that it doesn't matter much. My guess would be that they have different Unicode codes for the same numeral depending on its location. In Arabic, they do the same with letters: they look different when they are the last letter of a word or when they stand alone.
Edit: I just noticed that the 4 looks different in the two sets: ٤ and ۴. I'm quite sure that in the Middle East (Jordan and Egypt), they use the first form (U+0664).
Which one you use is largely irrelevant, but you should make sure to stick with one once you've chosen.
Which code do you prefer for representing the number 4, U+0664 or U+06F4 (٤ or ۴)?
To be consistent, let this choice guide which codes you use for 1, 2, and the other duplicate codes.
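One way to stick with a single set is to store numbers as ordinary ASCII digits and convert them only for display. The sketch below is a minimal Python illustration of that idea; the names `ARABIC_INDIC`, `EXTENDED_ARABIC_INDIC` and `localize_digits` are hypothetical helpers, not part of any library.

```python
# Hypothetical translation maps from ASCII digits to each Unicode Arabic digit set.
ARABIC_INDIC = {str(d): chr(0x0660 + d) for d in range(10)}           # U+0660..U+0669
EXTENDED_ARABIC_INDIC = {str(d): chr(0x06F0 + d) for d in range(10)}  # U+06F0..U+06F9


def localize_digits(text: str, digit_set: dict) -> str:
    """Replace every ASCII digit in text with the matching digit from one chosen set."""
    return text.translate(str.maketrans(digit_set))


# Pick one set and use it everywhere, so 1, 2, 4, ... never come from different blocks.
print(localize_digits("2024", ARABIC_INDIC))
print(localize_digits("2024", EXTENDED_ARABIC_INDIC))
```

Because the choice is a single map passed around, switching sets later means changing one argument rather than hunting down individual code points.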
According to the code charts, U+0660 .. U+0669 are ARABIC-INDIC DIGIT values 0 through 9, while U+06F0 .. U+06F9 are EXTENDED ARABIC-INDIC DIGIT values 0 through 9.
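You can check those two ranges against the Unicode Character Database that ships with Python's standard `unicodedata` module. This sketch prints each pair with its official name and asserts that both members of a pair carry the same digit value:

```python
import unicodedata

# Walk both code-chart ranges side by side.
for d in range(10):
    a = chr(0x0660 + d)  # ARABIC-INDIC DIGIT <d>
    e = chr(0x06F0 + d)  # EXTENDED ARABIC-INDIC DIGIT <d>
    # Both are decimal digits (category Nd) with the same numeric value.
    assert unicodedata.category(a) == unicodedata.category(e) == "Nd"
    assert unicodedata.digit(a) == unicodedata.digit(e) == d
    print(f"U+{ord(a):04X} {unicodedata.name(a):<26} U+{ord(e):04X} {unicodedata.name(e)}")
```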
In the Unicode 3.0 book (5.2 is the current version, but these things don't change much once set), the U+066n series of glyphs are marked 'Arabic-Indic digits' and the U+06Fn series of glyphs are marked 'Eastern Arabic-Indic digits (Persian and Urdu)'. It also notes:
For comparison:
Digit U+066n U+06Fn
0 ٠ ۰
1 ١ ۱
2 ٢ ۲
3 ٣ ۳
4 ٤ ۴
5 ٥ ۵
6 ٦ ۶
7 ٧ ۷
8 ٨ ۸
9 ٩ ۹
(Whether you can see any of those, and how clearly they are differentiated, may depend on your browser and the fonts installed on your machine as much as on anything else. I can see the difference on 4 and 6 clearly; 5 looks much the same in both.)
Based on this information, if you are working with Arabic from the Middle East, use the U+066n series of digits; if you are working with Persian or Urdu, use the U+06Fn series of digits. As a Unicode application, you should accept either set of codes as valid digits (but you might look askance at a sequence that mixes the two sets of digits, or you might just leave well alone).
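In Python, accepting either set comes almost for free: `int()` accepts any Unicode decimal digit, not just ASCII. The sketch below adds the "look askance" option by rejecting strings that mix the two blocks; `digit_sets` and `parse_arabic_number` are hypothetical names, and raising `ValueError` is just one possible policy (logging or normalizing would work equally well).

```python
def digit_sets(text: str) -> set:
    """Report which Arabic digit blocks occur in text."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if 0x0660 <= cp <= 0x0669:
            found.add("arabic-indic")   # U+066n
        elif 0x06F0 <= cp <= 0x06F9:
            found.add("extended")       # U+06Fn
    return found


def parse_arabic_number(text: str) -> int:
    """Accept either digit set, but refuse a number that mixes the two."""
    if len(digit_sets(text)) > 1:
        raise ValueError(f"mixed Arabic-Indic and Extended Arabic-Indic digits: {text!r}")
    # int() understands every Unicode decimal digit, so both sets parse directly.
    return int(text)


print(parse_arabic_number("١٢٣"))  # U+066n digits
print(parse_arabic_number("۱۲۳"))  # U+06Fn digits
```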
In general you should not hard-code such info in your application.
On Windows, you can use GetLocaleInfo with LOCALE_SNATIVEDIGITS. On Mac, use CFNumberFormatterCopyProperty with kCFNumberFormatterZeroSymbol. Or use a library such as ICU.
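Once the platform has told you the locale's zero digit (for example the first character of LOCALE_SNATIVEDIGITS, or the value behind kCFNumberFormatterZeroSymbol), the other nine follow, because Unicode encodes every decimal digit set as ten consecutive code points. The sketch below assumes you already have that zero character as a Python string; fetching it from the OS is not shown, and `digits_from_zero` / `to_native` are hypothetical helpers.

```python
import unicodedata


def digits_from_zero(zero: str) -> str:
    """Build a locale's ten native digits from its zero symbol.

    Assumes `zero` was obtained from the platform (not hard-coded).
    """
    if len(zero) != 1 or unicodedata.decimal(zero, None) != 0:
        raise ValueError(f"not a decimal zero: {zero!r}")
    # Decimal digits 0..9 of any script occupy consecutive code points.
    return "".join(chr(ord(zero) + i) for i in range(10))


def to_native(text: str, zero: str) -> str:
    """Replace ASCII digits in text with the locale's native digits."""
    return text.translate(str.maketrans("0123456789", digits_from_zero(zero)))


print(to_native("42", "\u0660"))  # locale reports Arabic-Indic zero
print(to_native("42", "0"))       # locale reports ASCII zero: text unchanged
```

Keeping the locale's answer as the single input means a user who changed the defaults in the Control Panel still gets the digits they asked for.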
There are Arabic countries that don't use the Arabic-Indic digits by default. So there is no direct mapping saying Arabic -> Arabic-Indic digits.
And the user might have changed the defaults in the Control Panel anyway.