Does anyone have any code for alphabetizing Arabic and Japanese text that is in Unicode? If the code were in Ruby, that would be great.
I don't know Ruby, but Python has a function, ord(), that translates a Unicode character to its code point. For example:
>>> a = u'ل'
>>> ord(a)
1604
>>> b = u'ع'
>>> ord(b)
1593
Look for something like that in Ruby. I assume that the Arabic letters are listed in Unicode in alphabetical order.
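For what it's worth, Ruby 1.9 and later have String#ord, and String#unpack with the "U*" directive decodes a whole UTF-8 string into code points. A rough Ruby counterpart of the Python session above might look like this; the helper name code_points_of is made up for illustration and is not a built-in.

```ruby
# Return the Unicode code points of every character in a UTF-8 string.
# `code_points_of` is a made-up helper name, not part of Ruby itself.
def code_points_of(text)
  text.unpack("U*") # decodes UTF-8 into an array of integers
end

puts "ل".ord                       # => 1604
puts code_points_of("لع").inspect  # => [1604, 1593]
```

The numbers match what ord() printed in Python, so comparing these values is one way to check how two letters will sort relative to each other.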
To ask the obvious question, what don't you like about mylist.sort?
Depending on your needs, words.sort in Ruby will be fine for Japanese. The order in which the characters appear in Unicode is a reasonably good sorting order. I can't vouch for Arabic, but my guess is that it's OK as well.
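To make "depending on your needs" a bit more concrete, here is a small sketch of what a plain sort does with kana. It assumes Ruby 1.9 or later with UTF-8 strings; default_sort and KANA_WORDS are names invented for the example. Hiragana occupy a lower block of code points than katakana, so all hiragana words come out first, which is not how a dictionary ordered by reading would interleave them. Kanji are ordered by code point as well, not by reading.

```ruby
# Sort with Ruby's default String comparison, which for UTF-8 strings
# follows code point order. `default_sort` is a made-up helper name.
def default_sort(words)
  words.sort
end

# Hiragana (U+3041..U+309F) sit before katakana (U+30A0..U+30FF),
# so every hiragana word sorts ahead of every katakana word.
KANA_WORDS = ["カメラ", "あめ", "アイス", "かさ"]

puts default_sort(KANA_WORDS).inspect
# => ["あめ", "かさ", "アイス", "カメラ"]
```

If that grouping is acceptable for your data, the plain sort is probably enough; if you need true dictionary order, it likely is not.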
mylist.sort should work out of the box in Ruby 1.9 (which has built-in Unicode support). In Ruby 1.8, where Unicode support isn't built in, I think you'd have to use the character-encodings gem to extend the String class with UTF-8 string comparisons. (And then mylist.sort would work.)
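If installing a gem is not an option, one possible workaround is to sort on explicitly decoded code points. This is only a sketch: sort_by_code_points and ARABIC_WORDS are hypothetical names, the strings are assumed to be valid UTF-8, and it has not been verified on 1.8. Since UTF-8 is designed so that bytewise comparison gives the same order as code point comparison, a plain sort on UTF-8 strings should produce the same result, which the last line checks.

```ruby
# Sort by explicit code points instead of relying on String#<=>.
# `sort_by_code_points` is a hypothetical helper, not a library method.
def sort_by_code_points(words)
  words.sort_by { |word| word.unpack("U*") }
end

# Three Arabic words starting with lam, ain and ba respectively.
ARABIC_WORDS = ["لبن", "عسل", "باب"]

puts sort_by_code_points(ARABIC_WORDS).inspect
# Compare against the plain sort; for UTF-8 strings the two should agree.
puts(sort_by_code_points(ARABIC_WORDS) == ARABIC_WORDS.sort)
```

Either way this is still code point order, so it is only as alphabetical as the layout of the Unicode block for the script in question.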
Unicode code points are not listed in alphabetical order (Z < a, for example), but they try to be approximately in that order anyway. There is a default Unicode order, defined by the Unicode Collation Algorithm, and there are also language-specific orderings (French order is not exactly the same as German or Czech order, even with the same alphabet), which can be specified in locale information. I think the ICU library contains the language-specific algorithms you are looking for.
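The Z < a point is easy to see in Ruby. The sketch below uses made-up helper names (code_point_order, case_folded_order) and a crude downcase key. It does not implement the Unicode Collation Algorithm or any locale rules; it only shows how raw code point order differs from what a reader would expect.

```ruby
# Plain code point order: every uppercase ASCII letter sorts before
# every lowercase one, so "Zebra" lands ahead of "apple".
def code_point_order(words)
  words.sort
end

# A crude case-insensitive order. This is NOT the Unicode Collation
# Algorithm, only a way to see how far raw code points are from it.
def case_folded_order(words)
  words.sort_by { |word| word.downcase }
end

LATIN_WORDS = ["banana", "Zebra", "apple"]

puts code_point_order(LATIN_WORDS).inspect   # => ["Zebra", "apple", "banana"]
puts case_folded_order(LATIN_WORDS).inspect  # => ["apple", "banana", "Zebra"]
```

For accents, ligatures and per-language rules, a downcase key stops being enough, which is where a real collation library such as ICU comes in.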