Does anyone have any code for alphabetizing Arabic and Japanese text that is in Unicode? If the code were in Ruby, that would be great.

A: 

I don't know Ruby, but Python has a function, ord(), that returns the Unicode code point of a character. For example,

>>> a = u'ل'
>>> ord(a)
1604
>>> b = u'ع'
>>> ord(b)
1593

Look for something like that in Ruby. I assume that the Arabic letters are listed in Unicode in alphabetical order.
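A minimal Python sketch of this idea: sort words by the list of ord() values of their characters. The three Arabic words are illustrative choices, not from the question, and code_point_key is a hypothetical helper name. Python already compares strings code point by code point, so the explicit key only makes that comparison visible; it gives the same result as a plain sorted().

```python
# Illustrative Arabic words: kitab ("book"), bab ("door"), ayn ("eye").
words = ["كتاب", "باب", "عين"]

def code_point_key(word):
    """Hypothetical helper: the list of Unicode code points in `word`."""
    return [ord(ch) for ch in word]

by_code_point = sorted(words, key=code_point_key)
print(by_code_point)

# Python's own str comparison is code point by code point,
# so the explicit key agrees with a plain sorted().
print(by_code_point == sorted(words))  # True
```

Here ب (U+0628) < ع (U+0639) < ك (U+0643), which happens to match the Arabic alphabetical order of those letters.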

Rob Lourens
Would this help with this question? If we did this with ordinary Latin characters, every uppercase letter would sort before every lowercase letter, which wouldn't make sense in some situations.
Andrew Grimm
Right, if that applies to Arabic and Japanese too, I guess the OP would have to account for that.
Rob Lourens
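A small Python illustration of the case problem raised in these comments, using made-up English words. Case folding is one common workaround, not a full collation. Arabic and Japanese scripts have no letter case, so this particular split does not arise there.

```python
words = ["apple", "Banana", "cherry"]

# Plain code-point order: every uppercase ASCII letter (A-Z = 65-90)
# comes before every lowercase one (a-z = 97-122).
print(sorted(words))                    # ['Banana', 'apple', 'cherry']

# One common workaround: compare case-folded copies instead.
print(sorted(words, key=str.casefold))  # ['apple', 'Banana', 'cherry']
```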
A: 

To ask the obvious question, what don't you like about mylist.sort?

glenn mcdonald
Does mylist.sort work with Unicode and know the alphabetical order of the Arabic or Japanese alphabet?
James Testa
...why don't you try it and see?
Rob Lourens
A: 

Depending on your needs, words.sort in Ruby will be fine for Japanese. The order in which the characters appear in Unicode is a reasonably good sorting order. I can't vouch for Arabic, but my guess is that it's OK as well.

Kimtaro
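A quick Python check of the Japanese case, with illustrative words. Hiragana code points follow the traditional gojūon order (with small and voiced kana interleaved), but hiragana and katakana live in separate Unicode blocks, so a katakana word sorts after every hiragana word. That is one of the situations where "depending on your needs" matters.

```python
# Hiragana: あ (U+3042) < か (U+304B) < さ (U+3055), matching gojūon order.
hiragana = ["さくら", "あめ", "かさ"]
print(sorted(hiragana))  # ['あめ', 'かさ', 'さくら']

# Mixing scripts: the katakana block (U+30A0 to U+30FF) comes after the
# hiragana block (U+3040 to U+309F), so アメ lands after every hiragana
# word even though あ and ア represent the same sound.
mixed = ["さくら", "アメ", "かさ"]
print(sorted(mixed))  # ['かさ', 'さくら', 'アメ']
```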
A: 

mylist.sort should work out of the box in Ruby 1.9 (which has built-in Unicode support). In Ruby 1.8, where Unicode support isn't built in, I think you'd have to use the character-encodings gem to extend the String class with UTF-8 string comparisons. (And then mylist.sort would work.)

Ken Bloom
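A related detail, sketched in Python and assuming the text is UTF-8 encoded: comparing the raw UTF-8 bytes of two strings gives the same order as comparing their code points, a documented property of UTF-8 (RFC 3629). So even a purely byte-oriented string comparison yields code point order for UTF-8 text; locale-aware ordering still needs something more. The word list is an arbitrary mix chosen for illustration.

```python
words = ["かさ", "كتاب", "apple", "Ωmega", "باب"]

# Order 1: compare decoded strings (Python compares code points).
by_code_point = sorted(words)

# Order 2: compare the raw UTF-8 bytes, roughly what a byte-oriented
# string comparison would do.
by_utf8_bytes = sorted(words, key=lambda w: w.encode("utf-8"))

print(by_code_point == by_utf8_bytes)  # True
```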
A: 

Unicode code points are not listed in alphabetical order (Z < a, for example), but they try to be approximately in that order anyway. There is a canonical Unicode order, defined by the Unicode Collation Algorithm, and there are also language-specific orderings (French order is not exactly the same as German or Czech order, even with the same alphabet), which can be specified in locale information. I think the ICU library contains the language-specific algorithms you are looking for.

Frédéric Grosshans
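To make the point about language-specific ordering concrete, here is a toy Python sketch of one Czech rule: the digraph "ch" counts as a separate letter sorted after "h". It is not the Unicode Collation Algorithm and ignores the rest of Czech collation (č, ř, š, ž and more); CZECH_TOY_ORDER, RANK and czech_toy_key are hypothetical names, and the words are illustrative. For real locale-specific collation, the ICU library mentioned above is the thing to use.

```python
# Toy alphabet: plain a-z, with the Czech digraph "ch" slotted in after "h".
# Real Czech collation also orders accented letters; this sketch ignores them.
CZECH_TOY_ORDER = list("abcdefgh") + ["ch"] + list("ijklmnopqrstuvwxyz")
RANK = {letter: i for i, letter in enumerate(CZECH_TOY_ORDER)}

def czech_toy_key(word):
    """Hypothetical helper: ranks for a lowercase word, "ch" as one letter."""
    ranks = []
    i = 0
    while i < len(word):
        if word.startswith("ch", i):
            ranks.append(RANK["ch"])
            i += 2
        else:
            # Characters outside the toy alphabet sort after it, by code point.
            ranks.append(RANK.get(word[i], len(RANK) + ord(word[i])))
            i += 1
    return ranks

words = ["hrad", "chleba", "ihned", "cibule"]
print(sorted(words))                     # ['chleba', 'cibule', 'hrad', 'ihned']
print(sorted(words, key=czech_toy_key))  # ['cibule', 'hrad', 'chleba', 'ihned']
```

Plain code-point sorting puts "chleba" first because "h" < "i", while the Czech-style key moves it between "hrad" and "ihned".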