ansaurus

Question

Answer 1

A:

Would it be possible to add an order or ID attribute to your countries.yml? That way you could sort manually and still preserve a common identifier.

Jason Sperske 2009-06-11 19:05:24
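A minimal sketch of the order-attribute idea, assuming a hypothetical countries.yml layout in which each entry keeps a stable key next to its translated name and a hand-maintained `order` value. The YAML is inlined so the example is self-contained; the key stays the common identifier while `order` drives the display order.

```ruby
require 'yaml'

# Hypothetical countries.yml layout (assumption, not the asker's real file):
# a stable key per country, the translated name, and a manual "order".
COUNTRIES_YML = <<~YAML
  de:
    name: "Deutschland"
    order: 2
  at:
    name: "Österreich"
    order: 3
  eg:
    name: "Ägypten"
    order: 1
YAML

countries = YAML.safe_load(COUNTRIES_YML)

# Sort the [key, attributes] pairs by the manual order value.
sorted = countries.sort_by { |_key, attrs| attrs['order'] }
sorted.each { |key, attrs| puts "#{key}: #{attrs['name']}" }
```

The drawback raised in the reply below still applies: someone has to keep the `order` values correct for every locale by hand.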

I suppose so, but that's not the solution I'm looking for, because it'd require a lot of extra manual work. Also, the people who maintain the translations are generally non-technical and thus may not understand what to do with an order attribute.

Daniel Vandersluis 2009-06-11 19:09:51

Answer 2

+1 A:

There are a couple of ways to go. You could convert the UTF-8 strings to strings of their character codes and then sort on those:

s.unpack('U*').join  # each character's code point, concatenated into one string

Or you could use the iconv library. Read up on it and use it as appropriate (this snippet is from DZone):

# Add this to environment.rb.
# Call to_iso on any UTF-8 string to get an ISO-8859-1 string back, e.g.
#   "Cédez le passage aux français".to_iso
require 'iconv' # not needed in Rails

class String
  def to_iso
    Iconv.conv('ISO-8859-1', 'utf-8', self)
  end
end

Ryan Oberoi 2009-06-11 19:15:43
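Iconv was removed from Ruby's standard library in Ruby 2.0, so on current Rubies the same UTF-8 to ISO-8859-1 conversion is usually done with String#encode. A minimal sketch; `to_latin1` is a hypothetical helper name, and replacing unmappable characters with "?" is an assumption (by default, encode raises instead).

```ruby
# Hypothetical helper: convert UTF-8 text to ISO-8859-1 with String#encode.
def to_latin1(str)
  # Characters with no Latin-1 equivalent (e.g. "Ł") become "?" instead of
  # raising Encoding::UndefinedConversionError.
  str.encode('ISO-8859-1', 'UTF-8', undef: :replace, replace: '?')
end

iso = to_latin1("Cédez le passage aux français")
iso.encoding  # => Encoding::ISO_8859_1
iso.bytesize  # => 29 (the UTF-8 original is 31 bytes)
```

Note that ISO-8859-1 still places accented capitals such as Á (0xC1) after Z (0x5A), so the conversion on its own does not produce alphabetical order.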

Hm, sorting by the hex value does seem to put my strings in alphabetical order, but I don't really understand how it works; can you explain that? Also, it still sorts Á before A, which seems backwards to me.

Daniel Vandersluis 2009-06-11 19:33:09
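The key from the first snippet in this answer turns each character into its decimal code point and concatenates the digits with no padding. The keys are then compared as strings, character by character, which explains the Á-before-A result. A small sketch reproducing that behaviour; `codepoint_key` is a hypothetical name.

```ruby
# Hypothetical helper mirroring the answer's key: decimal code points,
# concatenated with no padding.
def codepoint_key(s)
  s.unpack('U*').join
end

codepoint_key('A')  # => "65"
codepoint_key('Á')  # => "193"

# Keys compare as strings: "1" < "6", so "193" sorts before "65".
%w[B A Á].sort_by { |s| codepoint_key(s) }    # => ["Á", "A", "B"]

# The same effect hits plain ASCII: "d" is 100 but "b" is 98,
# so "65100" < "6598" and "Ad" sorts before "Ab".
%w[Ab Ad].sort_by { |s| codepoint_key(s) }    # => ["Ad", "Ab"]
```

So the apparent alphabetical order only holds while every code point involved has the same number of digits.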

Also watch out: Unicode sorting depends on the locale! Different countries order their dictionaries differently.

Rutger Nijlunsing 2009-06-11 19:34:12
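To make the locale point concrete: German dictionary order files ä, ö and ü with a, o and u, while Swedish puts å, ä and ö after z. The toy sketch below hard-codes just those two rules; `collation_key` and the locale symbols are hypothetical, and this is not a real collation implementation (a library such as ICU carries the actual per-locale data).

```ruby
# Toy illustration only: two hand-written rules, not real collation data.
def collation_key(word, locale)
  w = word.downcase
  case locale
  when :de then w.tr('äöü', 'aou')  # fold umlauts onto their base letters
  when :sv then w.tr('åäö', '{|}')  # '{', '|', '}' come right after 'z' in ASCII
  else w
  end
end

words = %w[zebra äpple apa]
words.sort_by { |w| collation_key(w, :de) }  # => ["apa", "äpple", "zebra"]
words.sort_by { |w| collation_key(w, :sv) }  # => ["apa", "zebra", "äpple"]
```

The same three words come out in two different orders, which is why no single locale-independent key can be "correct".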

Well, converting to hex gives you an ordering that sort functions handle better. I would experiment a bit, for example by padding the hex values to a fixed width of 2 or 3 digits, or even by using decimal values for each character. I don't work much with Unicode myself, but from Rutger's comment it appears that what you are trying to do does not have an exact answer.

Ryan Oberoi 2009-06-11 19:48:04
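Padding each code point to a fixed width, as suggested above, can be sketched as follows. Six hex digits is an assumption chosen to cover every code point up to U+10FFFF; `padded_hex_key` is a hypothetical name.

```ruby
# Hypothetical helper: zero-pad every code point to six hex digits so that
# comparing keys as strings matches comparing code points.
def padded_hex_key(s)
  s.unpack('U*').map { |cp| format('%06x', cp) }.join
end

words = %w[Ad Á Z Ab]
words.sort_by { |s| padded_hex_key(s) }  # => ["Ab", "Ad", "Z", "Á"]
```

The result is plain code-point order, which for UTF-8 strings is the same as `words.sort`. Padding removes the digit-comparison artifact, but Á still lands after Z, so it does not solve the alphabetical problem.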

@Rutger I guess that's what I'm trying to figure out how to implement, and it's another drawback of my current method (or of sorting by character code).

Daniel Vandersluis 2009-06-11 20:15:19

Answer 3

A:

Have you tried calling the mb_chars method on each of your country strings? mb_chars returns a proxy, added by ActiveSupport, that defines Unicode-safe versions of the String methods. If the comparator is Unicode-aware, the sorting should work correctly.

mb_chars documentation

John Topley 2009-06-11 20:08:05

The problem with using mb_chars is the same as with sorting directly: because A-Z come before Ä in the character set, accented characters will not sort into the correct position.

Daniel Vandersluis 2009-06-11 20:21:13

Answer 4

+3 A:

http://github.com/grosser/sort_alphabetical/tree/master

Maybe this plugin will help.

İ. Emre Kutlu 2009-08-05 13:43:19

Thanks, that was exactly the sort of plugin I was looking for!

Daniel Vandersluis 2009-08-05 18:08:19

This plugin relies on NFD decomposition (http://en.wikipedia.org/wiki/Unicode_equivalence#Normal_forms) and fails in some cases: not all characters with diacritics can be decomposed this way (for example, the Polish letter Ł cannot).

skalee 2010-09-09 11:18:50
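The NFD approach described above can be sketched with the standard library's String#unicode_normalize (Ruby 2.2+): decompose each string, strip the combining marks, and sort on what is left. `alpha_key` is a hypothetical name, and this is an illustration of the technique rather than the sort_alphabetical plugin's own code.

```ruby
# Hypothetical helper: NFD-decompose, then drop nonspacing combining marks.
def alpha_key(s)
  # "Ä" decomposes to "A" + U+0308; \p{Mn} matches the U+0308 mark.
  s.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

alpha_key('Ägypten')  # => "Agypten"
alpha_key('Łotwa')    # => "Łotwa" (Ł has no decomposition, so it is unchanged)

countries = %w[Zypern Łotwa Albanien Ägypten]
# The original string breaks ties between equal keys.
countries.sort_by { |c| [alpha_key(c), c] }
# => ["Ägypten", "Albanien", "Zypern", "Łotwa"]
```

Ä now sorts with the A's, but Łotwa still ends up after Zypern, which is exactly the failure case with Ł.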

Answer 5

A:

What you are trying to do is a very messy proposition. There is no way to do transparent transliteration on all Unicode characters, because the meaning of digraphs changes from locale to locale, and strings can grow HUGE (if, say, you replace 10 Chinese characters with their phonetic equivalents). Don't go there.

Why do you want transliterated names in the first place? For URLs? Browsers handle Unicode URLs decently now, so you are inventing a huge problem out of thin air. If you need IDs, preprocess your lists to include a stable numeric ID per country and use that as the identifier. Or save the English name of the country as the identifier (you can download locale-aware ISO country lists for free).

If you truly want good transliteration for Unicode (and that is not what you want in this case), see the IBM ICU libraries; there is a dormant gem for them.

Julik 2009-08-11 15:58:08

Answer 6

+1 A:

The only working solution I've found so far (at least for Ruby 1.8, since Ruby 1.9 should handle Unicode better) is the Unicode library by Yoshida Masato, which provides a Unicode.strcmp method.

EDIT: Sorry, this solution also relies on NFD decomposition, with all its limitations.

skalee 2010-09-09 11:16:34

Sorting UTF-8 strings in RoR