I am trying to figure out a 'proper' way of sorting UTF-8 strings in Ruby on Rails.
In my application, I have a select box that is populated with countries. As my application is localized, each existing locale has a countries.yml file that relates a country's id to the localized name for that country. I can't sort the strings manually in the yml file because I need the ID to be consistent across all locales.
What I have done is create an ascii_name method which uses the unidecode gem to convert accented and non-Latin characters to their ASCII equivalent (for instance, "Afeganistão" would become "Afeganistao"), and then sort on that:
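require 'unidecode'

class Country
  # Transliterate the localized name to plain ASCII for use as a sort key.
  def ascii_name
    Unidecoder.decode(name).gsub("[?]", "").gsub(/`/, "'").strip
  end
end

# Sort the countries by their transliterated names.
Country.all.sort_by(&:ascii_name)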
However, there are obvious issues with this:
- It cannot properly sort non-Latin locales, as there may not be a directly analogous Latin character.
- It makes no distinction between a letter and any accented form of that letter (so, for instance, A and Ä become interchangeable).
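The second issue can be reproduced without the gem. The following is only a sketch under assumptions: ascii_fold is a hypothetical helper that uses Ruby's built-in String#unicode_normalize in place of unidecode (so it only handles Latin letters with combining accents), and the sample names are made up. It shows that sorting on the folded key alone leaves "Ab" and "Äb" tied, and that adding the original string as a secondary key at least makes the order deterministic:

```ruby
# Hypothetical stand-in for Unidecoder.decode: decompose to NFD and drop
# the combining marks, so "Ä" becomes "A". Latin-script names only.
def ascii_fold(name)
  name.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end

names = ["Äb", "Ab", "Aa", "Äa"]

# Folded key only: "Ab" and "Äb" produce the same key, so the key says
# nothing about which of the two comes first.
folded_only = names.sort_by { |n| ascii_fold(n) }

# Folded key first, original string as a tie-breaker: unaccented forms
# sort before accented ones that share the same base letters.
with_tiebreak = names.sort_by { |n| [ascii_fold(n), n] }

puts with_tiebreak.inspect
# → ["Aa", "Äa", "Ab", "Äb"]
```

This still compares raw bytes for the tie-breaker, so it is not real locale-aware collation and does nothing for the non-Latin case in the first point.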
Does anyone know of a better way that I could sort my strings?