I want to sort a list of strings according to the user's language preference. I have a multilanguage Python web app; what is the correct way to sort strings this way?

I know I can set up locale, like this:

    import locale
    locale.setlocale(locale.LC_ALL, '')

But this should be done at application start (and the docs say it is not thread-safe!). Is it a good idea to set it in every thread according to the current user's (request's) setting?

I would like something like the function locale.strcoll(...) with an additional parameter: the language to use for sorting.
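To make the wish concrete, here is a minimal sketch of such a helper built only on the standard library. The name `sorted_for_locale` and the lock are hypothetical, not an existing API. It serializes the locale switch behind a lock, which only protects callers that go through this helper: `setlocale` still changes process-wide state, so other threads calling locale-dependent functions at the same time can be affected. The locale name must also be installed on the host.

```python
import locale
import threading

# Hypothetical process-wide lock: setlocale() changes state for every thread.
_locale_lock = threading.Lock()

def sorted_for_locale(strings, locale_name):
    """Return strings sorted by the collation rules of locale_name."""
    with _locale_lock:
        # Calling setlocale() without a second argument queries the current setting.
        previous = locale.setlocale(locale.LC_COLLATE)
        locale.setlocale(locale.LC_COLLATE, locale_name)
        try:
            # strxfrm() turns each string into a key that compares like strcoll().
            return sorted(strings, key=locale.strxfrm)
        finally:
            # Always restore the previous collation setting.
            locale.setlocale(locale.LC_COLLATE, previous)
```

A call would look like `sorted_for_locale(names, "fr_FR.UTF-8")`, assuming that locale exists on the machine; an unknown name raises `locale.Error`.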

A: 

Given the documentation warnings, it seems you are on your own if you try to set the locale differently in different threads.

If you can split your problem into one thread per locale, might you not as well split it into one subprocess per locale, using Python 2.6's multiprocessing?

It seems any solution to this must be a hack; you could even consider using the command-line program sort(1), invoked with a different LC_ALL for each language.
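As a rough sketch of the sort(1) idea, assuming a POSIX `sort` binary on the PATH and that the requested locale is installed: the helper name `sort_with_sort1` is made up for illustration, and strings that themselves contain newlines would not survive the round trip.

```python
import os
import subprocess

def sort_with_sort1(lines, lc_all):
    """Sort lines by running sort(1) with LC_ALL set to lc_all."""
    if not lines:
        return []
    # Copy the current environment and override only LC_ALL for the child.
    env = dict(os.environ, LC_ALL=lc_all)
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",
        env=env,
        check=True,  # raise CalledProcessError if sort exits with an error
    )
    # The output ends with a newline, so drop the empty trailing piece.
    return result.stdout.split("\n")[:-1]
```

Because the locale is set only in the child process, the parent's locale is never touched, which sidesteps the thread-safety problem at the cost of one process launch per sort.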

kaizer.se
+2  A: 

I would recommend pyICU -- Python bindings for IBM's rich open-source ICU internationalization library. You make a Collator object e.g. with:

    collator = PyICU.Collator.createInstance(PyICU.Locale.getFrance())

and then you can sort e.g. a list of utf-8 encoded strings by the rules for French, e.g. by using thelist.sort(cmp=collator.compare).
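Note that `list.sort(cmp=...)` is Python 2 only. On Python 3 a comparison function can be wrapped with `functools.cmp_to_key`. The sketch below uses a stand-in comparator (`compare_ignoring_case`, a hypothetical name) so that it runs without PyICU installed; with a real collator you would pass `collator.compare` in its place.

```python
from functools import cmp_to_key

def compare_ignoring_case(a, b):
    """Stand-in comparator: returns a negative, zero, or positive number."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)

words = ["cherry", "apple", "Banana"]
# With PyICU this would be: words.sort(key=cmp_to_key(collator.compare))
words.sort(key=cmp_to_key(compare_ignoring_case))
```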

The only issue I had was that I found no good packaged, immediately usable version of PyICU plus ICU for MacOSX -- I ended up building and installing from sources: ICU's own sources, 3.6, from here -- there are binaries for Windows and several Unix versions there, but not for the Mac; PyICU 0.8.1 from here.

Net of these build/installation issues, and somewhat-scant docs for the Python bindings, ICU's really a godsend if you do any substantial amount of i18n-related work, and PyICU a very serviceable set of bindings to it!

Alex Martelli
I had already looked briefly at pyICU, but I thought such functionality must be included in the Python standard library, and apparently it is not. I'll try pyICU.
Jiri
locale-switching (and direct support for multiple locales at once) is not a strong suit of the standard library, but fortunately ICU offers a real wealth of i18n tools (pity about PyIcu's dearth of docs and lack of simple all-in-one Mac OS X installer, though).
Alex Martelli
Unfortunately, with pyICU, my app will not run on Google App Engine. I'll try to find a pure-Python solution.
Jiri
Yep, you're right, pyICU is not among the non-pure-Python libraries App Engine supports. It would help others help you, if you mentioned in your question the constraint of running on App Engine (in the title, in the body, AND as a tag, since it's so crucial!).
Alex Martelli
+1  A: 

You will want the latest possible ICU under your pyICU to get the best and most up to date data.

Steven R. Loomis
A: 

Another possible solution is to use an SQL server that has good locale support (unfortunately, sqlite is not an option). Then I can put all the data into a temporary in-memory table and SELECT it with ORDER BY. IMO this should be a better solution than trying to distribute locale settings across multiple processes, as kaizer.se's answer recommends.
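The shape of that approach might look like the sketch below. It uses the standard-library `sqlite3` module purely as a stand-in so the example runs without a server, and SQLite's built-in `NOCASE` collation in place of a real language collation; as noted above, SQLite itself does not provide locale-aware ordering. The function name `sort_via_sql` is hypothetical, and on a real server the collation name would be whatever that server defines for the user's language.

```python
import sqlite3

def sort_via_sql(strings, collation="NOCASE"):
    """Load strings into a temporary table and let the database order them."""
    # The collation name is interpolated into the SQL text, so pass only
    # fixed, trusted names here, never user input.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TEMP TABLE items (value TEXT)")
        conn.executemany(
            "INSERT INTO items (value) VALUES (?)",
            [(s,) for s in strings],
        )
        rows = conn.execute(
            f"SELECT value FROM items ORDER BY value COLLATE {collation}"
        )
        return [row[0] for row in rows]
    finally:
        conn.close()
```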

Jiri