I am trying to get a unicode version of calendar.month_abbr[6]. If I don't specify an encoding for the locale, I don't know how to convert the string to unicode. The example code below shows my problem:

>>> import locale
>>> import calendar
>>> locale.setlocale(locale.LC_ALL, ("ru_RU"))
'ru_RU'
>>> print repr(calendar.month_abbr[6])
'\xb8\xee\xdd'
>>> print repr(calendar.month_abbr[6].decode("utf8"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb8 in position 0: unexpected code byte
>>> locale.setlocale(locale.LC_ALL, ("ru_RU", "utf8"))
'ru_RU.UTF8'
>>> print repr(calendar.month_abbr[6])
'\xd0\x98\xd1\x8e\xd0\xbd'
>>> print repr(calendar.month_abbr[6].decode("utf8"))
u'\u0418\u044e\u043d'

Any ideas how to solve this? The solution doesn't have to look like this. Any solution that gives me the abbreviated month name in unicode is fine.

+8  A: 

Change the last line in your code:

>>> print calendar.month_abbr[6].decode("utf8")
Июн

Wrapping the result in repr() hides the fact that you already have what you need.

Also, getlocale() can be used to get the encoding of the current locale:

>>> locale.setlocale(locale.LC_ALL, 'en_US')
'en_US'
>>> locale.getlocale()
('en_US', 'ISO8859-1')
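
A minimal sketch of how the encoding reported by getlocale() could be used to decode the month name. The helper name `to_unicode` is hypothetical, and the code assumes Python 2.6+ or Python 3 (where `calendar.month_abbr` already yields text, so the helper simply passes it through). The bytes from the question happen to decode to the expected text under ISO 8859-5, which is used here only as an illustration:

```python
import calendar
import locale


def to_unicode(name, encoding=None):
    """Decode a byte-string month name; pass text through unchanged."""
    if not isinstance(name, bytes):
        return name  # already unicode text (always the case on Python 3)
    if encoding is None:
        # Encoding of the locale currently in effect for dates and times;
        # fall back to ASCII when the locale reports none (e.g. the "C" locale).
        encoding = locale.getlocale(locale.LC_TIME)[1] or "ascii"
    return name.decode(encoding)


# Bytes from the question, decoded with an explicitly supplied encoding.
print(repr(to_unicode(b"\xb8\xee\xdd", "iso8859-5")))

# Month name under whatever locale is active when this runs.
print(repr(to_unicode(calendar.month_abbr[6])))
```

Taking the encoding from getlocale() rather than hard-coding "utf8" keeps the decode step in agreement with whatever locale was actually set.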

Other modules that might be useful for you:

  • PyICU - a better way to handle internationalization. locale produces either the initial or the inflected form of a month name, depending on the locale database in your OS (so you can't rely on it for languages such as Russian), and returns byte strings in some encoding. PyICU has separate format specifiers for the initial and inflected forms (so you can select the appropriate one for your case) and uses unicode.
  • pytils - a set of tools for working with the Russian language, including dates. It has hard-coded month names as a workaround for locale limitations.
Denis Otkidach
If the unicode conversion succeeded I should still be able to do a repr on it. So that shouldn't be the problem. Thanks for the links. I will check them out.
Rickard Lindberg
`locale.getlocale()` worked. Thank you.
Rickard Lindberg
A: 

What you need is:

…
myencoding = locale.getpreferredencoding()
print repr(calendar.month_abbr[6].decode(myencoding))
…
ΤΖΩΤΖΙΟΥ
On my machine `locale.getpreferredencoding()` returns utf8. So I still have the same problem.
Rickard Lindberg
It doesn't seem like `locale.getpreferredencoding()` returns the encoding that `month_abbr` names are encoded in.
Rickard Lindberg
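
A small sketch for comparing the two calls discussed in this thread. As far as I can tell, getpreferredencoding() reports the encoding from the user's environment settings, whereas getlocale() reports the encoding of the locale that was actually set with setlocale(), so the two can disagree. The function name `locale_encodings` is hypothetical:

```python
import locale


def locale_encodings():
    """Return the encodings reported by the two calls discussed in this thread."""
    return {
        # Encoding of the locale currently set for dates and times (may be None).
        "getlocale(LC_TIME)": locale.getlocale(locale.LC_TIME)[1],
        # Encoding preferred by the user's environment.
        "getpreferredencoding()": locale.getpreferredencoding(),
    }


for call, encoding in sorted(locale_encodings().items()):
    print("%s -> %r" % (call, encoding))
```

If the two values differ after a setlocale() call, decoding with the getlocale() value is the one that matched the month names in the asker's case.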