Hello.

In Python, strings may be Unicode (both UTF-16 and UTF-8) or single-byte with different encodings (cp1251, cp1252, etc.). Is it possible to check what encoding a string is in? For example,

time.strftime( "%b" )

will return a string with the text name of a month. Under Mac OS the returned string will be UTF-16; under Windows with an English locale it will be a single-byte string in ASCII; and under Windows with a non-English locale it will be encoded in the locale's codepage, for example cp1251. How can I handle such strings?

+1  A: 

Charset encoding detection is very complex.

However, what's your real purpose here? If you just want the value to be in Unicode, simply write

unicode(time.strftime("%b"))

and it should work for all the cases you've mentioned above:

  • Mac OS: unicode(unicode) -> unicode
  • Win/Eng: unicode(ascii) -> unicode
  • Win/non-Eng: unicode(some_cp, codepage) -> unicode -- note that unicode() with no encoding argument assumes ASCII, so here the locale's codepage must be passed explicitly
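
A minimal Python 2 sketch of that branch logic (locale.getpreferredencoding() is used here as one way to obtain the locale's codepage):

import locale
import time

time_str = time.strftime("%b")
if isinstance(time_str, unicode):
    month = time_str  # already unicode (the Mac OS case)
else:
    # byte string: decode explicitly; unicode() with no encoding
    # argument assumes ASCII and fails on e.g. cp1251 bytes
    month = unicode(time_str, locale.getpreferredencoding())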
Francis
+5  A: 

Strings don't store any encoding information; you just have to specify one when you convert to/from Unicode or print to an output device:

import locale

# getdefaultlocale() returns e.g. ('en_US', 'UTF-8')
lang, encoding = locale.getdefaultlocale()
mystring = u"blabla"
# encode the unicode string to the locale's byte encoding for printing
print mystring.encode(encoding)

UTF-8 is not Unicode; it's an encoding of Unicode text into byte strings.

The best practice is to work with Unicode everywhere on the Python side, store your strings with a lossless Unicode encoding such as UTF-8, and convert to the locale's encoding only for user output.
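
For the original strftime example, that decode-early/encode-late pattern might look like the following minimal Python 2 sketch (locale.getpreferredencoding() is used here as a guess at the byte encoding strftime produces):

import locale
import time

locale.setlocale(locale.LC_ALL, "")        # use the user's default locale
encoding = locale.getpreferredencoding()   # e.g. 'cp1251' or 'UTF-8'

# decode the locale-encoded byte string into unicode as early as possible
month = time.strftime("%b").decode(encoding)

# work with unicode internally; encode only at the boundaries:
stored = month.encode("utf-8")   # lossless encoding for storage
print month.encode(encoding)     # locale encoding for user output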

Luper Rouch
+1  A: 

If you have a reasonably long string in an unknown encoding, you can try to guess the encoding, e.g. with the Universal Encoding Detector at http://chardet.feedparser.org/ -- not foolproof, of course, but sometimes it guesses right ;-). But that won't help much with very short strings.
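
A minimal sketch of using that detector (assuming the chardet package is installed; detect() returns a guessed encoding plus a confidence score, and the filename here is just a placeholder):

import chardet

raw = open("some_file.txt", "rb").read()   # chardet wants a byte string
guess = chardet.detect(raw)                # e.g. {'encoding': 'windows-1251', 'confidence': 0.87}
if guess["encoding"] is not None:
    text = raw.decode(guess["encoding"])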

Alex Martelli