I use Google App Engine and cannot use any C/C++ extensions, only pure-Python libraries, to convert Unicode/UTF-8 strings to lower/upper case. str.lower() and string.lowercase() don't work.
A:
`str` encoded in UTF-8 and `unicode` are two different types. Don't use `string`; use the appropriate method on the `unicode` object:
>>> print u'ĉ'.upper()
Ĉ
Decode `str` to `unicode` before using:
>>> print 'ĉ'.decode('utf-8').upper()
Ĉ
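If the data arrives and leaves as UTF-8 bytes, the decode / change case / encode round trip can be wrapped up roughly like this. It is only a sketch: `upper_utf8` and `lower_utf8` are hypothetical helper names, not part of any library, and the input is assumed to be valid UTF-8.

```python
def upper_utf8(data):
    """Hypothetical helper: upper-case a UTF-8 encoded byte string."""
    text = data.decode('utf-8')          # bytes -> unicode text
    return text.upper().encode('utf-8')  # change case, then back to UTF-8 bytes


def lower_utf8(data):
    """Hypothetical helper: lower-case a UTF-8 encoded byte string."""
    return data.decode('utf-8').lower().encode('utf-8')


# Escapes keep this file ASCII-only; b'\xc4\x89' is the UTF-8 encoding of U+0109.
assert upper_utf8(b'\xc4\x89') == b'\xc4\x88'
assert lower_utf8(b'\xc4\x88') == b'\xc4\x89'
```

The case mapping happens on the decoded text, never on the raw bytes, which is why it is not limited to ASCII letters.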
Ignacio Vazquez-Abrams
2010-01-27 09:54:02
+1 Thanks. How can I convert unicode type to UTF-8?
Viet
2010-01-27 09:55:18
`>>> print repr(u'ĉ'.encode('utf-8'))` `'\xc4\x89'`
Ignacio Vazquez-Abrams
2010-01-27 09:57:48
Thanks. Is this applicable to Vietnamese?
Viet
2010-01-27 09:59:09
It should be. It's not hard to test in the interactive interpreter.
Ignacio Vazquez-Abrams
2010-01-27 10:03:24
My code does not work for Russian and Vietnamese. I don't know other languages: http://oladic.appspot.com/add/ОИЧУНКАЛС http://oladic.appspot.com/add/TÌNH%20YÊU http://oladic.appspot.com/add/ĉĉĉĉ
Viet
2010-01-27 10:10:27
Finally it worked! Thank you very much! I wish I could vote more!
Viet
2010-01-27 10:11:58
Viet: you probably want to URL-encode unicode characters if you're putting them in a URL (although it's probably easier to just POST them as utf-8, assuming you're using a form to submit them).
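A rough sketch of what URL-encoding could look like, assuming the word is encoded as UTF-8 before being percent-encoded. `add_url` is a made-up helper name, and the base URL is simply the one from the links above.

```python
try:
    from urllib import quote          # Python 2
except ImportError:
    from urllib.parse import quote    # Python 3


def add_url(word):
    """Hypothetical helper: build an /add/ URL for a unicode word."""
    # Encode to UTF-8 first, then percent-encode the resulting bytes.
    return 'http://oladic.appspot.com/add/' + quote(word.encode('utf-8'))


# u'T\xccNH Y\xcaU' is the Vietnamese phrase from the links above, written with escapes.
assert add_url(u'T\xccNH Y\xcaU') == 'http://oladic.appspot.com/add/T%C3%8CNH%20Y%C3%8AU'
```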
Wooble
2010-01-27 15:26:47