I want to send Chinese characters to an online translation service and get the resulting English string back.
I'm using simple json and urllib for this.
And yes, I am declaring
# -*- coding: utf-8 -*-
at the top of my code.
Everything works fine if I feed urllib a str object, even when that object holds what would be Unicode text. My function is called translate.
For example:
stringtest1 = '無與倫比的美麗'
print translate(stringtest1)
results in the proper translation,
and doing
type(stringtest1)
confirms this to be a str object.
But if I do
stringtest1 = u'無與倫比的美麗'
and try to use my translation function, I get this error:
File "C:\Python27\lib\urllib.py", line 1275, in urlencode
v = quote_plus(str(v))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 2-8: ordinal not in range(128)
After researching a bit, it seems this is a common problem: http://www.gossamer-threads.com/lists/python/python/684420 http://bugs.python.org/issue1712522
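If I read the traceback correctly, the failing step is the str(v) call inside urlencode. My assumption is that, on Python 2, calling str() on a unicode object amounts to encoding it with the default ASCII codec. The sketch below makes that encoding step explicit instead of going through urllib; the variable names are my own.

```python
# -*- coding: utf-8 -*-
# Same text as stringtest1, written with escapes so the file encoding does not matter.
text = u'\u7121\u8207\u502b\u6bd4\u7684\u7f8e\u9e97'

try:
    # Assumption: this is roughly what str(v) does to a unicode object on Python 2.
    text.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc.reason)  # ordinal not in range(128)
```

If that assumption holds, the error is not specific to urllib: any implicit ASCII encoding of this text fails the same way.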
Now, if I put this in a script:
stringtest1 = '無與倫比的美麗'
stringtest2 = u'無與倫比的美麗'
print 'stringtest1',stringtest1
print 'stringtest2',stringtest2
executing it prints:
stringtest1 無與倫比的美麗
stringtest2 無與倫比的美麗
But typing the variable names in the console shows two different representations:
>>> stringtest1
'\xe7\x84\xa1\xe8\x88\x87\xe5\x80\xab\xe6\xaf\x94\xe7\x9a\x84\xe7\xbe\x8e\xe9\xba\x97'
>>> stringtest2
u'\u7121\u8207\u502b\u6bd4\u7684\u7f8e\u9e97'
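As far as I can tell, the two console outputs are related like this: the str object holds the UTF-8 bytes of the same seven characters that the unicode object stores as code points. A small check, using the exact values from the console output above (the variable names are mine):

```python
# -*- coding: utf-8 -*-
# Values copied from the console output; names are illustrative only.
utf8_bytes = b'\xe7\x84\xa1\xe8\x88\x87\xe5\x80\xab\xe6\xaf\x94\xe7\x9a\x84\xe7\xbe\x8e\xe9\xba\x97'
unicode_text = u'\u7121\u8207\u502b\u6bd4\u7684\u7f8e\u9e97'

# Encoding the unicode object as UTF-8 yields the byte string, and vice versa.
same_after_encode = unicode_text.encode('utf-8') == utf8_bytes
same_after_decode = utf8_bytes.decode('utf-8') == unicode_text

char_count = len(unicode_text)  # 7 characters
byte_count = len(utf8_bytes)    # 21 bytes, 3 per character in UTF-8
```

So the two objects carry the same text, but only once an encoding (here UTF-8) is named to go from one to the other.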
My problem is that I don't control how the text to be translated reaches my function. It seems I will receive it as a unicode object, which urllib does not accept.
So, how do I convert a unicode object into the equivalent str object?
I've read this: http://stackoverflow.com/questions/1207457/convert-unicode-to-string-in-python-containing-extra-symbols
but it is not what I'm after. urllib accepts the str object but not the unicode object, even though both contain the same information.
At least, they are the same in the eyes of the web application I'm sending the unchanged information to; I'm not sure whether they are still equivalent things in Python.
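To make the question concrete, here is a minimal sketch of the kind of conversion I have in mind, assuming the service expects UTF-8. The helper name to_utf8_bytes and the parameter name 'q' are made up for illustration and are not part of my translate function or of the service's API. The import fallback is only there so the snippet also runs on Python 3.

```python
# -*- coding: utf-8 -*-
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def to_utf8_bytes(text):
    """Return text as UTF-8 bytes; byte strings pass through unchanged."""
    if isinstance(text, bytes):  # on Python 2, bytes is an alias for str
        return text
    return text.encode('utf-8')  # assumption: the service expects UTF-8


stringtest2 = u'\u7121\u8207\u502b\u6bd4\u7684\u7f8e\u9e97'

# 'q' is a hypothetical parameter name, not the real service's.
query = urlencode({'q': to_utf8_bytes(stringtest2)})
print(query)  # q=%E7%84%A1%E8%88%87%E5%80%AB%E6%AF%94%E7%9A%84%E7%BE%8E%E9%BA%97
```

With something like this, the function could accept either kind of object, but I don't know whether encoding by hand before calling urlencode is the right approach.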