Hi, I have some data like this:

data1 = ['Agos', '30490349304']
data2 = ['Desir\xc3\xa9','9839483948']

I'm using an API that expects the data encoded in base64, so what I do is:

import base64

data = data1
string = base64.b64encode("Hi, %s! Your code is %s" % (data[0], data[1]))
myXMLRPCCall(string)

This works fine with data1. With data2 the base64 encoding itself succeeds, but the XML-RPC call then returns an error, because (according to the API docs) the API accepts only ISO-8859-1 (Latin-1) characters.
My question is: how can I convert my string to Latin-1 so that the API accepts it?

A:

First make sure you're not confused about encodings, etc. Read, for example, this.

Then notice that the main problem isn't the base64 encoding, but the fact that you're trying to put a byte string (a normal string in Python 2.x) inside a Unicode string. I believe you can fix this by removing the "u" from the last string in your example code.

Amnon
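The byte-string/Unicode mix-up this answer points at has a Python 3 counterpart. A minimal sketch (Python 3, not the asker's Python 2 code), assuming the name arrives as UTF-8 bytes as in `data2`: formatting raw bytes into a text template goes wrong silently, while decoding them first gives the intended text.

```python
# Python 3 sketch: the same bytes as data2's 'Desir\xc3\xa9' in Python 2
raw_name = b'Desir\xc3\xa9'

# Interpolating bytes into a str template does not raise in Python 3;
# it silently embeds the repr of the bytes object.
wrong = "Hi, %s!" % raw_name

# Decoding first turns the two UTF-8 bytes into the single character 'é'.
name = raw_name.decode('utf-8')
right = "Hi, %s!" % name

print(wrong)  # Hi, b'Desir\xc3\xa9'!
print(right)  # Hi, Desiré!
```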
Thanks for the quick reply! That was a stupid mistake on my part. I changed that, and now the API says I should have used only ISO-8859-1 characters; I updated the question accordingly.
Agos
You're welcome. But now you made all the previous answers irrelevant to the question.
Amnon
Yes, I'm sorry about that; the answers were just too fast! +1 for the useful link.
Agos
A:
import base64
base64.b64encode("Hi, %s! Your code is %s" % (data[0].decode('utf8').encode('latin1'), data[1]))
ʞɔıu
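For readers on Python 3, where `str` is already Unicode and `b64encode` takes and returns `bytes`, the same decode-then-re-encode idea might look like the sketch below. It assumes the incoming values are UTF-8 encoded bytes, as the `\xc3\xa9` escapes in the question suggest; the XML-RPC call itself is left out.

```python
import base64

# Assumption: the raw values are UTF-8 encoded bytes, like data2 in the question.
data2 = [b'Desir\xc3\xa9', b'9839483948']

# Build the message as text: decode the UTF-8 bytes first.
name = data2[0].decode('utf-8')
code = data2[1].decode('ascii')
message = "Hi, %s! Your code is %s" % (name, code)

# Encode the text as Latin-1 (one byte per character), then base64 those bytes.
latin1_bytes = message.encode('latin-1')
payload = base64.b64encode(latin1_bytes)  # bytes containing only ASCII characters

print(payload.decode('ascii'))
```

If the text contains a character outside Latin-1 (for example '€'), `encode('latin-1')` raises `UnicodeEncodeError`; passing `errors='replace'` degrades to '?' instead, at the cost of losing that character.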
This seems to work (also: duh for me). Another sub-question: it seems that accented characters should also be combined (one value instead of the two escaped bytes in the example above). The accepted accented characters (ISO-8859-1, decimal) are 232, 233, 236, 242 and 224. How can I convert accented characters in my string to the corresponding (accepted) values? (Also: should I post this as a new question?)
Agos
I believe that the two escaped values refer to two bytes that comprise a single character in utf8 (DEC 233). Recall that utf8 can use 1-4 bytes to represent a character (in contrast to older encodings like latin1 in which 1 character == 1 byte).
ʞɔıu
You're right, in fact it gets escaped correctly to DEC 233. Why the XMLRPC still refuses it (since the manual says these codes are ok) is beyond me, and most importantly beyond the scope of this SO question.
Agos
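The byte-level point in these comments can be checked directly: 'é' is U+00E9 (decimal 233), which UTF-8 stores as two bytes and Latin-1 as one. The sketch below (Python 3) also checks a Latin-1 payload against the accented values quoted from the API docs above; the helper `unsupported_bytes` is hypothetical, and treating every other non-ASCII byte as rejected is an assumption about the API, not something the thread confirms.

```python
# Accented ISO-8859-1 values (decimal) quoted from the API docs in the comments.
ACCEPTED_ACCENTED = {224, 232, 233, 236, 242}  # à è é ì ò

def unsupported_bytes(latin1_bytes):
    """Hypothetical helper: non-ASCII bytes that are not in the accepted set."""
    return sorted({b for b in latin1_bytes if b > 127 and b not in ACCEPTED_ACCENTED})

text = 'Desir\u00e9'
utf8 = text.encode('utf-8')      # 'é' becomes two bytes: 195, 169
latin1 = text.encode('latin-1')  # 'é' becomes the single byte 233

print(list(utf8[-2:]), latin1[-1])  # [195, 169] 233
print(unsupported_bytes(latin1))    # []
print(unsupported_bytes('Citt\u00e0 \u00fc'.encode('latin-1')))  # [252]
```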
A: 

This seems to work:

...

data = data2
base64.b64encode("Hi, %s! Your code is %s" % (data[0], data[0]))
# => 'SGksIERlc2lyw6khIFlvdXIgY29kZSBpcyBEZXNpcsOp'

# I can't test the XMLRPC parts, so this is just a hint ..
for_the_wire = base64.b64encode("Hi, %s! Your code is %s" % (data[0], data[0]))
latin_1_encoded = for_the_wire.encode('latin-1')

# send latin_1_encoded over the wire ..

Some Python (2.x) Unicode readings:

The MYYN
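One caveat about the hint above: base64 output consists only of ASCII characters, so encoding it as Latin-1 afterwards leaves it byte-for-byte unchanged, and the receiver still decodes whatever bytes went into `b64encode`. A Python 3 sketch comparing the two orders (the variable names are illustrative, and the message mirrors the question's template):

```python
import base64

message = "Hi, Desir\u00e9! Your code is 9839483948"

# Order used in the hint: base64 of the UTF-8 bytes, then re-encode the result.
utf8_b64 = base64.b64encode(message.encode('utf-8'))
reencoded = utf8_b64.decode('ascii').encode('latin-1')
# base64 output is pure ASCII, so the Latin-1 step changes nothing.
assert reencoded == utf8_b64

# Transcoding *before* base64 changes what the receiver decodes.
latin1_b64 = base64.b64encode(message.encode('latin-1'))

print(base64.b64decode(reencoded))   # still contains the UTF-8 pair \xc3\xa9
print(base64.b64decode(latin1_b64))  # contains the single Latin-1 byte \xe9
```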