Hi. I have a byte string with hex escapes and I want to convert it to UTF-8 so I can insert it into MySQL (my database is UTF-8).

hex_string = 'kitap ara\xfet\xfdrmas\xfd'
..
..
..
result='kitap araştırması'

How can I do that? Best regards.

+2  A: 

Assuming Python 2.6,

>>> print('kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9'))
kitap araştırması
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9').encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
KennyTM
Thank you very much, this works. But is there any way to do that without knowing the source encoding?
@user: See http://stackoverflow.com/questions/1715772/best-way-to-decode-unknown-unicoding-encoding-in-python-2-5.
KennyTM
A: 

The "String literals" documentation explains how to use UTF-8 strings in Python source.

Sjoerd
+1  A: 

Try

hex_string.decode("cp1254").encode("utf-8")

(cp1254 and iso-8859-9 are the Turkish code pages, the former being the usual name on Windows platforms. They differ only for bytes in the 0x80-0x9F range, so for this string both work equally well in Python.)
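
A minimal sketch of that comparison, written for Python 3 rather than the Python 2 used elsewhere in this thread (so the input needs a b'' prefix). The variable names are only illustrative. It checks that both codecs decode the sample string identically and shows one byte, 0x80, where they disagree.

```python
# Python 3 sketch: byte strings need the b'' prefix before they can be decoded.
raw = b'kitap ara\xfet\xfdrmas\xfd'

as_cp1254 = raw.decode('cp1254')
as_latin5 = raw.decode('iso-8859-9')
same_result = as_cp1254 == as_latin5  # both codecs agree on this input

# The two code pages only disagree for bytes in the 0x80-0x9F range.
euro_cp1254 = b'\x80'.decode('cp1254')      # the euro sign in cp1254
ctrl_latin5 = b'\x80'.decode('iso-8859-9')  # a C1 control character in ISO-8859-9

print(same_result, repr(euro_cp1254), repr(ctrl_latin5))
```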

Tim Pietzcker
+1  A: 

First you need to decode it from the encoded bytes you have. That appears to be ISO-8859-9 (latin-5), or, if you are using Windows, probably code page 1254, which is based on latin-5.

>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('cp1254')
u'kitap ara\u015ft\u0131rmas\u0131' # u'kitap araştırması'

If you are using Windows, then depending on where you are getting those bytes, it might be more appropriate to decode them as mbcs, which translates to ‘whichever code page the local system is using’. If the string is just sitting in a .py file, you would be better off writing u'kitap araştırması' directly in the source and adding a -*- coding -*- declaration at the top of the file so that Python knows how to decode it. See PEP 263.
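
As a rough illustration of the PEP 263 approach, a source file might look like the sketch below. It assumes the file is actually saved as UTF-8 and that the declaration sits on the first or second line. In Python 3 UTF-8 is already the default source encoding, so the declaration mainly matters for Python 2.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.

# A unicode literal written directly in the source (the u prefix is
# required in Python 2 and accepted in Python 3.3+).
text = u'kitap araştırması'

# Encode explicitly only when raw UTF-8 bytes are needed.
utf8_bytes = text.encode('utf-8')
```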

As to how to encode unicode strings to UTF-8 for the database, well, if you want to you can do it manually:

>>> u'kitap ara\u015ft\u0131rmas\u0131'.encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'

but a good data access layer is likely to do that automatically for you, provided the character set and collation of the tables the data is going into are set correctly.
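
For readers on Python 3, the manual decode-then-encode route could be sketched as below. The helper name to_utf8 and its default of cp1254 are assumptions made for illustration, not part of any library, and the source encoding still has to be known in advance.

```python
def to_utf8(raw, source_encoding='cp1254'):
    """Transcode bytes from source_encoding to UTF-8 (hypothetical helper)."""
    # Step 1: decode the legacy bytes into a text (str) object.
    text = raw.decode(source_encoding)
    # Step 2: encode that text as UTF-8 bytes.
    return text.encode('utf-8')


utf8_result = to_utf8(b'kitap ara\xfet\xfdrmas\xfd')
print(utf8_result)
```

With a driver that handles encoding itself, passing the decoded text rather than the encoded bytes is usually enough.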

bobince