Hi. I have a hex string and I want to convert it to UTF-8 so I can insert it into MySQL. (My database is utf8.)
hex_string = 'kitap ara\xfet\xfdrmas\xfd'
..
..
..
result='kitap araştırması'
How can I do that? Best regards.
Assuming Python 2.6,
>>> print('kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9'))
kitap araştırması
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('iso-8859-9').encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
Try
hex_string.decode("cp1254").encode("utf-8")
(cp1254 and iso-8859-9 are the Turkish code pages. The former is the one usually found on Windows platforms, but in Python both work equally well for this string.)
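To check the claim that both codecs give the same result here, a minimal sketch follows. The variable names are only illustrative, and the b'' prefix is an assumption so that the snippet also runs on Python 3 (on Python 2.6+ it is just a plain str). The two code pages agree on the Turkish letters but differ in the 0x80-0x9F range, so the choice only matters if the data contains bytes from that range.

```python
# -*- coding: utf-8 -*-
# Bytes as they appear in the question (single-byte Turkish encoding).
raw = b'kitap ara\xfet\xfdrmas\xfd'

# Decode with both candidate codecs.
text_cp1254 = raw.decode('cp1254')
text_latin5 = raw.decode('iso-8859-9')

# For this input the two codecs produce the same unicode string.
same_for_this_input = (text_cp1254 == text_latin5)

# They differ elsewhere: byte 0x80 is the euro sign in cp1254,
# but a C1 control character in iso-8859-9.
euro_cp1254 = b'\x80'.decode('cp1254')
euro_latin5 = b'\x80'.decode('iso-8859-9')

# Re-encode as UTF-8 for the database.
utf8_bytes = text_cp1254.encode('utf-8')
```

If the bytes come from a Windows application, cp1254 is probably the safer guess, since it also maps characters such as the euro sign.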
First you need to decode it from the encoded bytes you have. That appears to be ISO-8859-9 (latin-5), or, if you are using Windows, probably code page 1254, which is based on latin-5.
>>> 'kitap ara\xfet\xfdrmas\xfd'.decode('cp1254')
u'kitap ara\u015ft\u0131rmas\u0131' # u'kitap araştırması'
If you are using Windows, then depending on where you are getting those bytes, it might be more appropriate to decode them as mbcs, which translates to ‘whichever code page the local system is using’. If the string is just sitting in a .py file, you would be better off just writing u'kitap araştırması' in the source and setting a -*- coding declaration to direct Python to decode it. See PEP 263.
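As a rough illustration of the PEP 263 approach, the file might look like the sketch below. It assumes the file is actually saved as UTF-8 by your editor, and the names text and utf8_bytes are made up for the example.

```python
# -*- coding: utf-8 -*-
# The declaration above must be on the first or second line of the file,
# and it must match the encoding the file is really saved in.

# A unicode literal written directly in the source; no manual decoding needed.
text = u'kitap araştırması'

# Encode explicitly only at the point where bytes are required.
utf8_bytes = text.encode('utf-8')
```

With this in place the escaped byte string never appears in the source at all, which avoids having to guess the code page later.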
As to how to encode unicode strings to UTF-8 for the database, well, if you want to you can do it manually:
>>> u'kitap ara\u015ft\u0131rmas\u0131'.encode('utf-8')
'kitap ara\xc5\x9ft\xc4\xb1rmas\xc4\xb1'
but a good data access layer is likely to do that automatically for you, if you've got the character set and COLLATION of the tables the data is going into right.