ansaurus

Question

Convert unicode codepoint to UTF8 hex in python

Answer 1

+2 A:

Use the built-in function unichr() to convert the number to character, then encode that:

>>> unichr(int('fd9b', 16)).encode('utf-8')
'\xef\xb6\x9b'

This is the UTF-8 encoded byte string itself. If you want it as ASCII hex digits, you'd need to walk through the string and convert each character c to hex, using hex(ord(c)) or similar.
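A minimal sketch of that last step, under some assumptions: the helper name codepoint_to_utf8_hex is made up for illustration, and it uses binascii.hexlify instead of a per-character hex(ord(c)) loop, which avoids having to strip the '0x' prefixes. The chr fallback is there so the same code should also run on Python 3, where unichr no longer exists.

```python
import binascii

try:
    unichr            # Python 2 has unichr
except NameError:
    unichr = chr      # Python 3: chr covers all code points

def codepoint_to_utf8_hex(hex_codepoint):
    """Hypothetical helper: 'fd9b' -> hex digits of the UTF-8 encoding."""
    # Parse the hex code point and turn it into a one-character string.
    ch = unichr(int(hex_codepoint, 16))
    # Encode to UTF-8 bytes, then render those bytes as ASCII hex digits.
    return binascii.hexlify(ch.encode('utf-8')).decode('ascii')

print(codepoint_to_utf8_hex('fd9b'))
```

For 'fd9b' this should print efb69b. On a narrow Python 2 build, unichr raises ValueError for code points above 0xFFFF, so this sketch is only safe there for the Basic Multilingual Plane.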

unwind 2009-05-15 10:18:55

The output is not as specified by the question. Anyway, if the OP is happy…

ΤΖΩΤΖΙΟΥ 2009-05-15 19:55:47

Answer 2

+1 A:

Python 2.6.2 (r262:71600, Apr 16 2009, 09:17:39) 
[GCC 4.0.1 (Apple Computer, Inc. build 5250)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> u'\uFD9B'.encode('utf-8')
'\xef\xb6\x9b'
>>> s = 'FD9B'
>>> i = int(s, 16)
>>> i
64923
>>> unichr(i)
u'\ufd9b'
>>> _.encode('utf-8')
'\xef\xb6\x9b'

Virgil Dupras 2009-05-15 10:20:48

Answer 3

A:

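# In Python 2, '\uFD9B' in a plain str is six literal characters (no escape processing).
data_from_file = '\uFD9B'
# Decode the escape sequence to a unicode object, then encode it as UTF-8.
unicode(data_from_file, "unicode_escape").encode("utf8")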

pixelbeat 2009-05-15 15:05:03

Answer 4

A:

If the input string length is a multiple of 4 (i.e. your unicode code points are UCS-2 encoded), then try this:

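import struct

def unihex2utf8hex(arg):
    # Python 2 only: relies on unichr and the 'hex' string codec.
    count = len(arg) // 4  # number of 16-bit code units in the input
    uniarr = struct.unpack('!%dH' % count, arg.decode('hex'))
    return u''.join(map(unichr, uniarr)).encode('utf-8').encode('hex')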

>>> unihex2utf8hex('fd9b')
'efb69b'
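The function above depends on unichr and the 'hex' string codec, both of which are gone in Python 3. A rough Python 3 equivalent might look like the sketch below; the name unihex2utf8hex_py3 is hypothetical, and binascii.unhexlify / binascii.hexlify are swapped in for the 'hex' codec. Like the original, it assumes the input is a run of 4-digit UCS-2 code units with no surrogates.

```python
import binascii
import struct

def unihex2utf8hex_py3(arg):
    """Hypothetical Python 3 variant: UCS-2 hex digits -> UTF-8 hex digits."""
    count = len(arg) // 4  # number of 16-bit code units in the input
    # Unpack the raw bytes as big-endian unsigned 16-bit integers.
    units = struct.unpack('!%dH' % count, binascii.unhexlify(arg))
    # Map each code unit to a character and join them into one string.
    text = ''.join(map(chr, units))
    # Encode as UTF-8 and render the bytes as ASCII hex digits.
    return binascii.hexlify(text.encode('utf-8')).decode('ascii')

print(unihex2utf8hex_py3('fd9b'))
```

This should print efb69b for 'fd9b'. Input containing surrogate code units (D800 to DFFF) would fail at the encode step, since Python 3 refuses to encode lone surrogates as UTF-8.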

ΤΖΩΤΖΙΟΥ 2009-05-15 19:54:55
