Hello,

I'm using OpenGL and I need to pass an array of bytes to a function:

glCallLists(len('text'), GL_UNSIGNED_BYTE, 'text');

This works fine, but I need to pass Unicode text. I think it should work like this:

text = u'unicode text'
glCallLists(len(text), GL_UNSIGNED_SHORT, convert_to_array_of_words(text));

Here I use GL_UNSIGNED_SHORT, which says that I'm passing an array in which each element takes 2 bytes, and I somehow convert the Unicode text to an array of 16-bit words.

So, how can I convert a Unicode string to a "raw" array of the characters' numeric codes?

+2  A: 

The UTF encoding that takes up 2 bytes per character is UTF-16:

print repr(u'あいうえお'.encode('utf-16be'))
print repr(u'あいうえお'.encode('utf-16le'))
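
As a minimal sketch of the conversion the question asks for, the encoded bytes can be unpacked into 16-bit numbers to see what would be handed to OpenGL. It assumes Python 3 (the snippets above are Python 2), and `utf16_code_units` is a hypothetical helper name, not part of any library. It also assumes a little-endian machine, since GL_UNSIGNED_SHORT data is read in the client's native byte order.

```python
import struct

def utf16_code_units(text):
    """Return (raw_bytes, code_units) for text encoded as UTF-16, little-endian.

    Hypothetical helper for illustration only.
    """
    data = text.encode('utf-16-le')      # no byte order mark is prepended
    count = len(data) // 2               # number of 16-bit code units
    # '<' forces little-endian with standard sizes, so 'H' is exactly 2 bytes.
    units = struct.unpack('<%dH' % count, data)
    return data, units

data, units = utf16_code_units(u'あいうえお')
print(len(data))                  # 10
print([hex(u) for u in units])    # ['0x3042', '0x3044', '0x3046', '0x3048', '0x304a']
```

The element count to pass alongside `data` would then be `len(units)` rather than `len(text)`, because the two can differ for characters outside the Basic Multilingual Plane.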
Ignacio Vazquez-Abrams
Cool! utf-16le works for me, thanks!
race1
Note: not all code points can be represented in UTF-16.
Mike Graham
Yes they can. However, it was inaccurate for me to say that it uses 2 bytes per character. Some will take up 4 bytes, being composed of a "surrogate pair".
Ignacio Vazquez-Abrams
@Mike: I think you mean to say not all code points can be represented in UCS-2.
John Knoeller
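
The surrogate-pair point from the comments can be checked directly. This is a small sketch assuming Python 3, where `ord` returns the full code point. The helper names `utf16_unit_count` and `fits_in_ucs2` are made up for illustration.

```python
def utf16_unit_count(text):
    """Number of 16-bit code units needed to encode text as UTF-16."""
    return len(text.encode('utf-16-le')) // 2

def fits_in_ucs2(text):
    """True if every code point fits in a single 16-bit unit (no surrogate pairs)."""
    return all(ord(ch) <= 0xFFFF for ch in text)

bmp = u'\u3042'           # HIRAGANA LETTER A, inside the Basic Multilingual Plane
astral = u'\U0001D11E'    # MUSICAL SYMBOL G CLEF, outside it

print(utf16_unit_count(bmp))               # 1
print(utf16_unit_count(astral))            # 2
print(astral.encode('utf-16-be').hex())    # d834dd1e (a surrogate pair)
print(fits_in_ucs2(bmp), fits_in_ucs2(astral))   # True False
```

A check like `fits_in_ucs2` is one way to decide up front whether a string can be treated as one 16-bit element per character.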