How can I strip the 8th bit of each character in a KOI8-R encoded string to get a Latin transliteration of the Russian letters? In particular, how can this be done in Python?

+1  A: 

I'm not exactly sure what you want, but if you want to zero the 8th bit, it can be done like this (character has to be an integer byte value, e.g. the result of ord(), not a one-character string):

character = character & ~(1 << 7)
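As a minimal Python 3 sketch of the same mask applied to a single byte (the helper name strip_high_bit is made up for illustration and is not from the answer above):

```python
def strip_high_bit(byte_value):
    """Clear bit 7 of an integer byte value in the range 0-255."""
    return byte_value & ~(1 << 7)

# Indexing a bytes object in Python 3 yields an int, so no ord() is needed.
koi_byte = 'К'.encode('koi8-r')[0]      # Cyrillic capital Ka, 0xEB in KOI8-R
latin_byte = strip_high_bit(koi_byte)   # 0x6B

print(hex(koi_byte), hex(latin_byte), chr(latin_byte))  # 0xeb 0x6b k
```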
Constantin
mhawke
+1  A: 

Here is one way:

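import array

# Python 2 code. ANDing with this mask clears bit 7 of a byte.
mask = ~(1 << 7)

def convert(koistring):
    # Unsigned-byte array, so each element can be masked in place.
    data = array.array('B', koistring)
    for i in range(len(data)):
        data[i] &= mask

    return data.tostring()

test = u'Русский Текст'.encode('koi8-r')
print convert(test) # rUSSKIJ tEKST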

I don't know if Python provides a cleaner way to do this kind of operation :)
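One possibly cleaner route, sketched here for Python 3 rather than the Python 2 used above, is bytes.translate with a precomputed 256-entry table. The name STRIP_TABLE is an assumption introduced for this example; convert is reused from the answer.

```python
# 256-entry lookup table: every byte value maps to itself with bit 7 cleared.
STRIP_TABLE = bytes(b & 0x7F for b in range(256))

def convert(koi_bytes):
    """Strip the high bit of every byte and return the result as ASCII text."""
    return koi_bytes.translate(STRIP_TABLE).decode('ascii')

test = 'Русский Текст'.encode('koi8-r')
print(convert(test))  # rUSSKIJ tEKST
```

The table is built once, so the per-string work is a single call with no explicit Python loop.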

NicDumZ
+2  A: 

Assuming s is a KOI8-R encoded string you could try this:

>>> s = u'Код Обмена Информацией, 8 бит'.encode('koi8-r')
>>> s
'\xeb\xcf\xc4 \xef\xc2\xcd\xc5\xce\xc1 \xe9\xce\xc6\xcf\xd2\xcd\xc1\xc3\xc9\xc5\xca, 8 \xc2\xc9\xd4'

>>> print ''.join([chr(ord(c) & 0x7F) for c in s])
kOD oBMENA iNFORMACIEJ, 8 BIT

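The 8th bit is stripped by ord(c) & 0x7F, which keeps only the low seven bits of each byte. Note that the case comes out inverted: KOI8-R places lowercase Cyrillic letters where stripping yields uppercase Latin ones, and vice versa.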
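If the swapped case in that output is unwanted, a Python 3 sketch along the following lines could flip it back, but only for bytes that actually had the high bit set, so that Latin text already in the string is left alone. The function name koi8r_translit is hypothetical, and the case fix is an addition not present in the answer.

```python
def koi8r_translit(text):
    """Transliterate Russian text by stripping bit 7 of its KOI8-R bytes."""
    out = []
    for b in text.encode('koi8-r'):
        ch = chr(b & 0x7F)
        # Only bytes >= 0x80 were Cyrillic; undo the case inversion for those.
        out.append(ch.swapcase() if b & 0x80 else ch)
    return ''.join(out)

print(koi8r_translit('Код Обмена Информацией, 8 бит'))
# Kod Obmena Informaciej, 8 bit
```

This is only meaningful for the 32 basic Cyrillic letters in each case; other KOI8-R characters such as ё or the box-drawing symbols do not strip down to sensible ASCII.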

mhawke
NicDumZ
Thank you all for your help!