How can I strip the 8th bit of a KOI8-R encoded character to get a transliteration of the Russian letter? In particular, how can this be done in Python?
+1
A:
I'm not exactly sure what you want, but if you want to zero the 8th bit, it can be done like this (with character holding the byte's integer value, e.g. the result of ord()):
character = character & ~(1 << 7)
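To make the one-liner concrete, here is a minimal sketch for a single letter. It assumes Python 3, where indexing a bytes object already yields an integer; the helper name strip_high_bit is made up for illustration.

```python
def strip_high_bit(byte_value):
    """Return byte_value with bit 7 (the 8th bit) cleared."""
    return byte_value & ~(1 << 7)

# 'Д' is a single byte in KOI8-R; indexing bytes gives its integer value.
koi8_byte = 'Д'.encode('koi8-r')[0]
ascii_byte = strip_high_bit(koi8_byte)

print(hex(koi8_byte))    # 0xe4
print(hex(ascii_byte))   # 0x64
print(chr(ascii_byte))   # d
```

In Python 2 the same mask would be applied to ord(c) for each character c of the encoded string.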
Constantin
2009-06-15 07:06:22
mhawke
2009-06-15 07:18:42
+1
A:
Here is one way:
import array
mask = ~(1 << 7)  # every bit set except the 8th
def convert(koistring):
    # Python 2: koistring is a KOI8-R encoded byte string
    data = array.array('B', koistring)
    for i in range(len(data)):
        data[i] &= mask  # clear the high bit of each byte
    return data.tostring()
test = u'Русский Текст'.encode('koi8-r')
print convert(test) # rUSSKIJ tEKST
I don't know if Python provides a cleaner way to do this kind of operation :)
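One possibly cleaner option, offered as a sketch rather than a definitive answer, is a 256-entry translation table applied with bytes.translate. This assumes Python 3, where encode() returns a bytes object; the table name HIGH_BIT_STRIPPED is hypothetical.

```python
# Assumption: Python 3. HIGH_BIT_STRIPPED is a hypothetical name for a
# lookup table mapping every byte value to itself with the 8th bit cleared.
HIGH_BIT_STRIPPED = bytes(b & 0x7F for b in range(256))

def convert(koi8_bytes):
    """Clear the 8th bit of every byte and return the result as text."""
    return koi8_bytes.translate(HIGH_BIT_STRIPPED).decode('ascii')

test = 'Русский Текст'.encode('koi8-r')
print(convert(test))  # rUSSKIJ tEKST
```

The table is built once, so the per-string work is a single call with no explicit loop.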
NicDumZ
2009-06-15 06:59:51
+2
A:
Assuming s is a KOI8-R encoded string you could try this:
>>> s = u'Код Обмена Информацией, 8 бит'.encode('koi8-r')
>>> s
'\xeb\xcf\xc4 \xef\xc2\xcd\xc5\xce\xc1 \xe9\xce\xc6\xcf\xd2\xcd\xc1\xc3\xc9\xc5\xca, 8 \xc2\xc9\xd4'
>>> print ''.join([chr(ord(c) & 0x7F) for c in s])
kOD oBMENA iNFORMACIEJ, 8 BIT
The 8th bit is stripped by the expression (ord(c) & 0x7F).
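The letter case comes out inverted because KOI8-R places lowercase Cyrillic at 0xC0-0xDF, which lands on uppercase Latin once the high bit is cleared, and uppercase Cyrillic at 0xE0-0xFF, which lands on lowercase Latin. A rough Python 3 sketch of the same idea follows; the function name strip_eighth_bit is made up here, and the swapcase() step is an optional extra, not part of the answer above.

```python
def strip_eighth_bit(koi8_bytes):
    """Mask each byte with 0x7F and decode the 7-bit result as ASCII."""
    return bytes(b & 0x7F for b in koi8_bytes).decode('ascii')

s = 'Код Обмена Информацией, 8 бит'.encode('koi8-r')
translit = strip_eighth_bit(s)

print(translit)             # kOD oBMENA iNFORMACIEJ, 8 BIT
print(translit.swapcase())  # Kod Obmena Informaciej, 8 bit
```

Note that swapcase() would also flip any Latin letters already present in the input, so it only suits text that is Cyrillic throughout.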
mhawke
2009-06-15 07:05:12
NicDumZ
2009-06-15 07:11:40