Suppose I type line = line.decode('gb18030')
and get the error
UnicodeDecodeError: 'gb18030' codec can't decode bytes in position 142-143: illegal multibyte sequence
Is there a nice way to automatically get the error bytes? That is, is there a way to get 142 and 143, or line[142:144],
from a built-in command or module? Since I'm fairly confident that there will be only one such error, at most, per line, my first thought was along the lines of:
for i in range(len(line)):
    try:
        line[i].decode('gb18030')
    except UnicodeDecodeError:
        error = i
I don't know the right terminology, but gb18030 is a variable-width encoding (a character can take 1, 2, or 4 bytes), so this method fails as soon as it reaches a multibyte character, such as a 2-byte Chinese character.
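
One possible approach, sketched under assumptions: the UnicodeDecodeError exception itself carries the offsets shown in the message, as the attributes start and end (end is exclusive), along with object, the byte string being decoded. Catching the exception once for the whole line avoids the byte-by-byte loop. The helper name find_bad_bytes and the sample bytes below are hypothetical, chosen only for illustration, and the exact span reported for a given malformed sequence may depend on the codec implementation.

```python
def find_bad_bytes(data, encoding='gb18030'):
    """Return (start, end, bad_bytes) for the first decode error, or None.

    Hypothetical helper: it decodes the whole byte string once and reads
    the offsets from the exception instead of probing one byte at a time.
    """
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start and exc.end are byte offsets into exc.object;
        # exc.end is exclusive, so the slice below is the offending span.
        return exc.start, exc.end, exc.object[exc.start:exc.end]
    return None


# Illustrative input (an assumption, not from the question): two valid
# 2-byte characters, then 0xFF, which is not a valid gb18030 lead byte.
line = b'\xd6\xd0\xce\xc4' + b'\xff' + b'abc'

result = find_bad_bytes(line)
if result is not None:
    start, end, bad = result
    print(start, end, repr(bad))
```

If more than one bad sequence per line turns out to be possible, the same idea could be repeated on the remainder, data[end:], adding end to the offsets it reports.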