ansaurus

Question

Python IRC bot and encoding issue

Answer 1

+3 A:

chardet should help - it's the canonical Python library for detecting unknown encodings.
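As a rough sketch of what that could look like: `chardet.detect()` takes raw bytes and returns a dict with `encoding` and `confidence` keys. The helper name `guess_encoding` and the `default` parameter are made up for illustration, and the import is guarded because chardet is a third-party package that may not be installed (Answer 2 in this thread reports trouble with it on Python 3 at the time).

```python
try:
    import chardet  # third-party package; may not be installed
except ImportError:
    chardet = None


def guess_encoding(raw, default="utf-8"):
    """Return chardet's best guess for raw bytes, or default (hypothetical helper)."""
    if chardet is None:
        return default
    result = chardet.detect(raw)  # dict with "encoding" and "confidence"
    # "encoding" can be None when chardet has no guess at all.
    return result["encoding"] or default


print(guess_encoding(b"PRIVMSG #channel :hello"))
```

The guess is statistical, so short IRC lines may well be misdetected; treating the result as a hint rather than a fact is probably the safer approach.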

RichieHindle 2009-06-02 10:45:28

Trying that now. I'll see where it takes me.

Adi 2009-06-02 10:57:35

Answer 2

A:

OK, after some research it turns out chardet has trouble with Python 3. The solution is simpler than I thought: I chose to fall back on CP1252 if UTF-8 doesn't cut it:

data = irc.recv(4096)  # raw bytes from the IRC socket
try:
    data = str(data, "UTF-8")  # strict UTF-8 first
except UnicodeDecodeError:
    data = str(data, "CP1252")  # fall back to Windows-1252

This seems to be working, though it doesn't actually detect the encoding, so if somebody comes in with an encoding that is neither UTF-8 nor CP1252, I will have a problem again.
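The same fallback idea can be wrapped in a small function. This is only a sketch: the name `decode_irc_line` and the `encodings` parameter are hypothetical, and the final Latin-1 step is an addition not in the snippet above. It is there because Python's `cp1252` codec leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so the CP1252 fallback can itself raise, whereas `latin-1` maps every byte.

```python
def decode_irc_line(raw, encodings=("utf-8", "cp1252")):
    """Decode bytes by trying each encoding in order (hypothetical helper)."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 assigns a character to every byte, so it never raises.
    return raw.decode("latin-1")


print(decode_irc_line("h\u00e9llo".encode("utf-8")))  # valid UTF-8, decoded as such
print(decode_irc_line(b"caf\xe9"))                    # invalid UTF-8, falls back to CP1252
```

In the receive loop this would be used as `data = decode_irc_line(irc.recv(4096))`. It still only guesses; it does not detect the real encoding.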

This is really just a temporary solution.

Adi 2009-06-02 11:59:04

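cp1252 will appear to work for almost any byte sequence, because it assigns a character to nearly every byte value (Python's codec leaves only 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined). A successful decode therefore doesn't tell you the guess was right.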
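A quick way to see this, as a minimal sketch: the bytes below are KOI8-R (picked arbitrarily as an example of a third encoding). They fail the UTF-8 attempt, then decode as CP1252 without any error, but the resulting text is wrong.

```python
original = "привет"
raw = original.encode("koi8-r")  # bytes as a KOI8-R client would send them

try:
    text = raw.decode("utf-8")
except UnicodeDecodeError:
    # Not valid UTF-8, so the fallback runs and succeeds silently.
    text = raw.decode("cp1252")

print(text == original)  # False: decoded without error, but the text is wrong
```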

RichieHindle 2009-06-02 13:37:01
