ansaurus

Question

Answer 1

+8 A:

UTF-16 big endian (UTF-16BE): each ASCII character is stored as a null byte followed by the ASCII byte.

Delan Azabani 2010-09-21 10:03:54

YES! Awesome, thanks a lot. Now I do *s.decode('utf-16-be')* to decode it and it's all fine.

ionut bizau 2010-09-21 10:12:06

No problem; good luck!

Delan Azabani 2010-09-21 10:13:07

Answer 2

+1 A:

You have UTF-16BE without a BOM. As documented, chardet doesn't grok UTF-nnxE without a BOM.

>>> s = '\x00Q\x00u\x00i\x00c\x00k'  # note: the spurious `u` prefix is dropped; this is a byte string
>>> s.decode('utf_16be')
u'Quick'
>>>
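The session above is Python 2. For readers on Python 3, a rough equivalent might look like the sketch below. It assumes the data arrives as a `bytes` object (the name `raw` is only illustrative) and shows that picking the wrong byte order does not raise an error, it just produces the wrong characters.

```python
# Python 3 sketch: the same five characters as BOM-less UTF-16BE bytes.
# `raw` is an illustrative name, not something from the original question.
raw = b"\x00Q\x00u\x00i\x00c\x00k"

# Name the byte order explicitly, since there is no BOM to announce it.
text = raw.decode("utf-16-be")
print(text)  # Quick

# The wrong byte order still decodes, but to five unrelated characters.
wrong = raw.decode("utf-16-le")
print(wrong == text)  # False
```

Because the wrong guess fails silently, it is worth checking the decoded text by eye (or against expected content) rather than relying on an exception.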

chardet is also not smart enough to raise a DontBeSilly exception if you feed it unicode :-)
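Since chardet will not identify BOM-less UTF-16, one possible workaround for mostly-ASCII text is to look at where the null bytes fall. The helper below is a hypothetical sketch in Python 3, not part of chardet: the name `guess_utf16_byte_order` and the 0.8 threshold are assumptions made for illustration, and it deliberately rejects text input with a `TypeError`, in the spirit of the missing DontBeSilly exception.

```python
def guess_utf16_byte_order(data):
    """Hypothetical helper: guess the byte order of BOM-less UTF-16 data.

    Only meaningful for mostly-ASCII text, where one byte of every
    2-byte code unit is a null. Returns a codec name or None.
    """
    if not isinstance(data, (bytes, bytearray)):
        # Already-decoded text has no encoding to detect.
        raise TypeError("expected bytes, got %s" % type(data).__name__)
    if len(data) < 2 or len(data) % 2:
        return None  # UTF-16 data is a whole number of 2-byte units

    units = len(data) // 2
    nulls_even = data[0::2].count(0)  # nulls in the first byte of each unit
    nulls_odd = data[1::2].count(0)   # nulls in the second byte of each unit

    # Assumed threshold: at least 80% of units carry a null high byte.
    if nulls_even >= units * 0.8 and nulls_odd == 0:
        return "utf-16-be"
    if nulls_odd >= units * 0.8 and nulls_even == 0:
        return "utf-16-le"
    return None


sample = b"\x00Q\x00u\x00i\x00c\x00k"
codec = guess_utf16_byte_order(sample)
if codec is not None:
    print(sample.decode(codec))
```

This is a heuristic only: text that is mostly non-ASCII will have few null bytes, and the function returns `None` in that case rather than guessing.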

John Machin 2010-09-21 10:16:55

What encoding looks exactly like ASCII but has NULL bytes before each byte?!