Hi,

Why am I getting this error, and how do I resolve it?

UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 24: unexpected code byte

Thank you

+1  A: 

Somewhere, perhaps subtly, you are asking Python to turn a stream of bytes into a "string" of characters.

Don't think of a string as "bytes". A string is a list of numbers, each number having an agreed meaning in Unicode (#65 = Latin Capital A; #19968 = Chinese Character "One"/"First").
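To make that concrete, here is a small sketch, assuming Python 3 (where str is already a Unicode string). The variable names are only for illustration.

```python
# Assumes Python 3: a str is a sequence of Unicode code points, not bytes.
text = "A\u4e00"  # Latin Capital A, then the Chinese character "One"/"First"

# ord() gives the number Unicode assigns to each character.
code_points = [ord(ch) for ch in text]
print(code_points)  # [65, 19968]

# chr() goes the other way, from number back to character.
rebuilt = "".join(chr(n) for n in code_points)
print(rebuilt == text)  # True
```

No bytes are involved until you encode the string.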

There are many methods of encoding a list of Unicode entities into a stream of bytes. Python is assuming your stream of bytes is the result of a particular such method, called "UTF-8".

However, your stream of bytes has data that does not correspond to that method. Thus the error is raised.
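A minimal reproduction of that mismatch, assuming Python 3 and assuming the offending data was written as Windows-1252 (only a guess based on the 0x92 byte in your message):

```python
# One string, two encodings, two different byte streams.
text = "don\u2019t"  # contains U+2019 RIGHT SINGLE QUOTATION MARK

as_utf8 = text.encode("utf-8")      # b'don\xe2\x80\x99t'
as_cp1252 = text.encode("cp1252")   # b'don\x92t'

# Decoding the cp1252 bytes as UTF-8 fails: 0x92 on its own is not valid UTF-8.
try:
    as_cp1252.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The position reported in the exception is the offset of the first byte that does not fit the assumed encoding.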

You need to figure out the encoding of the stream of bytes, and tell Python that encoding.

It matters whether you're using Python 2 or 3, and we'd need to see the code leading up to this exception to know where your bytes came from and what the appropriate way to deal with them is.

If it's from reading a file, you can explicitly deal with the bytes read. But you must be sure of the file encoding.
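A sketch of both ways to handle a file, assuming Python 3. The file name, its contents and the cp1252 encoding are all made up for the demonstration; substitute whatever your file really uses.

```python
import os
import tempfile

# Hypothetical sample file, written as cp1252 just for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write(b"It\x92s a test\n")

# Option 1: let open() do the decoding, naming the encoding explicitly.
with open(path, encoding="cp1252") as f:
    text = f.read()

# Option 2: read the raw bytes and decode them yourself.
with open(path, "rb") as f:
    raw = f.read()
decoded = raw.decode("cp1252")

print(text == decoded)  # True
```

In Python 2 the built-in open() has no encoding argument; io.open() or codecs.open() accept one.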

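If it's from a string that is part of your source code, then Python is assuming the "wrong thing" about your source files. Check that the encoding declaration at the top of the file matches the encoding your editor actually saved it in; if the problem only shows up when printing to or reading from a terminal, perhaps $LC_ALL or $LANG needs to be set. This is a good time to firmly understand the concept of encoding, how text editors choose an encoding to write, and what is standard for your language and operating system.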

Joe Koberg
A: 

In addition to what Joe said, chardet is a useful tool to detect encoding of the source data.
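A rough sketch of how that might look. chardet is a third-party package (pip install chardet), so the import is guarded; guess_encoding and the sample bytes are names made up for this example.

```python
def guess_encoding(data):
    """Return chardet's best guess for the encoding of `data`, or None.

    Hypothetical helper: returns None when chardet is not installed
    or when it cannot make a guess.
    """
    try:
        import chardet  # third-party package
    except ImportError:
        return None
    # detect() returns a dict that includes "encoding" and "confidence".
    return chardet.detect(data)["encoding"]


# Made-up sample: accented Latin text encoded as cp1252.
sample = "Se\xf1or, it\u2019s a na\xefve caf\xe9 text".encode("cp1252")
guess = guess_encoding(sample)
print(guess)
```

Treat the result as a hint only. The detection is statistical and can be wrong, especially on short inputs, so check it against what you know about where the data came from.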

spyder
A: 

Somewhere you have a plain string encoded as "Windows-1252" (or "cp1252") containing a "RIGHT SINGLE QUOTATION MARK" (’) instead of an APOSTROPHE ('). This could come from a file you read, or even from a Python source file of yours; you could be running Python 2.x and have a # -*- coding: utf8 -*- line somewhere near the script's beginning, or you could be running Python 3.x.

You don't give enough data; however, somewhere you have a cp1252-encoded string, which you try (explicitly or implicitly) to decode to unicode as utf-8. This won't work.
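If the cp1252 guess is right, the fix would look something like this sketch (Python 3; the bytes are a made-up stand-in for your data):

```python
# Hypothetical bytes containing the 0x92 from the error message.
raw = b"It\x92s"

# In cp1252, byte 0x92 is U+2019 RIGHT SINGLE QUOTATION MARK.
text = raw.decode("cp1252")
print(text == "It\u2019s")  # True

# Optionally normalise the curly quote to a plain apostrophe.
plain = text.replace("\u2019", "'")

# If the encoding really is unknown, errors="replace" avoids the exception,
# but the undecodable byte becomes U+FFFD, so information is lost.
lossy = raw.decode("utf-8", errors="replace")
print(lossy == "It\ufffds")  # True
```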

Give us more info, and we'll try again to help you.

Joe Koberg's answer reminded me of an older answer of mine, which some people have found helpful: Python UnicodeDecodeError - Am I misunderstanding encode?

ΤΖΩΤΖΙΟΥ