>>> teststring = 'aõ'
>>> type(teststring)
<type 'str'>
>>> teststring
'a\xf5'
>>> print teststring
aõ
>>> teststring.decode("ascii", "ignore")
u'a'
>>> teststring.decode("ascii", "ignore").encode("ascii")
'a'

which is what I really wanted it to store internally, since I am removing non-ASCII characters. Why did the decode("ascii") give out a unicode string?

>>> teststringUni = u'aõ'
>>> type(teststringUni)
<type 'unicode'>
>>> print teststringUni
aõ
>>> teststringUni.decode("ascii" , "ignore")

Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    teststringUni.decode("ascii" , "ignore")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.decode("utf-8" , "ignore")

Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    teststringUni.decode("utf-8" , "ignore")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.encode("ascii" , "ignore")
'a'

Which is again what I wanted. I don't understand this behavior. Can someone explain to me what is happening here?

Edit: I thought this would help me understand things so I could solve the real problem in my program, which I describe here: http://stackoverflow.com/questions/3669436/converting-unicode-objects-with-non-ascii-symbols-in-them-into-strings-objects-in

+3  A: 

Why did the decode("ascii") give out a unicode string?

Because that's what decode is for: it decodes byte strings like your ASCII one into unicode.

In your second example, you're trying to "decode" a string which is already unicode, which has no effect. To print it to your terminal, though, Python must encode it as your default encoding, which is ASCII - but because you haven't done that step explicitly and therefore haven't specified the 'ignore' parameter, it raises the error that it can't encode the non-ASCII characters.

The trick to all of this is remembering that decode takes an encoded bytestring and converts it to Unicode, and encode does the reverse. It might be easier if you understand that Unicode is not an encoding.
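A minimal sketch of both directions, assuming the same Python 2.7 shell as in the question (the names `raw`, `as_unicode` and `ascii_only` are just illustrative):

>>> raw = 'a\xf5'                        # a byte string (type str), latin-1 encoded
>>> as_unicode = raw.decode("latin-1")   # decode: bytes -> unicode
>>> type(as_unicode)
<type 'unicode'>
>>> ascii_only = as_unicode.encode("ascii", "ignore")  # encode: unicode -> bytes, dropping non-ASCII
>>> ascii_only
'a'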

Daniel Roseman
Well, you are right, except for some details. Since he can print `'a\xf5'` correctly, his terminal's encoding is not ASCII but something else. The console encoding is a really common problem, but it is not the issue this time. Also, `teststringUni.decode("ascii", "ignore")` does not fail when you try to print the result: calling `.decode` on something that is already unicode makes Python first encode it to bytes with the default ASCII codec so that it can then decode them, and it is that implicit encode step that cannot work here.
THC4k
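In Python 2, calling `.decode` on a unicode object first encodes it with the default codec (normally ASCII) and only then decodes the result, which is why the tracebacks above show a UnicodeEncodeError. A sketch of the equivalent explicit steps, assuming the same Python 2.7 shell:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> teststringUni = u'a\xf5'
>>> # teststringUni.decode("utf-8") behaves roughly like:
>>> teststringUni.encode(sys.getdefaultencoding()).decode("utf-8")
Traceback (most recent call last):
  ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)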
Yes, I think that is the problem: what is my terminal encoding? Just because an object's type is str, it does not mean the encoding is ASCII; I understood that. My problem now is to figure out how I can translate something that has type unicode into the string type of the terminal, while retaining all information.
Fullmooninu
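One way to do that (a sketch, not from the thread: it encodes with whatever the terminal reports via `sys.stdout.encoding`, falls back to UTF-8 when that is not available, and the variable names are just illustrative):

>>> import sys
>>> teststringUni = u'a\xf5'
>>> terminal_encoding = getattr(sys.stdout, 'encoding', None) or 'utf-8'  # None/absent when redirected
>>> as_bytes = teststringUni.encode(terminal_encoding)   # unicode -> str in the terminal's encoding
>>> type(as_bytes)
<type 'str'>
>>> print as_bytes
aõ

This keeps every character the terminal encoding can represent; to avoid an error on characters it cannot, pass "replace" as the second argument to encode.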
+2  A: 

It's simple: .encode converts Unicode objects into strings, and .decode converts strings into Unicode.
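A minimal illustration of the two directions, assuming the same Python 2.7 shell as above:

>>> u'a\xf5'.encode('utf-8')     # encode: unicode -> byte string
'a\xc3\xb5'
>>> 'a\xc3\xb5'.decode('utf-8')  # decode: byte string -> unicode
u'a\xf5'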

Ned Batchelder
This perspective actually solved it =), thank you.
Fullmooninu