ansaurus

Question

How to make Popen() understand UTF-8 properly?

Answer 1

+3 A:

Have you tried decoding each line of output first, and only then combining the resulting Unicode strings? In Python 2.4+ (at least), this can be achieved with

result = [x.decode('utf8') for x in proc.stdout.readlines()]

The important point is that your lines x are sequences of bytes that must be interpreted as representing characters. The decode() method performs this interpretation (here, the bytes are assumed to be UTF-8 encoded): x.decode('utf8') is of type unicode, which you can think of as a "string of characters", as opposed to a "string of numbers between 0 and 255" (bytes).
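For illustration, here is a minimal self-contained sketch of the same idea. It is written for Python 3, where the byte type is called bytes and the character type is called str (the equivalents of str and unicode in Python 2). The child command is a hypothetical stand-in for whatever program you are actually running: it just writes two UTF-8 encoded lines to its standard output.

```python
import subprocess
import sys

# Hypothetical child process (an assumption for this sketch): a Python
# one-liner that writes two UTF-8 encoded lines to its standard output.
child_code = r"import sys; sys.stdout.buffer.write('caf\u00e9\nna\u00efve\n'.encode('utf8'))"

proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE)
raw_lines = proc.stdout.readlines()  # each item is a sequence of bytes
proc.stdout.close()
proc.wait()

# Interpret the bytes as UTF-8 encoded characters.
result = [x.decode('utf8') for x in raw_lines]

# The byte string is longer than the character string, because the
# accented letter takes two bytes in UTF-8.
print(len(raw_lines[0]), len(result[0]))
```

The two lengths differ because the bytes and the characters are not the same thing: the accented letter is a single character but occupies two bytes once encoded in UTF-8.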

EOL 2010-10-13 20:26:07

@EOL Many thanks, works fine now. But I have Python 2.4 :)

Vincenzo 2010-10-24 16:29:25

@Vincenzo: Thanks. How did you do it? through the codecs module?

EOL 2010-10-24 20:25:42

@EOL I just changed the line exactly as you suggested. And it works.

Vincenzo 2010-10-25 12:52:18

@Vincenzo: Thank you for the info. I updated the answer so as to reflect the fact that it works with Python 2.4: I was not sure of when `.decode()` had been introduced.

EOL 2010-10-25 15:23:09
