This is my code in Python:

[...]
proc = Popen(path, stdin=stdin, stdout=PIPE, stderr=PIPE)
result = [x for x in proc.stdout.readlines()]
result = ''.join(result);

Everything works fine when the output is ASCII. When I receive UTF-8 text on stdout, the result is unpredictable, and in most cases the output is damaged. What is wrong here?

Btw, maybe this code should be optimized somehow?

+3  A: 

Have you tried decoding each line first, and then joining the resulting unicode strings together? In Python 2.4+ (at least), this can be achieved with

result = [x.decode('utf8') for x in proc.stdout.readlines()]

The important point is that your lines x are sequences of bytes that must be interpreted as representing characters. The decode() method performs this interpretation (here, the bytes are assumed to be in the UTF-8 encoding): x.decode('utf8') is of type unicode, which you can think of as "string of characters" (which is different from "string of numbers between 0 and 255 [bytes]").
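As a rough sketch of the same idea, which also touches on the question about optimizing the code: the whole output can be read with communicate() and decoded in one step instead of line by line. The child command below is hypothetical and only stands in for path. Its b'' literal assumes Python 2.6 or later (or Python 3), so on 2.4 the list comprehension above remains the form to use.

```python
import subprocess
import sys

# Hypothetical child command, for illustration only: it writes two lines of
# UTF-8 encoded bytes ("café" and "über") to its standard output.
child_code = "import os; os.write(1, b'caf\\xc3\\xa9\\n\\xc3\\xbcber\\n')"
proc = subprocess.Popen([sys.executable, "-c", child_code],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# communicate() reads both pipes to the end and waits for the child to exit.
raw_out, raw_err = proc.communicate()

# raw_out is a sequence of bytes; decode it once to get a string of characters.
result = raw_out.decode('utf8')

# Split into lines only if the individual lines are needed.
lines = result.splitlines()
```

With both stdout and stderr set to PIPE, communicate() is probably the safer choice: reading only stdout can block if the child fills the stderr pipe.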

EOL
@EOL Many thanks, works fine now. But I have Python 2.4 :)
Vincenzo
@Vincenzo: Thanks. How did you do it? Through the codecs module?
EOL
@EOL I just changed the line exactly as you suggested. And it works.
Vincenzo
@Vincenzo: Thank you for the info. I updated the answer to reflect the fact that it works with Python 2.4: I was not sure when `.decode()` had been introduced.
EOL