Hey all! I am using graphviz's dot to generate some svg graphs for a web application. I call dot using Popen:

    p = subprocess.Popen(u'/usr/bin/dot -Kfdp -Tsvg', shell=True,\
    stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    str = u'long-unicode-string-i-want-to-convert'
    (stdout,stderr) = p.communicate(str)

What happens is that the dot program throws errors like:

    Error: not well-formed (invalid token) in line 1
     ... <tr><td cellpadding="4bgcolor="#EEE8AA"> ...
    in label of node n260

That obvious error is most certainly NOT in the input string. In particular, if I save it to str.txt with utf-8 encoding and do

    /usr/bin/dot -Kfdp -Tsvg < str.txt > myimg.svg

I get the desired output. The only 'special' thing about str is that it contains characters like the Danish øæå.

Right now I have no clue what to do. The problem may very well be in dot, but it certainly seems to be triggered by Popen behaving differently from using < in the shell, and I have no idea where to begin. Any help, or ideas for calling dot another way (besides writing all the data to a file and calling dot on that!), would be much appreciated!

+1  A: 

Sounds like you should be doing:

    stdout, stderr = p.communicate(str.encode('utf-8'))

(except, of course, that you shouldn't shadow the builtin str.) The unicode type in Python holds unicode data, not UTF-8. If you want UTF-8, you need to explicitly encode it.
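
To see the difference between unicode text and its UTF-8 encoding, here is a minimal sketch. The sample string is made up for illustration (it is not the asker's input), and it is written so that it also runs on Python 3, where str is the unicode type:

```python
# Hypothetical sample text containing the Danish letter ø (U+00F8)
text = u'r\u00f8d gr\u00f8d med fl\u00f8de'

# Encoding turns code points into the bytes a pipe can actually carry
data = text.encode('utf-8')

print(len(text))  # number of code points
print(len(data))  # number of bytes; each ø takes two bytes in UTF-8

# Decoding with the same codec gives the original text back
assert data.decode('utf-8') == text
```

The byte string is what the child process reads on its stdin, so it has to be in the encoding the child expects.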

On top of that, there's no reason to use shell=True in that snippet, nor is the unicode literal passed to subprocess.Popen a particularly good idea (it just gets encoded to ASCII anyway.) And the backslash at the end is unnecessary -- Python knows the line is continued, because you have an open parenthesis that hasn't been closed yet. So, use:

    p = subprocess.Popen(['/usr/bin/dot', '-Kfdp', '-Tsvg'],
                         stdin=subprocess.PIPE, stdout=subprocess.PIPE)
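
Putting both points together, a rough sketch might look like the following. The helper name render, the sample graph, and the stand-in ECHO command are all assumptions made for illustration: ECHO is a small Python child that copies stdin to stdout, so the example runs even where Graphviz is not installed. It assumes Python 3.

```python
import subprocess
import sys


def render(source, cmd):
    """Pipe unicode text to cmd as UTF-8 and return its decoded stdout.

    Hypothetical helper, not part of subprocess or Graphviz.
    """
    # Argument list, no shell=True: nothing is parsed by a shell
    p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Encode explicitly; the pipe carries bytes, not unicode text
    out, _ = p.communicate(source.encode('utf-8'))
    if p.returncode != 0:
        raise RuntimeError('command failed with exit code %d' % p.returncode)
    return out.decode('utf-8')


# Stand-in for dot: a child process that copies stdin to stdout unchanged
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

# Made-up input with a non-ASCII label
graph = u'graph G { a [label="r\u00f8d gr\u00f8d"]; }'
result = render(graph, ECHO)
print(result == graph)
```

For the real thing, the same helper would be called with ['/usr/bin/dot', '-Kfdp', '-Tsvg'] as cmd, assuming dot lives at that path.
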
Thomas Wouters
Thanks Thomas! That did the trick. Boy do I feel stupid, I spent 5 hours 'debugging' a trivial unicode problem :-(. Also thanks for the style hints. Sorry if my crappy code caused physical harm! ;-)
Tue Herlau