I've got a noisy Python script that I want to silence by redirecting its stderr output to /dev/null (using bash, BTW).

Like so:

python -u parse.py  1> /tmp/output3.txt 2> /dev/null

but it exits prematurely, almost immediately. Hmm. I can't see the traceback because, of course, that goes out with stderr. It runs noisily and normally if I don't redirect stderr anywhere.

So let's try redirecting it to a file somewhere rather than /dev/null, and take a look at what it's outputting:

python -u parse.py  1> /tmp/output3.txt 2> /tmp/foo || tail /tmp/foo

Traceback (most recent call last):
  File "parse.py", line 79, in <module>
    parseit('pages-articles.xml')
  File "parse.py", line 33, in parseit
    print >>sys.stderr, "bad page title", page_title
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

So, the stderr output being generated contains UTF-8, and for some reason Python refuses to print non-ASCII characters when stderr is redirected, even when it's redirected to /dev/null (though of course Python doesn't know that).

How can I silence the stderr of a Python script even though it contains UTF-8? Is there any way to do it without rewriting every print to stderr in this script?

+3  A: 

When stderr is not redirected, it takes on the encoding of your terminal. This all goes out the door when you redirect it, though: with no terminal attached, Python falls back to the 'ascii' codec, which is the one named in your traceback. You'll need to use sys.stderr.isatty() to detect whether it's redirected and encode appropriately.
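
A minimal sketch of that check, assuming the goal is to encode as UTF-8 whenever the stream is not a terminal. The helper name wrap_if_redirected is hypothetical (it is not in the original script), and the getattr lookup is only there so the same code also runs on Python 3, where the byte stream sits behind .buffer:

```python
import codecs

def wrap_if_redirected(stream, encoding="utf-8"):
    """Return `stream` untouched for a terminal, else an encoding wrapper.

    Hypothetical helper, not part of the original script.
    """
    if stream.isatty():
        # A terminal: keep whatever encoding Python picked for it.
        return stream
    # Redirected: encode everything ourselves instead of falling back to ASCII.
    # Python 3 text streams expose their byte stream as .buffer;
    # Python 2 file objects accept bytes directly.
    raw = getattr(stream, "buffer", stream)
    return codecs.getwriter(encoding)(raw)
```

Something like sys.stderr = wrap_if_redirected(sys.stderr) near the top of the script should then leave the existing print statements untouched.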

Ignacio Vazquez-Abrams
Actually, sys.stderr.encoding is not determined by the terminal's encoding, which can be changed by a variety of means that Python may not know about. It is more likely that stderr.encoding is determined by the LC_* environment variables or similar.
J.F. Sebastian
Well, the encoding of the terminal is determined by these variables, so the end result is the same.
Ignacio Vazquez-Abrams
@Ignacio Vazquez-Abrams: The issue is complicated; see e.g. http://bugs.python.org/issue4947
J.F. Sebastian
+1  A: 

You could also just encode the string as ASCII, replacing any Unicode characters that have no ASCII equivalent. Then you don't have to worry about what kind of terminal you have.

asciiTitle = page_title.encode("ascii", "backslashreplace")
print >>sys.stderr, "bad page title", asciiTitle

That replaces the characters that can't be encoded with backslash escapes, e.g. \xfc. There are some other error-handling options too, described here:

http://docs.python.org/library/stdtypes.html#str.encode
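
For comparison, here is a small sketch of what the standard error handlers do to the same string. The function name ascii_variants and the sample word are made up for illustration; the handler names are the ones str.encode accepts:

```python
def ascii_variants(text):
    """Encode `text` as ASCII once per error handler (illustrative helper)."""
    handlers = ("ignore", "replace", "xmlcharrefreplace", "backslashreplace")
    return dict((name, text.encode("ascii", name)) for name in handlers)

variants = ascii_variants(u"f\u00fcr")
# variants["ignore"]            == b"fr"        (character dropped)
# variants["replace"]           == b"f?r"       (replaced by a question mark)
# variants["xmlcharrefreplace"] == b"f&#252;r"  (XML character reference)
# variants["backslashreplace"]  == b"f\\xfcr"   (backslash escape)
```

Only the last two keep enough information to recover the original character from a log.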

DNS
+5  A: 

You can silence stderr by binding it to a custom writer:

#!/usr/bin/env python
import codecs, sys

class NullWriter:
    def write(self, *args, **kwargs):
        pass

if len(sys.argv) == 2:
    if sys.argv[1] == '1':
        sys.stderr = NullWriter()
    elif sys.argv[1] == '2':
        # NOTE: sys.stderr.encoding is *read-only*
        #       therefore the whole stderr should be replaced
        # encode all output using 'utf8'
        sys.stderr = codecs.getwriter('utf8')(sys.stderr)

print >>sys.stderr, u"\u20AC" # euro sign
print "ok"

Example:

$ python silence_stderr.py
Traceback (most recent call last):
  File "silence_stderr.py", line 11, in <module>
    print >>sys.stderr, u"\u20AC"
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

Silenced stderr:

$ python silence_stderr.py 1
ok

Encoded stderr:

$ python silence_stderr.py 2
€
ok
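
If only part of a program should be silenced, the NullWriter idea can be scoped with a context manager. This is a sketch, not part of silence_stderr.py: silenced_stderr is a hypothetical name, and the flush method is an addition for callers that flush the stream.

```python
import contextlib
import sys

class NullWriter(object):
    """Discard everything written to it."""
    def write(self, *args, **kwargs):
        pass
    def flush(self):
        pass

@contextlib.contextmanager
def silenced_stderr():
    """Temporarily replace sys.stderr with a NullWriter (hypothetical helper)."""
    saved = sys.stderr
    sys.stderr = NullWriter()
    try:
        yield
    finally:
        # Always restore the real stream, even if the body raised.
        sys.stderr = saved
```

Wrapping the noisy call in "with silenced_stderr():" would discard its output, non-ASCII or not, and restore stderr afterwards so later tracebacks stay visible.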

NOTE: I got the above output inside Emacs. To reproduce it in a terminal, you could redirect stderr to a file:

$ python ... 2>out.txt
$ cat out.txt

NOTE: In the Windows console (after chcp 65001, which switches the code page to UTF-8, and with a TrueType font, Lucida Console) I got strange results:

C:\> python silence_stderr.py 2
Traceback (most recent call last):
  File "silence_stderr.py", line 14, in <module>
    print >>sys.stderr, u"\u20AC" # euro sign
  File "C:\pythonxy\python\lib\codecs.py", line 304, in write
    self.stream.write(data)
IOError: [Errno 13] Permission denied

If the font is not TrueType, no exception is raised, but the output is wrong.

Perl works with the TrueType font:

C:\> perl  -E"say qq(\x{20ac})"
Wide character in print at -e line 1.
€

Redirection works though:

C:\>python silence_stderr.py 2 2>tmp.log
ok
C:\>cat tmp.log
€
cat: write error: Permission denied

In reply to the comment asking what codecs.getwriter does:

From the codecs.getwriter documentation:

Look up the codec for the given encoding and return its StreamWriter class or factory function. Raises a LookupError in case the encoding cannot be found.

An oversimplified view:

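import sys

class UTF8StreamWriter:
    """Encode everything written to it as UTF-8 before passing it on."""
    def __init__(self, writer):
        self.writer = writer  # the underlying byte-oriented stream
    def write(self, s):
        self.writer.write(s.encode('utf-8'))

sys.stderr = UTF8StreamWriter(sys.stderr)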
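
To watch the real thing at work without touching stderr, a small experiment along these lines may help. It assumes an in-memory io.BytesIO standing in for the redirected stream; the variable names are illustrative only:

```python
import codecs
import io

buf = io.BytesIO()                       # stands in for a redirected stderr
writer = codecs.getwriter("utf-8")(buf)  # StreamWriter instance wrapping buf
writer.write(u"\u20ac")                  # text goes in ...
encoded = buf.getvalue()                 # ... UTF-8 bytes come out: b"\xe2\x82\xac"

try:
    codecs.getwriter("no-such-encoding")
    lookup_failed = False
except LookupError:
    lookup_failed = True                 # unknown encodings raise LookupError
```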
J.F. Sebastian
really cool... can you explain to me what codecs.getwriter does?
ʞɔıu