views:

381

answers:

3

I'm trying to log a UTF-8 encoded string to a file using Python's logging package. As a toy example:

import logging

def logging_test():
    handler = logging.FileHandler("/home/ted/logfile.txt", "w",
                                  encoding = "UTF-8")
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string)

if __name__ == "__main__":
    logging_test()

This explodes with UnicodeDecodeError on the logging.info() call.

At a lower level, Python's logging package is using the codecs package to open the log file, passing in the "UTF-8" argument as the encoding. That's all well and good, but it's trying to write byte strings to the file instead of unicode objects, which explodes. Essentially, Python is doing this:

file_handler.write(unicode_string.encode("UTF-8"))

When it should be doing this:

file_handler.write(unicode_string)

Is this a bug in Python, or am I taking crazy pills? FWIW, this is a stock Python 2.6 installation.

+1  A: 

Try this:

import logging

def logging_test():
    log = open("./logfile.txt", "w")
    handler = logging.StreamHandler(log)
    formatter = logging.Formatter("%(message)s")
    handler.setFormatter(formatter)
    root_logger = logging.getLogger()
    root_logger.addHandler(handler)
    root_logger.setLevel(logging.INFO)

    # This is an o with a hat on it.
    byte_string = '\xc3\xb4'
    unicode_string = unicode("\xc3\xb4", "utf-8")

    print "printed unicode object: %s" % unicode_string

    # Explode
    root_logger.info(unicode_string.encode("utf8", "replace"))


if __name__ == "__main__":
    logging_test()

For what it's worth I was expecting to have to use codecs.open to open the file with utf-8 encoding but either that's the default or something else is going on here, since it works as is like this.

John
+3  A: 
Vinay Sajip
Yes this was it. There was a bug in the python logging package that was fixed in a later version.
Ted Dziuba
I am runningPython 2.6.1 (r261:67515, Feb 11 2010, 00:51:29) [GCC 4.2.1 (Apple Inc. build 5646)] on darwinon my iMac, and I still get the same error. Was the bug really fixed?
Tsf
Vinay Sajip
A: 

If I understood your problem correctly, the same issue should arise on your system when you do just:

str(u'ô')

I guess automatic encoding to the locale encoding on Unix will not work until you have enabled locale-aware if branch in the setencoding function in your site module via locale. This file usually resides in /usr/lib/python2.x, it worth inspecting anyway. AFAIK, locale-aware setencoding is disabled by default (it's true for my Python 2.6 installation).

The choices are:

  • Let the system figure out the right way to encode Unicode strings to bytes or do it in your code (some configuration in site-specific site.py is needed)
  • Encode Unicode strings in your code and output just bytes

See also The Illusive setdefaultencoding by Ian Bicking and related links.

Andrey Vlasovskikh