Hi all, my troubles with ConfigParser continue. It seems it doesn't support Unicode very well. The config file is indeed saved as UTF-8, but when ConfigParser reads it, the contents seem to end up in some other encoding. I assumed it was latin-1 and I thought overriding optionxform could help:

-- configfile.cfg -- 
Häjsan = 3
☃ = my snowman

-- --
# -*- coding: utf-8 -*-  
import ConfigParser

def _optionxform(s):
    try:
        # Treat the option name as latin-1 bytes and re-encode it as UTF-8
        newstr = s.decode('latin-1')
        newstr = newstr.encode('utf-8')
        return newstr
    except Exception, e:
        print e

cfg = ConfigParser.ConfigParser()
cfg.optionxform = _optionxform
cfg.read("myconfig")

Of course, when I read the config I get:

'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
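
For reference, this exact message is what you get when the UTF-8 bytes for "ä" (0xc3 0xa4) are decoded with the ascii codec, which Python 2 does implicitly whenever a byte string is mixed with a unicode object. The sketch below only reproduces the message and shows what the latin-1 guess does to the same bytes; which operation in the code above actually triggers the implicit decode is not shown in the traceback, and the helper name ascii_decode_error is made up for illustration.

```python
# -*- coding: utf-8 -*-
# Illustration only: how the error message arises, not where it arises in the post.

raw = u"ä".encode("utf-8")  # the two bytes 0xc3 0xa4, as stored in a UTF-8 file


def ascii_decode_error(data):
    """Return the error message from decoding `data` as ASCII, or None."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


message = ascii_decode_error(raw)   # same text as the error quoted above
mojibake = raw.decode("latin-1")    # wrong codec: no error, but two wrong characters
correct = raw.decode("utf-8")       # right codec: the single character "ä"
```

So decoding as latin-1 never fails, it just silently produces the wrong text, which is why guessing latin-1 in optionxform cannot repair a UTF-8 file.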

I've tried a couple of different variations of decoding 's', but the point seems moot, since it really should be a unicode object from the beginning. After all, the config file is UTF-8. I have confirmed that something is wrong in the way ConfigParser reads the file by stubbing it out with this DummyConfig class. If I use that, then everything is nice unicode, fine and dandy.

-- --
# -*- coding: utf-8 -*-                
apa = {'rules': [(u'Häjsan', 3), (u'☃', u'my snowman')]}

class DummyConfig(object):
    def sections(self):
        return apa.keys()
    def items(self, section):
        return apa[section]
    def add_section(self, section):
        pass  # stub: the dummy config is read-only
    def set(self, *args):
        pass  # stub: the dummy config is read-only

Any ideas about what could be causing this, or suggestions of other config modules that support Unicode better, are most welcome. I don't want to use sys.setdefaultencoding()!

The ConfigParser.readfp() method can take a file object. Have you tried opening the file with the correct encoding using the codecs module before passing it to ConfigParser, like below?

import codecs
cfg.readfp(codecs.open("myconfig", "r", "utf8"))
Actually it was ConfigParser.readfp() but it worked!
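
For anyone landing here later, a minimal sketch of the same idea that should run on both Python 2 and Python 3: open the file as text with an explicit encoding and hand the file object to the parser. The helper names read_utf8_config and demo, the temporary file, and the [rules] section header are assumptions made for the example (the section name is borrowed from the DummyConfig above). On Python 3 the module is called configparser and read_file() takes the place of readfp().

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

try:
    import configparser                  # Python 3
except ImportError:
    import ConfigParser as configparser  # Python 2


def read_utf8_config(path):
    """Parse the file at `path` as UTF-8 and return the populated parser."""
    cfg = configparser.RawConfigParser()
    cfg.optionxform = lambda option: option  # keep option names exactly as written
    with io.open(path, "r", encoding="utf-8") as fp:
        # Use read_file() where it exists (Python 3.2+), else fall back to readfp()
        reader = getattr(cfg, "read_file", None) or cfg.readfp
        reader(fp)
    return cfg


def demo():
    """Write a small UTF-8 config to a temp file and return its parsed items."""
    text = u"[rules]\nHäjsan = 3\n☃ = my snowman\n"
    fd, path = tempfile.mkstemp(suffix=".cfg")
    os.close(fd)
    try:
        with io.open(path, "w", encoding="utf-8") as fp:
            fp.write(text)
        return read_utf8_config(path).items("rules")
    finally:
        os.remove(path)
```

Because the file object already yields decoded text, the parser never sees raw bytes and option names and values come back as unicode strings. Note that all values are returned as strings, so the 3 comes back as u"3".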
