Hi,

I am having problems with the DictWriter and non-ascii characters. A short version of my problem:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import codecs
import csv

f = codecs.open("test.csv", 'w', 'utf-8')
writer = csv.DictWriter(f, ['field1'], delimiter='\t')
writer.writerow({'field1':u'å'.encode('utf-8')})
f.close()

Gives this Traceback:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    writer.writerow({'field1':u'å'.encode('utf-8')})
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/csv.py", line 124, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 638, in write
    return self.writer.write(data)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/codecs.py", line 303, in write
    data, consumed = self.encode(object, self.errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

I am a bit lost, as the DictWriter ought to be able to work with UTF-8 from what I have read in the documentation.

+2  A: 

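The object you obtain with codecs.open wants a unicode string in its write method -- that's the whole point. csv.DictWriter of course is calling that method with a UTF-8-encoded byte string instead, whence the exception: before it can encode, the writer first has to turn the byte string into unicode, which Python 2 attempts with the default 'ascii' codec, and that fails on the first non-ASCII byte (0xc3).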
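To see where the UnicodeDecodeError comes from, here is a minimal sketch that reproduces the failing step in isolation. It does not use csv or codecs.open at all; it only performs explicitly the ASCII decode that Python 2 attempts implicitly when a byte string reaches a writer expecting unicode. The names `data` and `message` are just for illustration.

```python
# The value from the question: u'å' encoded to UTF-8 is two bytes, 0xc3 0xa5.
data = u'\xe5'.encode('utf-8')

try:
    # Done explicitly here; Python 2 does this behind the scenes when a
    # byte string is handed to something that wants a unicode string.
    data.decode('ascii')
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The message names byte 0xc3 at position 0, matching the last line of the traceback in the question.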

Change f's creation to f = open("test.csv", 'wb') (taking codecs out of the picture) and things should work just fine.
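A rough sketch of that fix, for what it's worth. The Python 2 branch is the change described above (plain binary file, bytes encoded by hand). The Python 3 branch is an assumption added here, not something from this thread: there the csv module works on text, so the file is opened in text mode with an explicit encoding and `newline=''`. The helper name `write_row` and the temporary path are made up for the example.

```python
import csv
import io
import os
import sys
import tempfile

PY2 = sys.version_info[0] == 2


def write_row(path, value):
    """Write one tab-delimited row holding the text `value` as UTF-8."""
    if PY2:
        # Python 2: csv deals in byte strings, so use a plain binary
        # file and encode the value ourselves (the fix described above).
        f = open(path, 'wb')
        row = {'field1': value.encode('utf-8')}
    else:
        # Python 3 (assumption, not from the thread): csv deals in text,
        # so the file object does the encoding.
        f = io.open(path, 'w', encoding='utf-8', newline='')
        row = {'field1': value}
    try:
        writer = csv.DictWriter(f, ['field1'], delimiter='\t')
        writer.writerow(row)
    finally:
        f.close()


# Usage: write u'å' and read the raw bytes back.
path = os.path.join(tempfile.mkdtemp(), "test.csv")
write_row(path, u'\xe5')
with open(path, 'rb') as fh:
    raw = fh.read()
```

Either way the file ends up holding the UTF-8 bytes for å followed by the csv module's default `\r\n` line terminator, and nothing ever tries to decode those bytes as ASCII.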

Alex Martelli
This could be regarded as a bug in the `csv` module; even in Python 2.x, modules should generally accept both byte and Unicode strings.
Philipp
Thanks a lot, that solved it.
Joel
@Philipp, uh? No idea where your "should" comes from, since essentially NO functions behave the way you say they "should generally" -- everybody's always converting bytes to unicode or VV because of that! -- _and_ what csv accepts is totally irrelevant in this case (it's all about what **codecs** accept, and _their_ purpose is exactly to accept unicode strings!). Overall this makes your comment the weirdest I've seen in ages.
Alex Martelli
"should" here means "I want it" ;-) But I think that many modules do accept both byte and Unicode strings, e.g. `os` or `os.path`. And what is VV?
Philipp
@Philipp, VV="vice versa";-). Most functions that appear to accept both byte and unicode strings actually translate one type to the other (via 'ascii' -- eep;-), though very special ones on some platform or other may offer smarter approaches (but they'll be a tiny minority indeed, since the right way to translate is usually not obvious!-).
Alex Martelli