ansaurus

Question

Python csv library with Unicode/UTF-8 support that "just works"

Answer 1

+7 A:

The csv module documentation already includes a Unicode example, so why look for another library or reinvent the wheel?

import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # Python 2's csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
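
The code above targets Python 2, where `unicode` is a built-in type. For comparison, here is a minimal sketch assuming Python 3, where the csv module works on text strings directly and the encode/decode round trip should not be needed. The function name `read_utf8_csv` and the sample data are made up for illustration.

```python
import csv
import io

def read_utf8_csv(raw_bytes, dialect=csv.excel, **kwargs):
    """Decode UTF-8 bytes and return the parsed rows as lists of str."""
    # newline='' leaves line endings untouched so the csv module
    # can handle them itself, including inside quoted fields.
    text_stream = io.StringIO(raw_bytes.decode('utf-8'), newline='')
    return list(csv.reader(text_stream, dialect=dialect, **kwargs))

# Hypothetical sample input containing non-ASCII characters.
sample = 'name,city\r\nZoë,"São Paulo"\r\n'.encode('utf-8')
rows = read_utf8_csv(sample)
```

When reading from a file on Python 3, opening it with `open(path, encoding='utf-8', newline='')` and passing the file object to `csv.reader` should give the same result.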

S.Mark 2009-12-04 10:41:47

Answer 2

A:

If you want a class that behaves like the csv.reader class, create a module wrapping S.Mark's code like this:

import csv

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')

class reader(object):
    def __init__(self, data_iter, dialect=csv.excel, **kwargs):
        # Python 2's csv.py doesn't do Unicode; encode temporarily as UTF-8:
        self.csv_reader = csv.reader(utf_8_encoder(data_iter), dialect=dialect, **kwargs)

    def next(self):
        # decode UTF-8 back to Unicode, cell by cell:
        row = self.csv_reader.next()
        return [unicode(cell, 'utf-8') for cell in row]

    def __iter__(self):
        return self
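
The class above relies on the Python 2 iterator protocol (`next`) and the `unicode` built-in. As a rough sketch of the same wrapping idea under Python 3, assuming the input is an iterable of UTF-8 encoded byte lines, something like the following could work. The class name `Utf8Reader` and the `line_num` passthrough are illustrative additions, not part of the original answer.

```python
import csv

class Utf8Reader:
    """Iterate over CSV rows from an iterable of UTF-8 encoded byte lines."""

    def __init__(self, byte_lines, dialect=csv.excel, **kwargs):
        # Decode lazily, one line at a time, before csv.reader sees the data.
        decoded = (line.decode('utf-8') for line in byte_lines)
        self._reader = csv.reader(decoded, dialect=dialect, **kwargs)

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying reader propagates unchanged.
        return next(self._reader)

    @property
    def line_num(self):
        # Mirror csv.reader's count of source lines read so far.
        return self._reader.line_num

# Hypothetical sample input.
byte_lines = ['a,b\r\n'.encode('utf-8'), 'é,ü\r\n'.encode('utf-8')]
reader = Utf8Reader(byte_lines)
parsed = list(reader)
```

Exposing `line_num` is one way to get closer to the interface of `csv.reader`; other attributes, such as `dialect`, would need the same treatment if callers depend on them.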

innohead 2010-04-13 14:55:14
