I have a CSV file that I need to rearrange and re-encode. I'd like to run

line = line.decode('windows-1250').encode('utf-8')

on each line before it's parsed and split by the CSV reader. Alternatively, I'd like to iterate over the lines myself, run the re-encoding, and use just the single-line parsing from the CSV library, but with the same reader instance.

Is there a way to do that nicely?

+2  A: 

Looping over the lines of a file can be done this way:

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        print(line)  # here you can convert the encoding and save the line

But if you want to convert the encoding of a whole file, you can also call:

$ iconv -f Windows-1250 -t UTF-8 < file.csv > file_utf8.csv

Edit: So where is the problem?

with open('path/to/my/file.csv', 'r') as f:
    for line in f:
        line = line.decode('windows-1250').encode('utf-8')
        elements = line.split(",")
Dejw
I do not want to read/write the file twice. The iconv solution is lame; I want it done in code, not by some tool. I need to create a tool that will prepare files in one process, not instructions on how to do that.
WooYek
Again, no support for CSV parsing at the same time, line splitting just won't cut it.
WooYek
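To illustrate why plain splitting is not enough: a quoted field may itself contain the delimiter, which split() cuts in two, while the csv parser keeps it intact. A small sketch with a made-up sample line (not from the asker's file), written in Python 3 syntax:

```python
import csv

# Made-up sample row: the second field contains the delimiter inside quotes.
line = 'Kowalski;"Main St; apt. 4";Warszawa'

# Naive splitting ignores the quoting rules.
naive = line.split(";")

# csv.reader accepts any iterable of lines, so a one-element list works.
parsed = next(csv.reader([line], delimiter=";", quotechar='"'))

print(naive)   # the quoted field is cut in two and keeps its quote characters
print(parsed)  # the quoted field stays whole
```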
+1  A: 

Thanks for the answers. The wrapping one gave me an idea:

import csv

def reencode(file):
    # Re-encode each line just before the csv reader parses it.
    for line in file:
        yield line.decode('windows-1250').encode('utf-8')

# Python 2: files handed to the csv module should be opened in binary mode.
csv_writer = csv.writer(open(outfilepath,'wb'), delimiter=',',quotechar='"', quoting=csv.QUOTE_MINIMAL)
csv_reader = csv.reader(reencode(open(filepath,'rb')), delimiter=";",quotechar='"')
for c in csv_reader:
    l = c  # rearrange columns here
    csv_writer.writerow(l)

That's exactly what I was going for: re-encoding a line just before it gets parsed by the csv_reader.
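
For reference, a rough Python 3 sketch of the same idea. There csv.reader expects text rather than bytes, so the generator only decodes, and the UTF-8 encoding happens when the output file is written. The names decode_lines and convert, the reorder callback, and the paths are placeholders for illustration, not part of the code above.

```python
import csv

def decode_lines(binary_file, encoding="windows-1250"):
    """Decode each raw line just before csv.reader parses it."""
    for raw_line in binary_file:
        yield raw_line.decode(encoding)

def convert(in_path, out_path, reorder=lambda row: row):
    # Input is read as bytes; output is text that Python encodes as UTF-8.
    with open(in_path, "rb") as src, \
         open(out_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(decode_lines(src), delimiter=";", quotechar='"')
        writer = csv.writer(dst, delimiter=",", quotechar='"',
                            quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            # reorder is a hypothetical hook for rearranging the columns
            writer.writerow(reorder(row))
```

The file is still read only once: each line is decoded lazily as the reader asks for it, which also keeps multi-line quoted fields working.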

WooYek
+1  A: 

At the very bottom of the csv documentation is a set of classes (UnicodeReader and UnicodeWriter) that implement Unicode support for csv:

rfile = open('input.csv')
wfile = open('output.csv','w')
csv_reader = UnicodeReader(rfile,encoding='windows-1250')
csv_writer = UnicodeWriter(wfile,encoding='utf-8')
for c in csv_reader:
    # process Unicode lines
    csv_writer.writerow(c)
rfile.close()
wfile.close()
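
Note that these recipe classes come from the Python 2 csv documentation and have to be copied into your own code; they are not part of the csv module itself. On Python 3 they should not be needed, because open() accepts an encoding argument. A minimal sketch under that assumption (the function name recode_csv and its parameters are illustrative):

```python
import csv

def recode_csv(in_path, out_path,
               in_encoding="windows-1250", out_encoding="utf-8"):
    # newline="" lets the csv module handle line endings itself.
    with open(in_path, encoding=in_encoding, newline="") as rfile, \
         open(out_path, "w", encoding=out_encoding, newline="") as wfile:
        csv_reader = csv.reader(rfile)
        csv_writer = csv.writer(wfile)
        for row in csv_reader:
            # row is a list of str; process it here before writing
            csv_writer.writerow(row)
```

Using with blocks also closes both files even if processing a row raises an exception.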
Mark Tolonen