views:

2615

answers:

3

[Please note that this is a different question from the already answered How to replace a column using Python’s built-in .csv writer module?]

I need to do a find and replace (specific to one column of URLs) in a huge Excel .csv file. Since I'm in the beginning stages of trying to teach myself a scripting language, I figured I'd try to implement the solution in python.

I'm having trouble when I try to write back to a .csv file after making a change to the contents of an entry. I've read the official csv module documentation about how to use the writer, but there isn't an example that covers this case. Specifically, I am trying to get the read, replace, and write operations accomplished in one loop. However, one cannot use the same 'row' reference in both the for loop's argument and as the parameter for writer.writerow(). So, once I've made the change in the for loop, how should I write back to the file?

edit: I implemented the suggestions from S. Lott and Jimmy, still the same result

edit #2: I added the "rb" and "wb" to the open() functions, per S. Lott's suggestion

import csv

#filename = 'C:/Documents and Settings/username/My Documents/PALTemplateData.xls'

csvfile = open("PALTemplateData.csv","rb")
csvout = open("PALTemplateDataOUT.csv","wb")
reader = csv.reader(csvfile)
writer = csv.writer(csvout)

changed = 0;

for row in reader:
    row[-1] = row[-1].replace('/?', '?')
    writer.writerow(row)                  #this is the line that's causing issues
    changed=changed+1

print('Total URLs changed:', changed)

edit: For your reference, this is the new full traceback from the interpreter:

Traceback (most recent call last):
  File "C:\Documents and Settings\g41092\My Documents\palScript.py", line 13, in <module>
    for row in reader:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
A: 

the problem is you're trying to write to the same file you're reading from. write to a different file and then rename it after deleting the original.

Jimmy
+1  A: 

You cannot read and write the same file.

source = open("PALTemplateData.csv","rb")
reader = csv.reader(source , dialect)

target = open("AnotherFile.csv","wb")
writer = csv.writer(target , dialect)

The normal approach to ALL file manipulation is to create a modified COPY of the original file. Don't try to update files in place. It's just a bad plan.


Edit

In the lines

source = open("PALTemplateData.csv","rb")

target = open("AnotherFile.csv","wb")

The "rb" and "wb" are absolutely required. Every time you ignore those, you open the file for reading in the wrong format.

You must use "rb" to read a .CSV file. There is no choice with Python 2.x. With Python 3.x, you can omit this, but use "r" explicitly to it clear.

You must use "wb" to write a .CSV file. There is no choice with Python 2.x. With Python 3.x, you must use "w".


Edit

It appears you are using Python3. You'll need to drop the "b" from "rb" and "wb".

Read this: http://docs.python.org/3.0/library/functions.html#open

S.Lott
Alright, I took care of that issue, too. It looks like we're getting closer... The traceback got shorter :)
ignorantslut
Now that I think about it, though, before I attempted to write back to the file (ie, when I was still working on finding the right column in the .csv), the script worked just fine without the rb.
ignorantslut
"The "rb" and "wb" are absolutely required.": not in Python 3. There, you should call open() with newline=''.
Miles
Removing the 'b' from "rb" and "wb" did the trick... Thank you everyone!
ignorantslut
Comments are nice, but Accept an Answer is the way SO works.
S.Lott
sorry S. Lott, I got your check mark right here
ignorantslut
+1  A: 

Opening csv files as binary is just wrong. CSV are normal text files so You need to open them with

source = open("PALTemplateData.csv","r")
target = open("AnotherFile.csv","w")

The error

_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)

comes because You are opening them in binary mode.

When I was opening excel csv's with python, I used something like:

try:    # checking if file exists
    f = csv.reader(open(filepath, "r", encoding="cp1250"), delimiter=";", quotechar='"')
except IOError:
    f = []

for record in f:
    # do something with record

and it worked rather fast (I was opening two about 10MB each csv files, though I did this with python 2.6, not the 3.0 version).

There are few working modules for working with excel csv files from within python - pyExcelerator is one of them.

zeroDivisible
(1) The advice on file opening is quite incorrect. For Python 2.x, use "rb" or "wb" as appropriate. For Python 3.x for reading, specify newline=''(2) I don't understand "There are few working modules for working with excel csv files from within python". There is one, the csv module. What is the relevance of pyExcelerator??
John Machin
John, I am for 100% sure, that when I was working with csv files with python 2.6, I opened them in text, not in binary mode. I opened tried to open one now in python 2.6.2 - it worked like a charm. English is my second language so my posts / comments usually sound like they were written by drunk teenanger. What I meant with pyExcelerator, is that there are few modules designed specifically for working with csv files - I didn't written, that this modules are included in standard distribution. Cheers, Mate.
zeroDivisible
(1) 2.x opening in text mode will not work properly if there are newlines embedded in your data. The manual says to open in binary mode. Just do it! (2) pyExcelerator neither reads nor writes csv files; again I ask: What is the relevance of pyExcelerator???
John Machin