views:

1007

answers:

3

I need to do a find and replace (specific to one column of URLs) in a huge Excel .csv file. Since I'm in the beginning stages of trying to teach myself a scripting language, I figured I'd try to implement the solution in python.

I'm having trouble with the "replace" part of the solution. I've read the official csv module documentation about how to use the writer, but there isn't really a clear enough example for me (yes, I'm slow). So, now for the question: how does one iterate through the rows of a csv file with a writer object?

p.s. apologies in advance for the clumsy code, I'm still learning :)

import csv

csvfile = open("PALTemplateData.csv")
csvout = open("PALTemplateDataOUT.csv")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
writer = csv.writer(csvout, dialect)

total=0;
needchange=0;
changed = 0;
temp = ''
changeList = []

for row in reader:
    total=total+1
    temp = row[len(row)-1]
    if '/?' in temp:
        needchange=needchange+1;
        changeList.append(row.index)

for row in writer:           #this doesn't compile, hence the question
    if row.index in changeList:
        changed=changed+1
        temp = row[len(row)-1]
        temp.replace('/?', '?')
        row[len(row)-1] = temp
        writer.writerow(row)

print('Total URLs:', total)
print('Total URLs to change:', needchange)
print('Total URLs changed:', changed)
+5  A: 

The reason you're getting an error is that the writer doesn't have data to iterate over. You're supposed to give it the data - presumably, you'd have some sort of list or generator that produces the rows to write out.

I'd suggest just combining the two loops, like so:

for row in reader:
    row[-1] = row[-1].replace('/?', '?')
    writer.writerow(row)

And with that, you don't even need total, needchange, and changeList. (There are a bunch of optimizations in there that I unfortunately don't have time to explain, but I'll see if I can edit that info in later)

David Zaslavsky
Keep in mind that doing this will overwrite your output file as you go. This is the typical way of doing this kind of thing though. Starting with two copies of the file, like you have above, isn't the best practice.
Sean Cavanagh
+1: For that matter, you generally don't need the sniffer or the seek(0), either.
S.Lott
That certainly is an elegant solution, however, I don't think you can use the same 'row' reference in both the for loop iteration and as the parameter of writerow(). The interpreter says:"io.UnsupportedOperation: BufferedReader.write() not supported"
ignorantslut
@ignorantslut: write() certainly is not supported. `writerow()`, however, should be supported. Check your code to be sure you spelled the method name correctly.
S.Lott
@S. Lott: I copied and pasted directly from David's example. Full traceback: File "C:\Documents and Settings\g41092\My Documents\palScript.py", line 21, in <module> writer.writerow(row) File "C:\Program Files\Python\lib\io.py", line 1495, in write self.buffer.write(b) File "C:\Program Files\Python\lib\io.py", line 701, in write self._unsupported("write") File "C:\Program Files\Python\lib\io.py", line 322, in _unsupported (self.__class__.__name__, name))io.UnsupportedOperation: BufferedReader.write() not supported
ignorantslut
@ignorantslut: Then close this question and open a new one. Your new problem has no relationship with this problem. This problem is a fundamental confusion on how to read one file while writing another. Your new error requires us to look at your new code.
S.Lott
+1  A: 

You should only have one loop and read and write at the same time - if your replacements only affect one line at a time, you don't need to loop over the data twice.

for row in reader:
  total=total+1
  temp = row[len(row)-1]
  if '/?' in temp:
    temp = row[len(row)-1]
    temp.replace('/?', '?')
    row[len(row)-1] = temp
  writer.writerow(row)

This is just to illustrate the loop, not sure if the replacement code will work like this.

knabar
A: 

Once you have your csv in a big list, one easy way to replace a column in a list would be to transpose your matrix, replace the row, and then transpose it back:

mydata = [[1, 'a', 10], [2, 'b', 20], [3, 'c', 30]]

def transpose(matrix):
    return [[matrix[x][y] for x in range(len(matrix))] for y in range(len(matrix[0]))]

transposedData = transpose(mydata)
print transposedData
>>> [[1, 2, 3], ['a', 'b', 'c'], [10, 20, 30]]

editedData = transposedData[:2] + [50,70,90]
print editedData
>>> [[1, 2, 3], ['a', 'b', 'c'], [50, 70, 90]]

mydata = transpose(editedData)
print mydata
>>> [[1, 'a', 50], [2, 'b', 70], [3, 'c', 90]]
Greg