I'm trying to check the values of extracted data against a CSV file I already have. The loop only goes through the rows of the CSV once, so I can only check one value from feed.items(). Is there something I need to reset somewhere? Is there a better or more efficient way to do this? Thanks.

import csv

orig = csv.reader(open("googlel.csv", "rb"), delimiter=';')
goodrows = []
for feed in gotfeeds:
    for link, comments in feed.items():
        # check every CSV row against the current link
        for row in orig:
            print link
            if link in row[1]:
                row.append(comments)
                goodrows.append(row)
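The behaviour described above comes from `csv.reader` being a one-shot iterator: once it has yielded every row, iterating it again yields nothing. A minimal Python 3 sketch, using a made-up in-memory `io.StringIO` in place of googlel.csv:

```python
import csv
import io

# Hypothetical sample data standing in for googlel.csv (';'-delimited).
sample = io.StringIO("a;http://x.com\nb;http://y.com\n")
reader = csv.reader(sample, delimiter=";")

first_pass = [row for row in reader]   # consumes the underlying file
second_pass = [row for row in reader]  # the reader is already exhausted

print(len(first_pass))   # 2
print(len(second_pass))  # 0
```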
A: 

Making orig a list avoids the need to reset/reparse the csv:

orig = list(csv.reader(open("googlel.csv", "rb"), delimiter = ';'))
unutbu
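A minimal Python 3 sketch of the list approach, assuming `gotfeeds` is a list of dicts mapping links to comments (as the question's `feed.items()` suggests). The function name `match_rows` and the sample data are hypothetical. One deliberate difference from the question's code: it appends `row + [comments]` instead of calling `row.append(comments)`, because with a cached list the same row object is revisited for every link and would otherwise accumulate comments from every match.

```python
import csv
import io

def match_rows(csv_file, feeds):
    """Return CSV rows whose second column contains a feed link, plus that link's comments."""
    # Parse the CSV once; unlike a csv.reader, a list can be iterated repeatedly.
    rows = list(csv.reader(csv_file, delimiter=";"))
    goodrows = []
    for feed in feeds:
        for link, comments in feed.items():
            for row in rows:
                if link in row[1]:
                    # row + [comments] builds a new list, so the cached rows are never modified
                    goodrows.append(row + [comments])
    return goodrows

# Made-up sample data; with a real file: with open("googlel.csv", newline="") as f: match_rows(f, gotfeeds)
sample = io.StringIO("a;http://x.com/1\nb;http://y.com/2\n")
feeds = [{"x.com": "nice"}, {"y.com": "meh", "z.com": "none"}]
print(match_rows(sample, feeds))
# [['a', 'http://x.com/1', 'nice'], ['b', 'http://y.com/2', 'meh']]
```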
This will technically work, but will cause the entire CSV file to be loaded into memory. Not a huge problem if the file's small, but this won't scale.
Chris S
@Chris: True. If the CSV file were huge, I'd expect him to mention that in the question, but there is certainly room for both our interpretations.
unutbu
I agree with Chris S. We faced the same problem; eventually our CSV files grew to 5 GB apiece. Needless to say, it was a nightmare.
dassouki
There is a trade-off between speed and space. If you have the space, using a list will be faster because the CSV won't be re-parsed for every link. This is not a matter of which method is better, but of which is more appropriate for the OP's situation. Since he didn't say how big the CSV is, either method might be the right one.
unutbu
In my experience, it's better to plan for scalability than to assume it's not necessary. In this case, making it scalable won't slow it down much either, since the CSV parsing is fairly simple.
Chris S
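The speed/space trade-off debated here can also be sidestepped by swapping the loops: read the CSV once in the outer loop and test each row against every feed link, so only the feed data (assumed small) and the matches are held in memory. This loop inversion is not proposed in the thread; it's a Python 3 sketch assuming `gotfeeds` is a list of dicts mapping links to comments, with the hypothetical name `match_rows_streaming` and made-up sample data.

```python
import csv
import io

def match_rows_streaming(csv_file, feeds):
    """Single pass over the CSV; each row is checked against every feed link."""
    # Flatten the feeds into (link, comments) pairs; assumed small enough to hold in memory.
    pairs = [(link, comments) for feed in feeds for link, comments in feed.items()]
    goodrows = []
    # The reader is consumed exactly once, so it never needs rewinding.
    for row in csv.reader(csv_file, delimiter=";"):
        for link, comments in pairs:
            if link in row[1]:
                goodrows.append(row + [comments])
    return goodrows

sample = io.StringIO("a;http://x.com/1\nb;http://y.com/2\n")
feeds = [{"x.com": "nice"}, {"y.com": "meh", "z.com": "none"}]
print(match_rows_streaming(sample, feeds))
# [['a', 'http://x.com/1', 'nice'], ['b', 'http://y.com/2', 'meh']]
```

Note that matches come out grouped by CSV row rather than by feed, so the order can differ from the nested-by-feed versions when a row matches several links.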
A: 

You can "reset" the CSV iterator by resetting the read position of the file object.

import csv

data = open("googlel.csv", "rb")
orig = csv.reader(data, delimiter=';')
goodrows = []
for feed in gotfeeds:
    for link, comments in feed.items():
        data.seek(0)  # rewind the file so orig starts again from the first row
        for row in orig:
            print link
            if link in row[1]:
                row.append(comments)
                goodrows.append(row)
Chris S
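The same rewind technique in Python 3 might look like the sketch below. It takes any seekable file-like object (here a made-up `io.StringIO`; for a real file, the csv docs recommend `open("googlel.csv", newline="")`). The function name `match_rows_seek` is hypothetical, and it appends `row + [comments]` rather than mutating `row`.

```python
import csv
import io

def match_rows_seek(data, feeds):
    """Re-read the CSV from the start for every link by rewinding `data`."""
    orig = csv.reader(data, delimiter=";")
    goodrows = []
    for feed in feeds:
        for link, comments in feed.items():
            data.seek(0)  # rewind the underlying file; the reader then yields rows again
            for row in orig:
                if link in row[1]:
                    goodrows.append(row + [comments])
    return goodrows

sample = io.StringIO("a;http://x.com/1\nb;http://y.com/2\n")
print(match_rows_seek(sample, [{"x.com": "nice"}, {"y.com": "meh"}]))
# [['a', 'http://x.com/1', 'nice'], ['b', 'http://y.com/2', 'meh']]
```

Only the file position is reset, not the reader itself, so `orig.line_num` keeps counting across passes.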
Works great, thanks. My file will stay small for the near future, but I would hate to have to track down why it's slow in a couple of months.
matt