I am using Python's csv module to extract data from a CSV file that is constantly being updated by an external tool. I have run into a problem where, when I reach the end of the file, I get a StopIteration exception; however, I would like the script to keep looping, waiting for more lines to be added by the external tool.

What I came up with so far to do this is:

import csv

f = open('file.csv')
csvReader = csv.reader(f, delimiter=',')
while 1:
    try:
        doStuff(csvReader.next())
    except StopIteration:
        # Remember the position, then reopen the file and seek back to it
        depth = f.tell()
        f.close()
        f = open('file.csv')
        f.seek(depth)
        csvReader = csv.reader(f, delimiter=',')

This has the intended functionality, but it also seems terrible. Looping after catching the StopIteration is not an option, since once StopIteration is raised, every subsequent call to next() raises StopIteration again. Does anyone have a suggestion for implementing this in a way that avoids the silly tell-and-seek dance? Or is there a different Python module that easily supports this functionality?

A: 

You rarely need to catch StopIteration explicitly. Do this:

for row in csvReader:
    doStuff(row)

As for detecting when new lines are written to the file, you can either popen a tail -f process or write out the Python code for what tail -f does. (It isn't complicated; it basically just stats the file every second to see if it's changed. Here's the C source code of tail.)
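Something like this untested sketch, for instance (doStuff stands in for the asker's handler, and it ignores the partial-line case a real tail -f buffers for):

import csv
import time

def follow(f, interval=1.0):
    # Yield lines as they are appended to f, polling like tail -f:
    # readline() returns '' at EOF, so wait a moment and try again.
    while True:
        line = f.readline()
        if line:
            yield line
        else:
            time.sleep(interval)

# csv.reader accepts any iterable of lines, so the generator plugs
# straight in and StopIteration never surfaces.
for row in csv.reader(follow(open('file.csv'))):
    doStuff(row)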

EDIT: Disappointingly, popening tail -f doesn't work as I expected in Python 2.x. It seems iterating over the lines of a file is implemented using fread and a largeish buffer, even if the file is supposed to be unbuffered (like when subprocess.py creates the file, passing bufsize=0). But popening tail would be a mildly ugly hack anyway.
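If you do try the subprocess route anyway, a possible workaround (again untested) is to bypass the file iterator and call readline() directly, which doesn't go through the read-ahead buffer:

import csv
import subprocess

tail = subprocess.Popen(['tail', '-f', 'file.csv'],
                        stdout=subprocess.PIPE, bufsize=0)

# iter(callable, sentinel) keeps calling readline() until it returns ''
# (EOF, which tail -f never produces), avoiding the buffered iterator.
for row in csv.reader(iter(tail.stdout.readline, '')):
    doStuff(row)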

Jason Orendorff
+2  A: 

Producer-consumer stuff can get a bit tricky. How about using seek and reading bytes instead? What about using a named pipe?

Heck, why not communicate over a local socket?
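For instance, a rough sketch of the named-pipe version (Unix only, and assuming the external tool can be pointed at the FIFO instead of a regular file):

import csv
import os

path = 'data.fifo'  # hypothetical path the external tool writes to
if not os.path.exists(path):
    os.mkfifo(path)

# Opening the read end blocks until a writer connects.  Reading with
# readline() rather than the file iterator avoids read-ahead buffering,
# so rows come through as they are written; readline() returns '' only
# when the writer closes its end.
f = open(path)
for row in csv.reader(iter(f.readline, '')):
    doStuff(row)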

Hamish Grubijan
+1: named pipe. Give up on files. Use something simpler and built for this purpose.
S.Lott
+2  A: 

Your problem is not with the CSV reader, but with the file object itself. You may still have to do the crazy gyrations you're doing in your snippet above, but it would be better to create a file object wrapper or subclass that does it for you, and use that with your CSV reader. That keeps the complexity isolated from your csv processing code.

For instance (warning: untested code):

import time

class ReopeningFile(object):
    def __init__(self, filename):
        self.filename = filename
        self.f = open(self.filename)

    def next(self):
        while True:
            try:
                return self.f.next()
            except StopIteration:
                # Remember where we stopped, then reopen and seek back
                depth = self.f.tell()
                self.f.close()
                self.f = open(self.filename)
                self.f.seek(depth)
                # Sleep to allow more data to come in
                time.sleep(0.1)
                # May also need a way to signal a real StopIteration

    def __iter__(self):
        return self

Then your main code becomes simpler, since it is freed from having to manage the file reopening. (Note that you also don't have to restart your csv_reader whenever the file is reopened.)

import csv
csv_reader = csv.reader(ReopeningFile('data.csv'))
for each in csv_reader:
    process_csv_line(each)
jcdyer